Sinhala language-based social media analysis to detect fake news

Wijayarathna, W.M.S.N.P.; Jayalal, S.

Please use this identifier to cite or link to this item: http://repository.kln.ac.lk/handle/123456789/21872

Title:	Sinhala language-based social media analysis to detect fake news
Authors:	Wijayarathna, W.M.S.N.P.; Jayalal, S.
Keywords:	Fake News, Hybrid Methodology, Social Media
Issue Date:	2020
Publisher:	Faculty of Science, University of Kelaniya, Sri Lanka
Citation:	Wijayarathna, W.M.S.N.P., Jayalal, S. (2020). Sinhala language-based social media analysis to detect fake news. In : International Conference on Applied and Pure Sciences, 2020. Faculty of Science, University of Kelaniya, Sri Lanka, p.87.
Abstract:	In a rapidly evolving digital age, societies rely heavily on social media to express opinions and share news publicly. With billions of users, this fast mode of information exchange allows polarized opinions, often malicious or misleading, to go viral within minutes. The objective of this research is to propose a detection technique for identifying fake news published in the Sinhala language, in order to help avert public unrest. Approaches to fake news detection generally rely on features intrinsic to the user or source, on features based on the text content, or on a hybrid of the two. The hybrid methodology applied in this study focused mainly on the verifiability of the news text against credible sources and on the credibility of the source from which the news content was obtained. Ordinary user tweets and tweets from 8 credible sources were extracted from Twitter. The selected data set contained about 6000 tweets from credible sources. Ordinary user tweets were then labelled as fake (120) or non-fake (250) using domain knowledge about the news published in the particular month. Both types of tweets were converted into a numerical format. The text was encoded using FastText, which represents a word as the vector sum of its character n-grams and converts each word into a 300-dimensional vector. The average of the word vectors in a sentence was taken as the numeric representation of the whole sentence. The vector representation of each user tweet was then compared against the vectors of credible news tweets to check whether semantically similar content had appeared on credible sources within a given period. From the list of similarity scores obtained for each ordinary user tweet, the maximum similarity score was used for further analysis.
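The verification step described above (average the word vectors of a tweet, compare against every credible tweet, keep the maximum score) can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: cosine similarity stands in for the unspecified similarity measure, a small hand-made dictionary of 2-dimensional vectors stands in for the 300-dimensional FastText embeddings, and the names `sentence_vector`, `cosine_similarity`, and `max_similarity` are hypothetical.

```python
import math

def sentence_vector(tokens, word_vectors):
    """Average the vectors of all tokens found in word_vectors.

    A real FastText model also builds vectors for unseen words from
    character n-grams; this toy lookup simply skips unknown tokens.
    """
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        raise ValueError("no known tokens in sentence")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine similarity (an assumed choice of similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def max_similarity(user_vec, credible_vecs):
    """Best match of one user tweet against all credible tweets."""
    return max(cosine_similarity(user_vec, c) for c in credible_vecs)

# Toy stand-in for FastText embeddings (hypothetical tokens and values).
word_vectors = {"w1": [1.0, 0.0], "w2": [0.0, 1.0], "w3": [1.0, 1.0]}

user_tweet = ["w1", "w2"]              # averages to [0.5, 0.5]
credible_tweets = [["w3"], ["w1"]]     # [1.0, 1.0] and [1.0, 0.0]

user_vec = sentence_vector(user_tweet, word_vectors)
credible_vecs = [sentence_vector(t, word_vectors) for t in credible_tweets]

# Close to 1.0: the user vector points the same way as the first credible one.
best = max_similarity(user_vec, credible_vecs)
```

Taking the maximum rather than the mean means a tweet counts as verified when at least one credible tweet carries semantically similar content, which matches the check described in the abstract.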
Moreover, a point scheme was introduced for the features of a user account, based on their contribution to the overall credibility of the account (e.g., 1 point for every 10 followers). The sum of the points was taken as the user-account credibility score. The formula T_val × UC + (1 − T_val) × TS, with T_val ∈ (0.5, 1], was then generated, where UC is the account credibility score and TS is the text verification score. Here, T_val > 0 determines the relative contributions of content verification and user-account credibility to the overall credibility assessment of a tweet. In the initial implementation, with T_val = 0.7, results indicated a maximum accuracy of 70% in detecting the credibility of tweets when compared with human-annotated tags. While source credibility plays an important role in the overall credibility of content, the study demonstrates that the verification-based method is more effective for Sinhala fake news detection.
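The weighted combination stated in the abstract, T_val × UC + (1 − T_val) × TS, can be sketched as follows. This is only an illustration under stated assumptions: the abstract gives a single example rule for the point scheme (read here as 1 point per 10 followers) and does not say how UC is scaled, so the example values below assume that UC and TS have already been brought onto a common 0 to 1 scale. The function names are hypothetical.

```python
def follower_points(followers, followers_per_point=10):
    """One point per 10 followers (a reading of the abstract's example rule)."""
    return followers // followers_per_point

def tweet_credibility(uc, ts, t_val=0.7):
    """Combine account credibility (UC) and text verification (TS).

    Implements T_val * UC + (1 - T_val) * TS with T_val in (0.5, 1],
    as stated in the abstract. Scaling UC and TS to the same range
    is an assumption made for this example.
    """
    if not 0.5 < t_val <= 1:
        raise ValueError("t_val must lie in (0.5, 1]")
    return t_val * uc + (1 - t_val) * ts

points = follower_points(125)                    # 12 points for 125 followers
score = tweet_credibility(uc=0.4, ts=0.9)        # approximately 0.55
```

With T_val = 1 the score reduces to the account credibility alone, so the interval (0.5, 1] always gives UC more than half of the total weight.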
URI:	http://repository.kln.ac.lk/handle/123456789/21872
Appears in Collections:	ICAPS 2020

Files in This Item:

File	Description	Size	Format
Sinhala language-based social media analysis to detect fake news.pdf		142.09 kB	Adobe PDF
