Abstract:
The aim of this research is to develop a more accurate method for detecting unreliable news articles without considering the article content. Fake news articles are defined here as articles that consist entirely of intentionally fabricated, unreliable news. The approach taken in this work is to consider the type and frequency of the HTML tags used in articles. A comparison of HTML tag counts in reliable and unreliable online articles showed distinct differences between the tags used by the two types of article source. Two datasets with differently assigned ground-truth labels were used. The first, NELA 2017 (News Landscape 2017), comprises 136,000 news articles obtained from 92 different news sources and dated between April 2017 and October 2017. The sources in NELA 2017 were categorized as either reliable or unreliable using a media bias fact-checker resource, and each article was given the label of its source. The second, FakeNewsNet, comprises over 15,000 news articles and tweets obtained from a fact-checking website, Gossip Cop, and has preassigned ground-truth labels (fake or real). Analysis of NELA 2017 showed that unreliable articles use 166 HTML tags that never appear in reliable articles, and that 8 HTML tags appear only in reliable articles. Based on these findings, classification algorithms were applied to the extracted HTML tags. Experimental results show that the KNN (k-nearest neighbors) classifier and the CART (classification and regression tree) classifier give the best performance, with accuracies of around 97% under 10-fold cross-validation on the NELA 2017 dataset. Accuracies of around 72% were obtained when the same techniques were applied to the FakeNewsNet dataset.
Using the NELA dataset, a comparison was also carried out between this new approach and two o...
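The tag-counting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Python's standard-library `html.parser`, and the helper names `TagCounter`, `tag_counts` and `exclusive_tags`, as well as the two toy pages, are hypothetical and made up for illustration only.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts opening HTML tags and ignores all text content."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        # Self-closing tags such as <br/> also reach this method.
        self.counts[tag] += 1


def tag_counts(html):
    """Return a Counter mapping tag name -> number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def exclusive_tags(reliable_pages, unreliable_pages):
    """Return (tags seen only in reliable pages, tags seen only in unreliable pages)."""
    reliable = set()
    for page in reliable_pages:
        reliable |= set(tag_counts(page))
    unreliable = set()
    for page in unreliable_pages:
        unreliable |= set(tag_counts(page))
    return reliable - unreliable, unreliable - reliable


# Toy pages invented for this sketch; they are not taken from either dataset.
reliable_pages = ["<article><p>Text</p><p>More</p></article>"]
unreliable_pages = ["<div><p>Text</p><marquee>Ad</marquee></div>"]

only_reliable, only_unreliable = exclusive_tags(reliable_pages, unreliable_pages)
```

Per-article counts of this kind could then be arranged as fixed-length feature vectors (one column per tag) and passed to a classifier such as KNN or a decision tree; that step is not shown here.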
Date of Conference: 06-09 October 2020
Date Added to IEEE Xplore: 20 November 2020