A Taxonomy of Fake News Classification Techniques: Survey and Implementation Aspects

In the present era, social media platforms such as Facebook, WhatsApp, Twitter, and Telegram are significant sources of information distribution, and people often believe the information they carry without verifying its origin and genuineness. Social media has become an attractive vehicle for spreading fake news worldwide due to its easy availability, cost-effectiveness, and ease of information sharing. Fake news can be generated to mislead the community for personal or commercial gains. It can also serve other ends, such as defaming eminent personalities or influencing government policies. Thus, to mitigate the awful consequences of fake news, considerable research has been conducted on detecting it with high accuracy to prevent its fatal outcomes. Motivated by the aforementioned concerns, we present a comprehensive survey of existing fake news identification techniques in this paper. Then, we select Machine Learning (ML) models such as Long Short-Term Memory (LSTM), the Passive Aggressive algorithm, Random Forest (RF), and Naive Bayes (NB) and train them to detect fake news articles on a self-aggregated dataset. We then implemented these models by tuning hyperparameters such as smoothing, dropout factor, and batch size, which yielded promising results in accuracy and other evaluation metrics such as F1-score, recall, precision, and Area Under the ROC Curve (AUC) score. The models are trained on 6,335 news articles, with LSTM showing the highest accuracy of 92.34% in predicting fake news and NB showing the highest recall. Based on these results, we propose a hybrid fake news detection technique using NB and LSTM. Finally, challenges and open issues, along with future research directions, are discussed to further facilitate research in this domain.


I. INTRODUCTION
Fake news is manipulated information that resembles news media content in nature but not in management structure or intent [1]. It spreads continuously via social media, newspapers, online blogs, forums, and magazines, making it hard to identify reliable news sources. The continuous explosion of fake news increases the need for efficient analytical tools capable of providing insight into the reliability of online content [2]. The false nature of news has a significant impact (negative/positive) on frequent social media users and must be detected as early as possible to avoid a pessimistic influence on readers. Thus, algorithms and techniques that effectively detect fake news have become the focus of intense research. Fake news sources neglect the editorial procedures and standards that the mainstream media uses to ensure information reliability and trustworthiness. Fake news primarily draws the attention of people who are more interested in political talks and stock values [1] and may affect their mental health, leading to stress, anxiety, and depression-like issues. To mitigate the dissemination of fake news, one should focus on the original stories published by authorized publishers rather than individual articles [1].
A few reports claim that the spread of fake news dates back to Before Christ (BC) times [3]. However, its wide spread began with the invention of print media, i.e., the printing press, in 1439 [4]. Later, the era of social media (Orkut, Facebook, WhatsApp, Twitter, and Telegram) began in the late 1990s, enabling fast and incredible dissemination of information [5]. Social media became an ideal place for anyone to create, manipulate, and disseminate fake news. Facebook reported that malicious actor manipulations accounted for less than one-tenth of 1% of public content posted on the site [6], [7]. In 2008, false rumours about Steve Jobs' health (suffering from a heart attack), reported as authentic, caused great fluctuations in the stock price of Apple Inc. [8]. Likewise, research shows that about 19 million bot accounts tweeted in support of either Trump or Clinton during the 2016 US presidential election [9], which demonstrates how greatly social media contributes to the creation and dissemination of fake news. Fake news is purposefully designed to deceive consumers by playing with facts and figures. Making fake news pass as genuine requires misrepresenting reality through various rhetorical forms [5]. There is also the possibility that real news may be cited by fake news in the wrong context to support it [10]. These factors make fake news quite difficult to detect. However, Artificial Intelligence (AI) techniques, resurgent owing to advances in computing power and big data processing, have shown promising results in tackling the aforementioned fake news identification issues [11], [12]. Beyond fake news identification, AI has applicability in various realms of human life.
ML and Deep Learning (DL) techniques (subsets of AI) have been extensively used to detect fake news. Researchers across the globe have utilized ML and DL techniques such as Support Vector Machine (SVM), Logistic Regression (LR), NB, Decision Tree (DT) [13], Convolutional Neural Network (CNN), and Deep Neural Network (DNN) to identify and track fake news, achieving highly accurate results [14], [15]. Motivated by the facts mentioned above, we present a comprehensive survey of state-of-the-art techniques for fake news classification.

A. COMPARISONS WITH THE EXISTING SURVEYS
The task of fake news detection and mitigation has become crucial in the digital era, i.e., since the advent of social media, to counter its adverse impacts. Considerable research has been undertaken on it by researchers worldwide over time. Manual detection of fake news is challenging, as fake news often appears as good as real news on manual inspection. In recent years, various AI approaches have been proposed for fake news detection and have shown promising results, and several surveys have examined the ML and DL techniques used in this field. Recent surveys on fake news detection have analyzed various ML and DL techniques for fake news classification, incorporated the various datasets available, and identified challenges and future scope [16]-[19].
The authors in [17] surveyed various fake news detection methods, analyzing fake news from multiple perspectives. Zhou et al. [16] presented a survey describing research conducted on fake news and rumour identification on social media platforms. Stahl et al. [9] surveyed various fake news detection techniques and proposed a novel system for the same. Katsaros et al. [20] surveyed various ML techniques for fake news classification to identify the most suitable ML algorithm for the task. A similar survey was also conducted by Agarwal et al. [26]. In 2020, researchers conducted detailed surveys of the state-of-the-art approaches currently in use for rumour and fake news detection, but these lack a presentation of current issues and future challenges [21], [22]. Overall, these surveys have reviewed the efficacy of modern AI techniques for fake news detection and identified the societal impacts of fake news dissemination. Although these surveys are information-rich, each misses one component or another when measured against the points that need to be covered: an overview and background of fake news; a detailed, comprehensive review of the AI techniques used, organized into categories such as supervised learning, unsupervised learning, semi-supervised learning, and Reinforcement Learning (RL); and a detailed presentation of issues, challenges, and future work in this field. Finally, the authors in [23] presented an ML and Natural Language Processing (NLP)-based text vector representation to predict fake news. They assessed performance by comparing six ML models, evaluated on F1-score, precision, and recall.
Recently, several insightful surveys have been conducted on fake news detection, which have overcome the shortcomings of previous surveys through detailed analysis of the state-of-the-art algorithms used for fake news detection and classification. One innovative and comprehensive survey was conducted by Lahby et al. [24], who reviewed the most impactful articles of the last ten years and classified them under eight criteria. Similarly, Kumar et al. [25] consolidated and reviewed recent papers on fake news detection and suggested the most used approach for model implementation. However, delving deeper into these two publications reveals several gaps, such as the absence of a proper analysis of issues to be addressed in the future, of the background of fake news, and of explanations of the AI techniques utilized. The authors identified the best algorithm for fake news detection and classification, but their papers lack proper experimental evidence. In a nutshell, there is a pressing need for a comprehensive, analytical, and evidence-backed survey that covers all the concepts and key points while overcoming the limitations of previous state-of-the-art surveys.
Therefore, in the proposed survey, we review various AI techniques divided into five significant sections, including ensemble learning for fake news classification, and highlight the challenges and future scope for each. Besides these, the background of fake news, including its timeline, flow, impact, and sources, is also presented. We have also implemented an experimental model that uses the most powerful AI algorithms for fake news classification. Table 1 shows the relative comparison of existing surveys for fake news classification. The dissemination of fake news sometimes has severe impacts, directly or indirectly related to financial crises and mental health. It is spread for various purposes; for example, political parties spread fake news to gain an advantage in elections (making the election procedure unfair). Thus, there is an imperative need to develop solutions that combat the problem of fake news dissemination. Motivated by the significant prowess of AI, ML, and DL techniques for identifying fake news, the present survey discusses and analyses various AI techniques such as SVM, NB, CNN, LSTM, DT, LR, and ensemble learning-based approaches. Motivated by these findings, we implemented a few approaches and discuss their empirical results. AI techniques have evolved to give significant results in terms of their efficacy in this field, and research is ongoing to enhance them for even better results.

B. CONTRIBUTIONS
The major contributions of the paper are as follows.
• We present a comprehensive survey and discuss the taxonomy on AI techniques employed for fake news classification and highlight their advancements in the same domain. We also discuss various sources of fake news dissemination.
• We implemented the Passive Aggressive, LSTM, NB, and Random Forest algorithms for fake news classification. Passive Aggressive is an ideal algorithm for learning from data dynamically when huge volumes of data are generated every second. NB works well for high-dimensional datasets and is extremely fast, with very few tunable parameters. LSTM is used because it is a state-of-the-art technique. Random Forest performs efficiently on large datasets. The performance evaluation section discusses the results and empirical findings of these methods in detail.
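To make the streaming property of the Passive Aggressive algorithm concrete, its per-sample update can be sketched as follows. This is a minimal PA-I sketch over toy feature vectors, not our experimental implementation; the labels, data, and hyperparameters below are illustrative:

```python
import numpy as np

def pa_fit(X, y, epochs=5, C=1.0):
    """Online Passive-Aggressive (PA-I) training sketch.

    X: (n_samples, n_features) feature matrix (e.g. TF-IDF vectors)
    y: labels in {-1, +1} (e.g. -1 = REAL, +1 = FAKE)
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            loss = max(0.0, 1.0 - y_i * np.dot(w, x_i))  # hinge loss
            if loss > 0.0:                               # "aggressive" correction
                tau = min(C, loss / np.dot(x_i, x_i))    # PA-I step size
                w += tau * y_i * x_i
            # else: remain "passive", weights unchanged
    return w

def pa_predict(w, X):
    return np.where(X @ w >= 0.0, 1, -1)
```

Because each update touches only one sample, the model can keep learning as new articles arrive, which is the property that makes it suitable when data is generated every second.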
• Finally, we present the research challenges and open issues about the state-of-the-art AI techniques designed for the identification/detection of fake news.

C. METHODS AND MATERIALS
A systematic analysis and study are carried out as part of the paper's research method to provide a comprehensive analysis of the field of fake news identification using AI techniques. The main aim is to explore and analyze the state-of-the-art techniques for the tangled task of fake news classification, and to highlight various challenges, open issues, and future recommendations. We tried to include peer-reviewed, high-quality, and highly cited research works from reputed conferences and digital libraries such as Springer, Science Direct (Elsevier), ACM, Taylor & Francis, Wiley, and IEEE Xplore. We also focused on keywords like ''fake news classification'' and ''artificial intelligence, machine learning, and deep learning for fake news classification'' while searching the digital libraries. We incorporated a proper methodology to comprehensively review the existing works in the field. We then implemented a few such techniques for fake news classification and discussed their empirical findings.
D. SURVEY STRUCTURE
FIGURE 1 shows the structure of this survey. The remainder of the paper is organized as follows. In Section II, we discuss the various AI techniques employed for fake news classification, which might also aid in automated fake news identification. In Section III, we discuss the empirical results of some of the state-of-the-art approaches for fake news classification. Section IV discusses the various open issues and challenges faced by state-of-the-art approaches in identifying fake news. Finally, Section V concludes the paper.

II. FAKE NEWS: BACKGROUND, TIMELINE, FLOW, SOURCES AND IMPACT
This section discusses the background of the various concepts used in the proposed survey, such as the timeline, flow, sources, and impact of fake news.

A. BACKGROUND
Researchers' interest in fake news identification has resurged in the era of Internet penetration. Moreover, Google Trends analysis suggests a significant surge in interest in fake news in the 21st century. The term fake news is associated with misinformation that spreads over conventional media platforms, and especially over social media and web platforms [19], [28]. Fake news has been defined as ''a news article that is intentionally and verifiably false'' [5], [29] and as ''information presented as a news story that is factually incorrect and designed to deceive the consumers into believing as true'' [30]. Sharma et al. [31] defined the term from a broader perspective, covering the scope of its current usage, as ''A news article or message published and propagated through any media carrying false information regardless the means and motives behind it''. It is also worth noting research works like [32] that refine the meaning of fake news as news articles purposefully written to misguide or mislead the people reading or listening to the news, which can be justified by incorporating fake alternative resources. Fake news is now visualized as one of the substantial threats to nations, democracy, and journalism [33]. Many incidents of fake news spreading via reputed media and web platforms were recorded during the 2016 United States presidential elections. Out of 8,711,000 reactions, comments, and shares generated on fake news web articles, around 7,367,000 were on election articles posted by major news portals [34]. Moreover, the economy is also susceptible to the spread of fake news. For instance, the spread of fake news about Barack Obama being injured in an explosion caused a downfall of ≈ 130 billion USD in stock value [35]. The dissemination of fake news may also result in stressful conditions and the deterioration of mental health.
Over time, the spread of fake news has raised questions about the integrity of news articles published on online news portals and social media platforms. It should also be noted that social media platforms play a significant role in disseminating fake news among people worldwide. Table 2 describes a few recent fake news articles disseminated in public.

B. TIMELINE OF FAKE NEWS
Fake news has changed its form over the years. Initially, human communication was the medium for spreading fake news, but mass spreading was not possible through it. Later, in the digital era, voluminous fake news spread fast because the world became connected through the Internet. Nowadays, fake news is a critical problem, and information must be verified before it reaches the public; otherwise, it can cause undesirable actions. Thus, the probable mediums for disseminating fake news have changed over time. FIGURE 2 depicts the evolution timeline of fake news since 1400 BC. In 1439, fake news appeared in print media for the first time. Then, in 1475, fake news about the murder of a small child by the Jewish community spread through print media, which resulted in the torturing of 15 people of the Jewish community. From 1500 to 1700, many companies utilized manipulated news articles to increase their sales. Later, in 1897, absurd fake news defaming Mark Twain was disseminated [41]; it was one of the initial incidents in which fake information about a particular person was propagated in public.
The United Kingdom (UK) general election of 1924 was another case in which fake news was used to manipulate public opinion. Since then, political parties have used fake news to gain advantages in elections. In 1938, fake information about an alien invasion was broadcast on an American radio channel and is believed to have created havoc and panic across the country [42]. Various other fake news stories on diverse subjects spread in the later years of the 20th century. From 2005 to 2015, there was a significant surge in the number of articles containing misinformation published on websites over the Internet [3]. In 2017, the term fake news was included in the word-of-the-year list by Collins Dictionary [43]. Recently, we have witnessed a great surge in the spread of manipulated articles over social media platforms like Twitter and Facebook [44].

C. FLOW OF FAKE NEWS IDENTIFICATION
Fake news has been a surging problem since the advent of the Internet and social media, and its dissemination through social media platforms is exponential. This raises an imperative need for automatic fake news classification models. Researchers across the globe have proposed intelligent models for fake news classification with different accuracies. In this paper, we discuss the diverse AI techniques supporting fake news classification, which begin by selecting open datasets or aggregating articles from online platforms. Once the data is aggregated, it needs pre-processing for correct prediction. Pre-processing includes removing noise, erroneous entries, and outliers to make the data more organized. Researchers have used various pre-processing techniques for fake news classification: stop-word removal, punctuation removal, stemming, and lemmatization. Stop-word removal eliminates frequent words (such as ''the'' and ''is'') that carry little discriminative information, while punctuation removal strips marks such as ''?'' and ''!'' from the sentence. Stemming removes prefixes and suffixes from words, e.g., converting ''played'' to ''play''. Such pre-processing techniques help enhance dataset quality, which in turn increases prediction accuracy. The next step in fake news classification is feature extraction, which removes unnecessary parameters and unrelated features from the dataset. It also helps reduce complexity and enhance the efficacy of the prediction model. Finally, an ML or DL classifier/model is applied, which works as an output layer and classifies the article as FAKE or REAL. FIGURE 3 shows the diagrammatic representation of AI-enabled fake news classification.
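The pre-processing steps described above can be sketched in a few lines. This is a toy illustration: the stop-word list and suffix rules below are placeholders, not the full stemmers or lemmatizers used in the surveyed works:

```python
import re

# Illustrative stop-word list and stemming rules (assumptions, not a standard).
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "in", "on", "of", "to", "and"}
SUFFIXES = ("ing", "ed", "s")  # far simpler than a real Porter stemmer

def preprocess(text):
    # 1. Lowercase and strip punctuation such as '?' and '!'.
    tokens = re.findall(r"[a-z']+", text.lower())
    # 2. Stop-word removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 3. Naive suffix stripping ("played" -> "play").
    stemmed = []
    for t in tokens:
        for suf in SUFFIXES:
            if t.endswith(suf) and len(t) - len(suf) >= 3:
                t = t[: -len(suf)]
                break
        stemmed.append(t)
    return stemmed
```

For example, `preprocess("The player played in the game!")` keeps only the content-bearing stems, which then feed the feature extraction step.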

D. SOURCES OF FAKE NEWS
The concept of fake news started in the 15th century. There are various sources, such as radio, newspapers, television broadcasting, and social media websites like Twitter, Facebook, and email, from which fake news spreads day by day at a rapid rate. Initially, communication between people was the most significant source of spreading fake news. Nowadays, social bots play an essential role in spreading fake news; they are believed to have been responsible for online misinformation during the 2016 US presidential campaign and election [46]. Radio is a mass communication medium that can spread rumours through deliberate misinformation and false headlines. Social media is one of the most significant sources of fake news: with fake news websites and advertisements, it is claimed that messages received nowadays have a 50% chance of being fake. Spreading fake news has also become a source of income, as website owners get paid for displaying such content [47].

E. IMPACT OF FAKE NEWS
Fake news dissemination creates a negative impact on society. It has been reported that around 93% of people in the USA use online articles and applications to stay informed [48]. Heavy use of social media plays a vital role in spreading rumours and fake news. A prominent example is the rumour spread during the 2016 U.S. presidential election that Hillary Clinton was involved in child trafficking. This led many people in the U.S. to believe the accusation, which contributed to Hillary Clinton's defeat. Another instance happened in 2017, when news of the Las Vegas massacre spread on social websites, stating that at least 59 people were killed and more than 500 injured, along with misinformation about the suspect [48], [49]. People also come across lucky-draw websites and similar content designed to influence them. Thus, the distribution of fake news has had a severe and adverse impact on society.

III. FAKE NEWS CLASSIFICATION TECHNIQUES: A SOLUTION TAXONOMY
In this section, we present a solution taxonomy for fake news classification. We analyze the diverse approaches employing variegated ML and DL models for fake news classification.

A. SUPERVISED LEARNING
Supervised learning generally addresses regression- and classification-based problems, where the model learns from labelled training data in which each input is mapped to a fixed result. It performs well only if a predefined dataset is available to train the model. Regression-based problems are solved by predicting real or continuous values from the available input features. In contrast, classification divides each input categorically based on its class label [50]. In this section, we discuss the state-of-the-art supervised learning algorithms, such as SVM, CNN, NB, DT, and LR, utilized by researchers worldwide to classify fake news.
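As a concrete instance of such a supervised classifier, a multinomial NB with Laplace smoothing (the ''smoothing'' hyperparameter referred to in our experiments) can be sketched as follows. The tiny corpus, labels, and vocabulary below are purely illustrative, not our experimental dataset:

```python
import math
from collections import Counter

class MultinomialNB:
    """Minimal multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; alpha=1 is classic Laplace

    def fit(self, docs, labels):
        # docs: list of token lists; labels: e.g. "FAKE" / "REAL"
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d}
        self.priors, self.word_counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, l in zip(docs, labels) if l == c]
            self.priors[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in d)
            self.word_counts[c] = counts
            self.totals[c] = sum(counts.values())
        return self

    def predict(self, doc):
        V = len(self.vocab)
        def log_posterior(c):
            # log P(c) + sum over words of log P(w | c), smoothed by alpha
            return self.priors[c] + sum(
                math.log((self.word_counts[c][w] + self.alpha)
                         / (self.totals[c] + self.alpha * V))
                for w in doc if w in self.vocab)
        return max(self.classes, key=log_posterior)
```

The smoothing term keeps unseen words from zeroing out a class's posterior, which is why it is worth tuning on sparse news vocabularies.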

1) SUPPORT VECTOR MACHINE (SVM)
SVM is a supervised learning technique that has been extensively used for binary classification problems [51]. However, its applications have been extended to multi-class classification problems, and researchers have developed approaches to accomplish this [52], [53]. The SVM technique is highly appropriate for tangled tasks like fake news classification.
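The margin-maximization idea behind a linear SVM can be sketched with stochastic subgradient descent on the regularized hinge loss (a Pegasos-style sketch, not the solvers used in the surveyed papers; the toy data and hyperparameters are assumptions for illustration):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Linear SVM via stochastic subgradient descent on hinge loss.

    y must be in {-1, +1}; for fake news, e.g. +1 = FAKE, -1 = REAL.
    lam is the L2 regularization strength, lr the learning rate.
    """
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                         # inside margin: hinge gradient
                w = (1 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:                                  # correct side: only shrink w
                w = (1 - lr * lam) * w
    return w, b

def svm_predict(w, b, X):
    return np.where(X @ w + b >= 0, 1, -1)
```

The regularization term shrinks the weights each step, which is what pushes the learned hyperplane toward a maximum-margin separator rather than just any separator.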
Agarwal et al. [26] presented a method for fake news classification. Removal of stop words, elimination of white spaces and punctuation, and lemmatization of words were part of data preprocessing, which reduces the dimensionality of the data [54]. They used techniques like bag-of-words, n-grams, and the Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer for feature extraction and fine-tuned the hyperparameters using a grid search algorithm. The model was evaluated on the LIAR dataset [55], and the SVM classifier gave better results than the NB, LR, and RF classifiers, achieving an F1-score of 61%. Another similar approach was proposed by Ahmed et al. [56], in which similar data preprocessing and feature extraction techniques were employed. The model was tested on an open dataset aggregated by Ott et al. [57], and the SVM model obtained a prediction accuracy of 83.0%. Moreover, the model was also evaluated on a self-aggregated dataset consisting of political news articles from kaggle.com and Reuters.com, achieving an accuracy of 86%. The technique used for feature extraction impacts the model accuracy [58].
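The TF-IDF feature extraction used in these works can be sketched as follows. This is a simplified illustration: library implementations (e.g. scikit-learn) differ in their normalization and smoothing details, and the example documents are invented:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF sketch over tokenized documents.

    Uses raw term frequency and idf = log(N / df) + 1, where df is the
    number of documents containing the term; many variants exist.
    """
    vocab = sorted({w for d in docs for w in d})
    N = len(docs)
    df = Counter(w for d in docs for w in set(d))        # document frequency
    idf = {w: math.log(N / df[w]) + 1.0 for w in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)                                   # term frequency
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vocab, vectors
```

Words that appear in every document (here, a term like "news") get the minimum idf weight, while class-discriminative words are up-weighted, which is why TF-IDF vectors feed SVM and NB classifiers so well.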
Later, Deokate [59] proposed an SVM-based classification algorithm for identifying fake news spread on social media platforms, especially Twitter. It performs efficient text preprocessing by converting the slang used in tweets into its standard forms; moreover, regular expressions convert words with redundant letters to their original form. Tweet segmentation was then done using the n-grams technique. Various features, such as structural features, user features, and content features of the tweet, were incorporated in the feature extraction process, and the user's profile was also considered to classify the tweet as fake or real. Finally, they evaluated their proposed approach on the BuzzFeed dataset and obtained a mean absolute error of 0.0116% and a root mean square error of 0.1075%. Thus, considering features like the sentiment of tweets and user credibility based on user history helps enhance the accuracy of the fake news classification model.
Another similar study was undertaken by Ahmed et al. [60], in which various supervised learning algorithms, such as kNN, DT, SVM, Linear SVM (LSVM), and LR, were analyzed for fake news classification. The n-gram model and a TF-IDF vectorizer were used for feature extraction, while stop-word removal and stemming were part of preprocessing. The model was tested on the open dataset of Adali and Horne [18], and LSVM gave the best results. The accuracy obtained was 87%, a significant improvement over the 71% accuracy obtained by Adali and Horne on the same dataset. Thus, we may surmise that LSVM is a highly efficacious technique for the tangled task of fake news classification. Table 3 shows the comparison of various SVM-based approaches for fake news classification. Prasetijo et al. [61] analyzed the performance of an SVM classifier for fake news detection based on text classification. Data pre-processing is critical for textual data analysis, so they used data cleaning, stop-word removal, and tokenization as part of pre-processing and employed a TF-IDF vectorizer for feature selection. Finally, an SVM with a linear kernel was employed as the classification algorithm and evaluated on a self-aggregated dataset. They achieved a promising accuracy of 82%; however, enhanced results might be obtained if the SVM model were integrated with hybrid approaches (SVM with NN, SVM with DT, etc.). It is worth noting that SVM is a highly proficient technique for binary classification. Yazdi et al. [62] proposed a novel SVM-based approach for fake news classification to optimize the state-of-the-art approaches. They observed many redundant features in the dataset that are not useful for fake news classification and that increase the model's computational complexity. Thus, Yazdi et al.
proposed a hybrid model that employs the K-means clustering algorithm for feature selection and then utilizes an SVM classifier for fake news classification. They analyzed their hybrid model's performance on the BuzzFeedNews dataset and achieved an average accuracy of 95.34%. They also tested their model on the LIAR and BS Detector datasets, where the average precision achieved was 94.19% and 93.89%, respectively. The results suggested that the proposed model is superior to the other state-of-the-art techniques.

2) CONVOLUTIONAL NEURAL NETWORK
CNN is considered the most used architecture among supervised learning architectures. A large amount of input data is needed to fully utilize the capability of a CNN model [63]. CNNs are also popularly known as ConvNets. A typical CNN architecture consists of three layers, namely the convolutional, pooling, and fully connected layers [64], [65]. The convolutional layer evaluates the neurons connected to the input layer, which feeds the CNN. Various activation functions, such as sigmoid, ReLU, and tanh, introduce non-linearity into the output of the convolutional layer. The main function of the pooling layer is to reduce the dimensions, and the purpose of the fully connected layer is to convert the whole image data into a single vector. CNN has played a vital role in classifying fake news for many years. This section reviews various state-of-the-art CNN approaches proposed and implemented by various researchers.
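For text, the convolution, activation, and pooling stages described above operate on one-dimensional sequences. A minimal numerical sketch (toy inputs and kernel, not any surveyed architecture) is:

```python
import numpy as np

def conv1d(x, kernel):
    """Valid 1-D convolution (strictly, cross-correlation, as in most DL libraries)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def relu(x):
    # Non-linearity applied to the convolutional layer's output.
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Max pooling: keep the strongest activation in each window,
    reducing the dimension of the feature map."""
    return np.array([x[i:i + size].max()
                     for i in range(0, len(x) - size + 1, size)])

# A toy "embedded sentence" passed through conv -> ReLU -> pooling:
x = np.array([1.0, -1.0, 2.0, 0.5, -0.5, 1.5])
feat = max_pool(relu(conv1d(x, np.array([1.0, -1.0]))))
```

In a real text CNN, many such kernels run over word-embedding rows in parallel, and the pooled activations are concatenated into the fixed-length vector passed to the fully connected layer.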
Yang et al. [66] proposed a novel TI-CNN (Text and Image information based CNN) approach, which combines text and image information with respective explicit and latent features for fake news detection. The authors utilized a Kaggle dataset focused on news regarding the U.S. presidential election, containing 20,015 news articles: 8,074 real and 11,941 fake. They trained and tested their model on this dataset and achieved strong performance: 0.9220 precision, 0.9277 recall, and a 0.9210 F1-score [66]. The authors concluded that their model could easily be trained on other news features, demonstrating its expandability.
To address the problem of user geolocation, a novel Graph Convolutional Neural Network (GCNN) approach was proposed in [67]. The authors were motivated to use a GCNN because they wanted to address fake news classification by relating events and event publishers. They utilized the popular GeoText [68] and UTGeo2011 [69] datasets for geolocation and FakeNewsNet for fake news classification. The model achieved 94.4% (BuzzFeed) and 89.5% (PolitiFact) on FakeNewsNet, and 62.3% and 66.2% on the GeoText and UTGeo2011 datasets, respectively, compared to [70].
Then, Lee et al. [71] implemented a system with a CNN-based DL architecture named Shallow and Wide CNN [72] and ''Fasttext'' [73], a word-embedding model learned at the syllable level, for fake news detection. They utilized the CNN to extract local features and form a fixed-length global feature vector, called BCNN (Bi-CNN). To improve performance, LSTM/Bi-LSTM and attentive pooling similarity (APS) were added to the BCNN model. The authors used a self-made dataset of 100k Korean articles for training and 350 recent articles for testing. The classification accuracy of APS-BCNN was the highest, at 72.6%. Table 4 compares various CNN-based approaches for fake news classification.
Another CNN-based approach was presented by Kaliyar et al. [74], who proposed a model named GloVe-enabled FNDNet with a deep CNN architecture. The model did not depend on extracting hand-crafted features but was designed to learn discriminatory features. The authors were inspired to propose the model by observing recent progress in fake news detection [75], [76]. The model was trained and tested on the Kaggle news dataset based on the 2016 U.S. presidential election and achieved an accuracy of 98.36%, far better than the state-of-the-art approaches.

3) DECISION TREE
There are two types of DT ensembles: (1) Boosted trees, which incrementally construct an ensemble by training each new tree to emphasize previously mis-modelled training instances; one typical example is AdaBoost [77]. (2) Bootstrap-aggregated (bagged) trees, which build several decision trees by repeatedly re-sampling the training data with replacement and then combining the trees' votes for a prediction [78]; one specific type is the random forest classifier.
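The bagging idea behind the random forest family can be sketched with depth-1 trees (decision stumps) as base learners. This is an illustrative sketch with toy data, not a production random forest (which also subsamples features and grows deeper trees):

```python
import numpy as np

def stump_fit(X, y):
    """Fit a depth-1 tree: the best single-feature threshold split."""
    best = None
    for f in range(X.shape[1]):
        vals = np.unique(X[:, f])
        # Candidate thresholds: midpoints between consecutive feature values.
        thresholds = (vals[:-1] + vals[1:]) / 2 if len(vals) > 1 else vals
        for thr in thresholds:
            for sign in (1, -1):
                pred = np.where(X[:, f] >= thr, sign, -sign)
                acc = (pred == y).mean()
                if best is None or acc > best[0]:
                    best = (acc, f, thr, sign)
    return best[1:]

def stump_predict(stump, X):
    f, thr, sign = stump
    return np.where(X[:, f] >= thr, sign, -sign)

def bagged_fit(X, y, n_trees=15, seed=0):
    """Bootstrap aggregation: re-sample with replacement, fit one stump each."""
    rng = np.random.default_rng(seed)
    stumps = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), len(X))  # bootstrap sample
        stumps.append(stump_fit(X[idx], y[idx]))
    return stumps

def bagged_predict(stumps, X):
    # Majority vote over the ensemble (labels in {-1, +1}).
    votes = sum(stump_predict(s, X) for s in stumps)
    return np.where(votes >= 0, 1, -1)
```

Each stump sees a slightly different bootstrap sample, so the majority vote averages out individual trees' variance, which is the core reason bagged ensembles such as random forests are robust on large, noisy datasets.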
Dyson et al. [79] explored machine learning techniques such as SVM, Stochastic Gradient Descent (SGD), bounded decision trees, and random forests. The dataset was collected from the list of sources on opensources.co and Signal Media. TF-IDF of bi-grams and probabilistic context-free grammar (PCFG) detection were applied to a corpus of approximately 11,000 articles. The categorization threshold for both PCFG and TF-IDF was fixed at 0.7. The maximum accuracy, about 67.6% for the bounded decision tree, was obtained when TF-IDF and PCFG were used together, whereas accuracies of around 66.1% and 60.1% were achieved when TF-IDF and PCFG were applied individually. Table 5 shows the comparisons of various DT-based approaches for fake news classification.
Later, Pisarevskaya [80] discussed fake news detection in the Russian language using two supervised learning algorithms (SVM and an RF classifier) at both the lexical and discourse levels. Stylistic features such as part of speech (POS), word length, and expressions of subjectivity were used at the lexical level, while rhetorical structures with vector space modelling were used at the discourse level [82]. Data was collected from Russian-language online newspapers from June 2015 to June 2017 to create two datasets: the first based on statistics of the frequencies of lexical markers, the second on statistics about the types of RST relationships. An RF classifier with ten-fold cross-validation gave an accuracy of 56% for lexical features and 57% for discourse features. Then, Ahmed et al. [81] compared various machine learning classification techniques such as SGD, SVM, LR, KNN, and DT. The model was evaluated on three datasets: the first obtained from [57], the second taken from [18], and the third newly collected from Reuters.com and Kaggle. DT achieved the maximum accuracy of ≈ 72% on the first dataset using trigrams with a feature size of 10,000. On the third dataset, using unigrams and a feature size of 10,000 or 50,000, DT obtained a maximum accuracy of about 89%.

4) LOGISTIC REGRESSION
There are three forms of logistic regression: (1) Binary logistic regression: used when the dependent variable is binary, (2) Multinomial logistic regression: used when the dependent variable has more than two possible outcomes, and (3) Ordinal logistic regression: used when the dependent variable is ordered [83]. Fake news classification has two possible values, i.e., Real/Fake, so binary logistic regression is well suited.
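A minimal sketch of binary logistic regression for the two-class (Real/Fake) setting, using scikit-learn over TF-IDF features; the headlines and labels below are invented for illustration only.

```python
# Binary logistic regression over TF-IDF features for Real/Fake labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["scientists publish peer reviewed study",
         "official report confirms budget figures",
         "miracle cure doctors do not want you to know",
         "shocking secret they are hiding from you"]
labels = ["REAL", "REAL", "FAKE", "FAKE"]

X = TfidfVectorizer().fit_transform(texts)     # sparse TF-IDF matrix
clf = LogisticRegression().fit(X, labels)

# predict_proba returns one probability per class for each article.
proba = clf.predict_proba(X)
```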
Tacchini et al. [84] proposed a logistic regression-based classification model for detecting Facebook posts as hoaxes/non-hoaxes by identifying the users who liked such posts. The proposed model was trained on public Facebook posts from July 2016 to December 2016. Their model produced an accuracy of 99% even when the training dataset was made up of less than 1% of the posts. Table 6 shows the relative comparison of various LR based approaches utilized for fake news classification.
Later, Vicario et al. [85] compared the performance of several classification algorithms for the early detection of fake news. The dataset had two main categories of Facebook pages: Italian official news outlets (data collected from [88]) and Italian websites that propagate fake news (data collected from [88], [89]). The dataset was split in a 60:40 ratio for training and testing. Logistic regression achieved an accuracy of around 77%, higher than the traditional classification algorithms. Then, Odgol et al. [86] proposed an LR technique with sentiment neutrality, page rank, and the content-length-to-content-structure-error ratio as independent features. The model was trained on a Kaggle dataset of 10,000 articles comprising 5,000 fake and 5,000 real items. The content-length-to-content-structure-error ratio had the highest predictive weight for fake news, whereas sentiment neutrality had the lowest. The model achieved 80% accuracy based on just these three features.
Vedova et al. [87] proposed a hybrid technique for fake news detection comprising social-based methods, (1) LR and (2) harmonic Boolean label crowdsourcing (used when an item has one or more reviews), and content-based methods (used when an item has zero reviews, i.e., a cold start). The model was validated on three datasets: the first from [84], the second the PolitiFact dataset, and the third the BuzzFeed news dataset. Vedova et al. implemented their approach in a Facebook Messenger chatbot. The chatbot classifier was trained on the dataset used in [84] and validated on a fourth independent dataset, yielding an accuracy of 81.7%.

5) NAIVE BAYES
There are three types of Naive Bayes classifiers: (1) Gaussian NB: used when the features take continuous values and are assumed to follow a Gaussian distribution; (2) Multinomial NB: used mainly for document classification, i.e., predicting which category (politics, technology, sports, etc.) a given document belongs to, with the frequencies of words in the document used as features; (3) Bernoulli NB: similar to multinomial NB except that the features are Boolean variables describing the inputs [90]. The algorithm finds applications in recommender systems, spam filtering, and sentiment analysis [91].
Granik et al. [92] presented a simple approach for detecting fake news using an NB classifier, tested on a dataset of Facebook news posts. Their model achieved an accuracy of ≈ 74%. As the dataset had only 2,000 articles, the authors suggested collecting more data to train the model. To improve accuracy, they also suggested removing stop words, treating rare words separately, and using groups of words to calculate the probabilities, concluding that the effects of these suggested improvements should be a subject of future research.
To classify Facebook posts as ''FAKE'' or ''REAL'', Jain et al. [93] proposed a method using the NB classification technique. They tested the difference in accuracy for articles of different lengths and proposed web scraping to regularly update the dataset so that the veracity of recently posted content could be checked. A GitHub dataset of 11,000 news articles labelled as fake or real was used; it contained 6,335 rows and four columns: index, title, text, and label. The articles came from categories such as business, health, entertainment, science, and technology, and their veracity was checked by expert journalists before labelling [95]. The authors used the bag-of-words concept, which ignores structure and only counts word frequencies. Since word order is not considered in this approach, the authors incorporated an n-gram model to add word-sequence features. The Area Under the Curve (AUC) score without n-grams was 0.806 for the title and 0.912 for the text; with n-grams, it improved to 0.807 for the title and 0.931 for the text, because the larger number of feature vectors in the second model provides better discrimination capacity and accuracy. Table 7 shows the comparison of various NB based approaches for fake news classification.
Then, Hiramath et al. [94] used various ML techniques, such as LR, SVM, NB, RF, and a deep neural network (DNN), for fake news detection and compared their results. The dataset was built from publicly available online sources and pre-processed by removing punctuation, URLs, and images and applying stemming and stop-word removal. Natural Language Processing (NLP) was then carried out on the data to extract the important features and generate a training file. The accuracy of the NB algorithm was 89%, higher than LR, RF, and SVM. The authors concluded that the DNN proved best in terms of both time taken and accuracy in detecting fake news.

B. UNSUPERVISED LEARNING
The major difference between supervised and unsupervised learning is that the latter utilises an unlabeled training dataset, i.e., no class values are assigned and no information about the required output is given. NN, DBN, and other novel approaches are a few unsupervised learning techniques [96]. These learning algorithms can detect significant and useful clusters in unlabeled input data and then classify the instances [97]. This section discusses the current state-of-the-art unsupervised learning algorithms for the classification of fake news.

1) DEEP NEURAL NETWORK
NN design is inspired by the network of neurons in the human brain, and its operation replicates the way the brain works [98]. NNs have evolved over time, and many variants have been developed. Since the prowess of DBN was demonstrated in 2006, DL models and network architectures have been a major focus of researchers. DNN is a variant of NN with many variants of its own; it performs computations across multiple layers [99], [100] and is referred to as deep owing to its multiple hidden layers. DNNs have found applications in NLP, information retrieval, acoustic modelling, and speech recognition [101]-[104]. A DNN efficiently addresses the accuracy-versus-computational-complexity trade-off and can be trained with less data than a traditional NN. Overall, we summarize that the DNN has extended its applicability in real-world applications beyond earlier NN architectures [105]. Singhania et al. [107] presented a novel DNN, a 3-level hierarchical attention network (3HAN), for rapid and apt classification of news articles as fake or real. Its three levels are dedicated to words, sentences, and the headline. For an efficient representation of a news article, a news vector is built by analyzing the article in a hierarchical bottom-up fashion; since the headline demarcates an article from others, it is also analyzed as part of the proposed model. The three layers of attention allow 3HAN to give critical importance to specific parts of an article. The model achieved an accuracy of 96.24% on a self-aggregated dataset, superior to other state-of-the-art models such as SVM with GloVe features.
Zhang et al. [108] proposed a novel automatic fake news credibility inference model named FakeDetector. It builds a deep diffusive network from features extracted from textual information and simultaneously learns to infer the credibility of news articles, creators, and subjects. The authors performed extensive experiments on real-world fake news datasets to compare FakeDetector with other state-of-the-art models, with quite satisfactory results. The study used a dataset of 14,055 tweets posted by PolitiFact on its official Twitter account, together with the fact-checking articles written about these statements. The collected news articles were posted by 3,634 creators, each publishing 3.86 articles on average, and covered 152 different subjects. Feature learning from the textual content was based on the proposed Hybrid Feature Extraction Unit (HFLU). Ten-fold cross-validation was used in the experimental setup: the news article, creator, and subject sets were each partitioned 9:1, with nine folds used for training and one for testing. For bi-class classification, the accuracy score obtained by FakeDetector was 14.5% higher than that of various state-of-the-art models such as hybrid CNN, RNN, and SVM, while for multi-class classification the model achieved an accuracy above 40%, considerably higher than the other methods. Table 8 compares the various NN based approaches for the task of fake news classification.
Holistically analyzing a news article is critical to aptly determining its veracity. Ruchansky et al. [106] proposed an innovative DNN-based model to address fake news detection. The model incorporates the behaviour of both users and articles, taking into account the behaviour of the groups of users who disseminate fake news. First, an RNN is applied to the text data to capture the temporal pattern of user engagement with a specific article. A second module captures source characteristics to analyze user behaviour. Finally, the outputs of both modules are integrated to predict an item as fake or real. The model was tested on the publicly available Twitter and Weibo datasets [111], obtaining accuracies of 89.2% and 95.3%, respectively. The results on these real-world datasets depict the efficacy of the proposed model for fake news detection.
To identify fake news on newly emerged events, Wang et al. [109] proposed an end-to-end framework called Event Adversarial Neural Network (EANN), which can identify event-invariant characteristics and thereby help detect fake news. It has three main components: a multi-modal feature extractor, a fake news detector, and an event discriminator. The multi-modal feature extractor extracts textual and visual features from the posts and works with the fake news detector to identify fake news. The event discriminator's function is to eliminate the event-specific features and retain the features common among events. The performance of the proposed model was evaluated on Twitter and Weibo datasets. The Twitter dataset includes 7,898 fake news articles, 6,026 real news articles, and 514 images, while the Weibo dataset consists of 4,749 fake articles, 4,779 authentic news articles, and 9,528 images. The Weibo dataset's real news was collected and verified from China's authenticated news sources, such as the Xinhua news agency. The accuracy of the proposed EANN was 71.5% on the Twitter dataset and 82.7% on the Weibo dataset, outperforming all the other state-of-the-art models. Finally, the authors concluded that the model can satisfactorily learn transferable features for unseen events and effectively detect fake news on newly emerged events where existing approaches show inadequate performance.
Fernandez et al. [110] proposed a DNN for fake news detection in the political domain by combining linguistic and metadata features. The authors argued that a multi-class classifier combining RNNs or CNNs for embedding analysis with a fully connected layer for metadata features is the best technique to achieve higher accuracy. The open Liar dataset, with a training set of 10,269 samples and a validation set of 1,284, was used for classification. The 300-dimensional word2vec embeddings obtained from Google News [112] were used, with the text-embedding layer frozen. The authors' Stacked-CNN model was designed with 128 filters of (3,3) kernel size and trained for up to 100 epochs; a dropout rate of 0.3 was added for regularization in all models. The authors concluded that, compared to earlier approaches, the proposed model performed almost twice as well, with a Stacked-CNN accuracy of 48.5%. They observed that hybridizing pre-trained embeddings with a 2D convolutional layer helps identify patterns in textual data, and concluded that fine-grained fake news detection remains a challenge to date that needs to be addressed.

C. SEMI-SUPERVISED LEARNING
The primary disadvantage of supervised learning is that it requires labelled data, which is time-consuming to obtain and has a high processing cost, while the main disadvantage of unsupervised learning is that it suits only a limited range of applications such as customer segmentation, anomaly detection, and recommender systems [113]. Semi-supervised learning is a machine learning paradigm that tries to solve these problems by using a small quantity of labelled data together with a large quantity of unlabelled data [114]. It has two goals: predicting the labels of future test data, called inductive semi-supervised learning, and predicting the labels of the unlabelled data itself, called transductive semi-supervised learning [115]. To work with unlabelled data, semi-supervised learning models rely on some assumptions. First, the continuity assumption states that objects near each other tend to share the same class. Second, the cluster assumption states that the data form discrete clusters and objects in the same cluster tend to belong to the same class. Lastly, the manifold assumption states that the data lie on a manifold of much lower dimension than the input space [116]. This section discusses the current state-of-the-art semi-supervised learning algorithms for the classification of fake news.
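To make the labelled/unlabelled split concrete, here is a generic self-training sketch using scikit-learn's SelfTrainingClassifier; it is not the surveyed CNN-based models, just a minimal illustration of learning from mostly unlabelled data (synthetic features, labels hidden at random).

```python
# Self-training: a base classifier is fit on the few labelled points,
# pseudo-labels the confident unlabelled ones (marked -1), and refits.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=300, random_state=0)

rng = np.random.RandomState(0)
y_partial = y.copy()
y_partial[rng.rand(len(y)) < 0.9] = -1   # hide ~90% of labels; -1 = unlabelled

model = SelfTrainingClassifier(LogisticRegression(), threshold=0.8)
model.fit(X, y_partial)
preds = model.predict(X)                 # labels inferred for all samples
```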
Dong et al. [117] proposed a novel deep two-path semi-supervised learning model in which one path performs supervised learning and the other unsupervised learning. To achieve timely detection and enhance the performance of fake news detection in social media, the two paths are jointly optimized and implemented with a shared CNN that shares low-level features. Experiments on Twitter datasets show that the proposed model can detect fake news efficiently with few labelled data. Benamira et al. [118] proposed a graph-based semi-supervised fake news detection method that casts the problem as binary text classification, i.e., whether an article is fake news or not. The experimental results reveal that the proposed model achieves better performance than traditional classification techniques, especially when trained on a limited number of labelled articles. Mansouri et al. [119] proposed a semi-supervised learning framework using a CNN that targets both labelled and unlabelled data: the features of image and text data are extracted with the CNN, and Linear Discriminant Analysis (LDA) is then used to predict the classes of unclassified data. This model achieved a precision of 95.5% and outperformed the other methods in terms of sensitivity, recall, and specificity.

D. REINFORCEMENT LEARNING
Unlike supervised and unsupervised learning, RL aims to maximize a reward [120]. RL algorithms try to converge to the best available direction or path by making the best possible decisions, based on what actions are needed to interact with the surrounding environment [121]. In simple words, RL takes feedback from the system it is applied to at every step and produces a better output. Various algorithms can perform better with such feedback, e.g., Long Short-Term Memory (LSTM), RNN, Markov Decision Process (MDP), and Q-Learning. RL can help considerably in classifying fake news. In this section, we discuss the state-of-the-art algorithms in this category for the classification of fake news [122].
Chopra et al. [123] proposed an LSTM-based approach for the detection of fake news. First, the authors used an SVM trained on TF-IDF cosine-similarity features to confirm whether a title and the article's content were related to each other. Then, various neural networks combined with LSTM models were trained to label the title-content pairing as agreed, disagreed, or requiring discussion. Training and testing were done on the FNC-1 dataset, and the accuracy was found to be 85.07%. For the early classification of fake news, Gereme et al. [45] implemented various DL approaches such as LSTM and CNN. The authors combined the Kaggle dataset and the George McIntire dataset to train their model. For the LSTM network, each input was fed into a network with 100 neurons; the output was then passed into a dense layer with a sigmoid activation function, and the model was optimized using the Adam optimizer with binary cross-entropy as the loss function. The LSTM model was tested on the combined dataset and achieved an accuracy of 90.89%.
Chaudhry et al. [124] assessed the performance of various techniques such as LSTM, Gated Recurrent Unit (GRU), and multi-layer feed-forward networks on the issue of stance detection. They used the dataset provided by the Fake News Challenge [125] organization. The best performance was obtained using bidirectional conditional encoding with LSTM and pre-trained GloVe word-embedding vectors; the LSTM obtained an accuracy of 95.3%. Their experiments revealed that for the problem of stance detection, RNN models with ''memory-enabled'' units like LSTM give significantly better performance than non-recurrent models. Table 9 displays the comparison of various LSTM and RNN based approaches utilized for fake news classification.
Sometimes, intentional fake news spreaders manipulate a news item's content to make the information appear real. To address such problems, Wu et al. [126] proposed a novel approach named TraceMiner, which infers embeddings of social media users from social network structures and then utilizes an LSTM-RNN model to classify the propagation pathways of a message. For this work, the authors collected a large dataset of tweets about specific messages across categories such as business, entertainment, medical, and science and technology [128]. TraceMiner performed better than other state-of-the-art models, obtaining an accuracy of 91.24%.

E. ENSEMBLE LEARNING
It has been a widespread practice in human culture to make choices based on the opinions of several individuals or experts, acting as a democratic community. In the past few decades, there has been a surge of interest from the ML and computational intelligence communities in multiple-classifier systems, better known as ensemble learning approaches [129]. The attention paid to ensemble systems is well justified because they have been able to render highly productive results. Moreover, it is worth noting that ensemble systems have applications in various fields and many real-world use cases [130]. Numerous theoretical and empirical studies have shown that ensemble approaches render more accurate results than single-model approaches [131].
One such ensemble approach was presented by Roy et al. [132]. The authors developed various DL models to detect fake news and then classified the news into pre-defined categories. First, CNN and Bi-LSTM networks were constructed; the representations output by these networks were then fed into an MLP for the final classification. Training and testing the ensemble model on the Liar dataset yielded an overall accuracy of 44.87%, which outperformed the previous state-of-the-art approaches. Again in 2019, Gravanis et al. [133] proposed a novel fake news detection method using content-based features and ML algorithms. The authors tested the most popular and best-performing algorithms and improved performance using ensemble learning methods such as AdaBoost and bagging. A new text corpus named the UNBiased dataset, created by integrating various news sources, was introduced for fake news detection. Finally, the authors concluded that the enhanced linguistic feature set, executed with ensemble learning-based approaches and SVM, outperformed the other approaches.
Al-ash et al. [134] proposed an ensemble learning-based fake news classification approach using majority voting over multiple classifiers. They utilized stemming and stop-word removal to strip punctuation symbols in pre-processing, and then formed a document-vector representation to serve as input to the ML model. They proposed an RF ensemble classifier consisting of various decision-tree classifiers and evaluated the approach on a self-aggregated dataset of articles in the Indonesian language, as described in [135]. The ensemble learning-based RF classifier achieved an accuracy of 98.7%; compared with the multinomial NB and SVM approaches, the results were in its favour.
The major source of the spread of fake news in the present digital era is the advent of social media platforms such as Twitter, Facebook, Telegram, and WhatsApp. Meel et al. [136] proposed an ensemble learning approach for fake tweet identification that incorporates both the image and the textual information associated with a tweet. They employed sentiment analysis to analyze the explicit features of the textual data, the number of people in the image, and the image resolution, and implemented a CNN to recognize the implicit parameters associated with the image. They used a self-aggregated dataset, as mentioned in [66], to evaluate their proposed model's performance. Rather than a binary output, their system predicts the percentage credibility of a news article, and the achievable accuracy of the model was 96% on the mentioned dataset.
Later, an ensemble voting classifier-based approach was proposed in [137], where the authors developed an intelligent system that classifies news as fake or real. They compared eleven ML algorithms, including NB, KNN, SVM, RF, ANN, LR, and AdaBoost, incorporating the best three based on the results. To evaluate the voting classifier, the authors collected a dataset of ≈ 6,500 news articles (3,252 fake and 3,259 real). The experimental outcomes of their proposed system were better in terms of accuracy (94.5%). Table 10 shows the comparison of various ensemble learning-based approaches for fake news classification.

IV. PERFORMANCE EVALUATION AND COMPARISONS
A. DATASET INFORMATION
To evaluate existing state-of-the-art techniques for fake news classification, we have used a dataset comprised of real and fake articles [139]. It contains 7,796 short statements from various contexts, such as radio or TV interviews, press releases, and campaign speeches. Each statement is annotated with its veracity label, title, and context. The dataset is divided so that 80% is used to train the models and the remaining 20% is reserved for testing. Feature extraction techniques are applied to the dataset using Python NLP packages such as the TF-IDF vectorizer and count vectorizer, and stop words are removed in the pre-processing step to obtain better accuracy. FIGURE 5 and FIGURE 6 show the word clouds of fake news and real news, respectively, after the removal of stop words.

B. PREPROCESSING TASK
1) PREPROCESSING FOR NAIVE BAYES, PASSIVE AGGRESSIVE, AND RANDOM FOREST
We have used the TF-IDF vectorizer and count vectorizer as pre-processing steps in the evaluation. Because the models can only process numerical data, these techniques convert the textual data into vectors. The count-vectorizer method only counts the frequencies of words in a document, which biases the representation towards the most common terms and neglects the rarer words that would have allowed the model to be trained more effectively. Hence, the TF-IDF vectorizer is used to overcome this problem: it weights each word count in proportion to the inverse of the word's frequency in the corpus, penalizing the most frequent words. Each word is thereby mapped to a number revealing how relevant that word is in the document. The TF-IDF transform method is used to re-weight the count feature vectors obtained from the count vectorizer, and the result is fed into the classifier for better prediction and classification results.
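The count-then-reweight pipeline described above can be sketched with scikit-learn as follows; the toy corpus is invented for illustration.

```python
# Raw counts from CountVectorizer re-weighted with TfidfTransformer.
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

corpus = ["the senate passed the bill today",
          "celebrity spotted with alien, experts stunned",
          "the committee released the annual report"]

counts = CountVectorizer(stop_words="english").fit_transform(corpus)
tfidf = TfidfTransformer().fit_transform(counts)  # down-weights common terms

# Each row is now an L2-normalised TF-IDF vector ready for a classifier.
```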

2) PREPROCESSING FOR LSTM
To remove stop words from the data, we used the nltk library. The data was cleaned by removing URLs, newlines, white space, and periods, and was converted to lower case so as not to differentiate small and capital letters. The texts_to_sequences() method of the Keras Tokenizer class was used to tokenize the data. The sequences were then truncated and padded using the pad_sequences() method with the maxlen parameter set to 1000, making the data uniform for training the LSTM.
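A dependency-free sketch of what the Keras tokenization calls described above do: words are mapped to integer ids and each sequence is padded (on the left, as in the Keras default) to a fixed maxlen. Note that the real Keras Tokenizer assigns ids by word frequency, whereas this toy version assigns them in order of first occurrence.

```python
# Toy re-implementation of Tokenizer.texts_to_sequences / pad_sequences.
def fit_tokenizer(texts):
    index = {}
    for t in texts:
        for w in t.lower().split():
            if w not in index:
                index[w] = len(index) + 1   # ids start at 1; 0 is padding
    return index

def texts_to_sequences(texts, index):
    return [[index[w] for w in t.lower().split()] for t in texts]

def pad_sequences(seqs, maxlen):
    # Left-pad with zeros, truncating from the front if too long.
    return [([0] * (maxlen - len(s)) + s)[-maxlen:] for s in seqs]

texts = ["breaking news today", "you will not believe this"]
index = fit_tokenizer(texts)
padded = pad_sequences(texts_to_sequences(texts, index), maxlen=8)
```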

C. APPROACHES UTILIZED FOR IMPLEMENTATION
This section reviews the implemented approaches, namely NB, RF, the passive-aggressive algorithm (an SVM-style online learner), and LSTM, for fake news identification. The code was written in Python 3.6 using the TensorFlow, NumPy, Pandas, and scikit-learn machine learning libraries.

1) PASSIVE AGGRESSIVE
This algorithm learns from massive data streams and does not require a learning rate. It is called passive-aggressive because it remains passive for correctly classified examples and aggressively updates the weight vector for misclassified data, based on a regularization parameter, at every epoch or iteration. The Passive Aggressive algorithm was implemented with the regularization parameter set to 0.5, max iterations set to 50, and the hinge loss function; the optimal hyperparameter values were obtained using grid search. The TF-IDF vector's output is fed as input to the passive-aggressive classifier. The accuracy achieved by this model was 92.42%.
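A minimal sketch of the setup described above (regularization parameter C=0.5, max_iter=50, hinge loss, TF-IDF input) using scikit-learn; the four articles and labels are invented stand-ins for the real dataset.

```python
# PassiveAggressiveClassifier on TF-IDF features, toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

texts = ["official data released by ministry",
         "court publishes full judgement text",
         "aliens endorse presidential candidate",
         "secret moon base hidden from public"]
labels = ["REAL", "REAL", "FAKE", "FAKE"]

X = TfidfVectorizer().fit_transform(texts)
pac = PassiveAggressiveClassifier(C=0.5, max_iter=50, loss="hinge",
                                  random_state=0)
pac.fit(X, labels)          # aggressive updates only on mistakes
preds = pac.predict(X)
```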

2) LSTM
It is a kind of RNN architecture that mitigates the vanishing gradient problem during backpropagation through the sequence. It is composed of gates with memory content, which regulate how much input is added. In our implementation, the maximum sequence length is set to 1000, larger than the longest sequence in the training dataset, and shorter sequences are post-padded with zeros. The input sequence is embedded into 100-dimensional vectors and then fed into an LSTM layer with 60 hidden units. A global max-pooling layer is then used, reducing the network's size and overfitting. The output of the LSTM is fed to a one-dimensional dense layer of 50 neurons with ReLU as the activation function. Because of the small amount of training data, accuracy decreased when the dropout was increased above 0.1, so a dropout of 0.1 was used to prevent overfitting, meaning 10% of randomly selected nodes are dropped during training to regularize the deep neural network. The model is compiled with the Adam optimizer and binary_crossentropy as the loss function, and is trained with a batch size of 128 for nine epochs, as more epochs resulted in overfitting.
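The architecture described above can be sketched in Keras as follows. The layer sizes (100-d embedding, 60 LSTM units, 50-unit dense layer, 0.1 dropout, sigmoid output) follow the text; the vocabulary size of 20,000 is an assumption, and the dummy forward pass is only there to build the model.

```python
# Keras sketch of the described LSTM classifier (vocab size assumed).
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Dense, Dropout, Embedding,
                                     GlobalMaxPooling1D, LSTM)

model = Sequential([
    Embedding(input_dim=20000, output_dim=100),  # 100-d word vectors
    LSTM(60, return_sequences=True),             # 60 hidden units
    GlobalMaxPooling1D(),                        # shrinks size, curbs overfit
    Dense(50, activation="relu"),
    Dropout(0.1),                                # drop 10% of nodes in training
    Dense(1, activation="sigmoid"),              # P(fake) per article
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Forward pass on dummy padded sequences (batch of 2, length 1000).
dummy = np.zeros((2, 1000), dtype="int32")
probs = model(dummy).numpy()
```

In actual training, `model.fit(...)` would be called with `batch_size=128` and `epochs=9` as stated in the text.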

3) NAIVE BAYES
We have used the multinomial NB classifier from the scikit-learn library in our implementation. The smoothing parameter alpha is set to 1 by default, and the fit_prior parameter is true, which means the model learns from the prior class probabilities. Multinomial NB is a variant of NB especially suited to textual data. The TF-IDF vector's output is fed into the multinomial classifier, which estimates the conditional probability of a word given a class based on its frequencies. It works well for high-dimensional datasets and is extremely fast, with very few tunable parameters.
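A minimal sketch of this setup (MultinomialNB with alpha=1.0 and fit_prior=True over TF-IDF features); the articles and labels are invented for illustration.

```python
# MultinomialNB over TF-IDF features, toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["government releases official statistics",
         "parliament approves new trade agreement",
         "miracle pill melts fat overnight",
         "celebrity clone spotted in secret lab"]
labels = ["REAL", "REAL", "FAKE", "FAKE"]

X = TfidfVectorizer().fit_transform(texts)
nb = MultinomialNB(alpha=1.0, fit_prior=True).fit(X, labels)

# One probability per class for each article; rows sum to 1.
proba = nb.predict_proba(X)
```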

4) RANDOM FOREST
At last, we implemented an RF classifier for fake news detection using the predefined RF classifier from scikit-learn. It is a classifier that fits multiple decision trees on different data samples and uses averaging to enhance accuracy and control over-fitting. Around 200 decision trees were used as estimators in the forest, a value decided by grid search, and the number of jobs was set to 3 to run in parallel. The 'Gini' criterion, the default, was used to measure splitting quality, as the alternative 'information gain' involves computing a logarithmic function that makes it a bit slower. Gini impurity is measured by subtracting the sum of each class's squared probability from 1 and is suitable for larger partitions.
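A minimal sketch of this configuration (200 estimators, Gini criterion, n_jobs=3); synthetic features stand in for the TF-IDF vectors described earlier.

```python
# RandomForestClassifier with the hyperparameters named in the text.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=1)

rf = RandomForestClassifier(n_estimators=200, criterion="gini",
                            n_jobs=3, random_state=1)
rf.fit(X, y)
acc = rf.score(X, y)   # training accuracy of the averaged trees
```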

D. EVALUATION METRICS
To evaluate each proposed model, we have used multiple evaluation metrics. In this subsection, we review the most widely used parameters considered for fake news detection.

1) CONFUSION MATRIX
Accuracy, sensitivity, and specificity are commonly used metrics to gauge a model's efficacy. However, these metrics are inappropriate when the dataset is imbalanced in terms of class distribution, as a model can achieve high accuracy in such cases simply by being biased towards the majority class [45]. Hence, the confusion matrix is useful in such an imbalanced domain and gives better insight into the model. Consider a model which predicts whether a given news article is fake or not.
• True Positive (TP): The number of fake articles that the model correctly predicted as fake.
• True Negative (TN): The number of real articles that the model correctly predicted as real.
• False Positive (FP): The number of real articles that the model incorrectly predicted as fake.
• False Negative (FN): The number of fake articles that the model incorrectly predicted as real.

2) PRECISION
In our example, precision measures the fraction of articles the model predicted as fake that are actually fake, i.e., TP/(TP + FP).

3) RECALL
Recall measures the sensitivity: the fraction of actual fake articles that the model correctly predicts as fake, i.e., TP/(TP + FN).

4) F1-SCORE
The F1-score is the harmonic mean of recall and precision; it takes both false positives and false negatives into consideration and gives insight into the model's overall performance for fake news detection.

5) AREA UNDER THE CURVE AND RECEIVER OPERATING CHARACTERISTICS CURVE
It is one of the crucial metrics for evaluating a classification model's performance at various threshold settings. The ROC curve plots the True Positive Rate (TPR) on the Y-axis against the False Positive Rate (FPR) on the X-axis, and the AUC summarizes the degree of separability between classes. The higher the AUC, the better the model distinguishes articles (as fake or real). When the score distributions of the positive and negative classes do not overlap, the model has an ideal measure of separability and AUC = 1. An AUC score of 0.6 states that the model has a 60% chance of ranking a random positive instance above a random negative one.
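All of the metrics above can be computed directly with scikit-learn. The tiny label vectors below are illustrative values of our own, not results from the paper:

```python
from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_true  = [1, 1, 1, 0, 0, 0, 1, 0]                  # 1 = fake, 0 = real
y_pred  = [1, 1, 0, 0, 0, 1, 1, 0]                  # hard labels from a classifier
y_score = [0.9, 0.8, 0.4, 0.2, 0.1, 0.7, 0.6, 0.3]  # predicted probability of "fake"

# confusion_matrix returns [[TN, FP], [FN, TP]] for labels {0, 1}.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)                    # → 3 3 1 1
print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 0.75
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 0.75
print(f1_score(y_true, y_pred))          # harmonic mean of the two = 0.75
print(roc_auc_score(y_true, y_score))    # AUC computed from the scores = 0.875
```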

E. IMPLEMENTATION RESULTS
This section reviews the results obtained from each of the proposed methods through extensive experiments on the dataset. FIGURE 7 depicts our implementation flow for fake news classification employing various AI techniques, namely the passive-aggressive classifier, LSTM, NB, and RF.

1) EVALUATION OF NAIVE BAYES APPROACH
We trained the NB classifier independently with both the count vectorizer and TF-IDF preprocessing. The classifier achieved an accuracy of 89.03% with the count vectorizer and 85.39% with the TF-IDF vectorizer. TF-IDF is generally considered a better preprocessing technique than the count vectorizer, but in our case, the count vectorizer produced higher accuracy. It is also noteworthy that although NB is a simple technique, it achieves good accuracy in fake news classification. The evaluation metrics of NB, the confusion matrix (FIGURE 8 and FIGURE 9) and the AUC-ROC curve (FIGURE 10), are shown below.

2) EVALUATION OF LSTM APPROACH
Our LSTM model achieved the highest accuracy among all the models, at 92.34%. The model was trained for only nine epochs, as more epochs resulted in overfitting. The embedding matrix was generated randomly instead of using predefined embedding matrices, to see how LSTM performs on its own. Also, the dropout was set to 10% in the model to prevent overfitting. The evaluation metrics of LSTM, the confusion matrix (FIGURE 11) and the AUC-ROC curve (FIGURE 12), are shown below.

3) EVALUATION OF RANDOM FOREST APPROACH
RF is a collection of many DTs that work together as an ensemble. RF was also implemented using two preprocessing techniques: (1) the count vectorizer and (2) the TF-IDF vectorizer. In our random forest model, 200 trees were used and the 'Gini' criterion was applied to measure split quality. The number of jobs to be parallelized was set to 3. In RF too, the model preprocessed by the count vectorizer (accuracy: 90.37%) performed slightly better than the model preprocessed by the TF-IDF vectorizer (accuracy: 90.21%). The evaluation metrics for RF, the confusion matrix (FIGURE 13 and FIGURE 14) and the AUC-ROC curve (FIGURE 15), are shown below.

4) EVALUATION OF PASSIVE AGGRESSIVE CLASSIFIER APPROACH
Passive-aggressive classifiers are generally used for large-scale learning. They work on the principle that if the classification is correct, the model is kept; otherwise, it is updated to adjust for the misclassification [140]. The classifier was implemented using both preprocessing techniques: (1) the count vectorizer and (2) the TF-IDF vectorizer. The accuracy achieved using TF-IDF and the count vectorizer was 92.26% and 90.21%, respectively. The evaluation metrics for the passive-aggressive classifier are the confusion matrix (FIGURE 16 and FIGURE 17) and the AUC-ROC curve (FIGURE 18).
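A sketch of this classifier with scikit-learn, again on a toy corpus of our own (random_state and max_iter are our choices, not the paper's):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

# Toy corpus for illustration only; 1 = fake, 0 = real.
docs = [
    "breaking celebrity endorses miracle cure",
    "government publishes annual budget report",
    "aliens built the pyramids insiders claim",
    "central bank holds interest rates steady",
]
labels = [1, 0, 1, 0]

# The model stays passive on correctly classified samples and aggressively
# updates its weights just enough to correct each misclassification.
clf = make_pipeline(
    TfidfVectorizer(),
    PassiveAggressiveClassifier(max_iter=1000, random_state=0),
)
clf.fit(docs, labels)
print(clf.score(docs, labels))  # → 1.0 on this linearly separable toy data
```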

F. COMPARATIVE ANALYSIS OF IMPLEMENTED APPROACHES
We discussed several techniques for fake news classification, but for brevity, we evaluated the performance of four state-of-the-art classification techniques: NB, LSTM, the passive-aggressive classifier, and RF. The dataset we used comprised 6335 news articles. We computed the efficiency of the classification techniques based on evaluation metrics like accuracy, precision, recall, F1-score, and Area Under the Curve (AUC) (shown in Table 11). The highest accuracy was obtained by LSTM (92.34%), closely followed by the passive-aggressive classifier (92.26% using TF-IDF). It is also noteworthy that relatively simple classification techniques like NB and RF performed well, with accuracies of 89.03% (using the count vectorizer) and 90.37% (using the count vectorizer), respectively. The highest precision was obtained by LSTM (0.9539), which shows that it is good at detecting fake news. NB, RF, and the passive-aggressive classifier achieved precisions of 0.8761 (using the count vectorizer), 0.9053 (using TF-IDF), and 0.9168 (using TF-IDF), respectively. These results mirror those obtained for the accuracy metric, revealing that LSTM is the best classifier for fake news classification in terms of accuracy and precision. NB obtained the highest recall (0.9786 using TF-IDF). The recall of RF, the passive-aggressive classifier, and LSTM was 0.9114 (using the count vectorizer), 0.9284 (using TF-IDF), and 0.8937, respectively. This shows that even though NB with the TF-IDF vectorizer has the lowest accuracy (85.40%), its high recall makes it the best choice in recall-critical settings, analogous to life-critical tasks like cancer detection. On the other hand, LSTM, despite achieving the highest accuracy and precision, has the lowest recall, which means it is less suitable in situations where missing a positive instance is costly, such as detecting diseases.
The highest F1-score was obtained by LSTM (0.9228), closely followed by the passive-aggressive classifier (0.9226 using TF-IDF). NB and RF had F1-scores of 0.8963 (using the count vectorizer) and 0.9073 (using the count vectorizer), respectively. This shows that LSTM is the most balanced between precision and recall and can work on data with unusual class distributions. The maximum Area Under the Curve (AUC) was achieved by LSTM (0.924), closely followed by the passive-aggressive classifier (0.923 using TF-IDF), then RF (0.899 using the count vectorizer) and NB (0.889 using the count vectorizer). This shows that LSTM has a higher probability of ranking a random positive example above a random negative example [141].
The proposed hybrid model of LSTM and NB can be used in real applications by feeding social media news to the model, which predicts and labels each item as real or fake before revealing it to the user. Software can thus be designed that takes a news item as an input parameter, with our proposed model outputting the item's predicted label to the user with high accuracy.
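One plausible way to wire up the proposed hybrid is a cascade: the high-recall NB screens every article, and only articles it flags as fake are passed to the high-precision LSTM for confirmation. The sketch below is framework-agnostic; the two lambdas are keyword-based stand-ins for the trained models, not the models themselves:

```python
def hybrid_predict(article, nb_predict, lstm_predict):
    """Cascade: NB screens first (high recall), LSTM confirms (high precision)."""
    if nb_predict(article) == 0:
        # NB says real; its high recall means few fake articles slip through here.
        return 0
    # NB flagged the article as fake; let the precise LSTM make the final call.
    return lstm_predict(article)

# Keyword-based stand-ins for the two trained models (illustrative only).
nb = lambda text: 1 if "miracle" in text else 0
lstm = lambda text: 1 if "cure" in text else 0

print(hybrid_predict("miracle cure found", nb, lstm))    # → 1 (both agree: fake)
print(hybrid_predict("annual budget report", nb, lstm))  # → 0 (NB screens it out)
```

The cascade design follows the insight in the comparative analysis: NB's recall (0.9786) keeps fake articles in the candidate pool, while LSTM's precision (0.9539) filters out NB's false positives.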

V. CHALLENGES AND FUTURE RESEARCH DIRECTIONS
The widespread dissemination of fake news via online sites and social media platforms has negatively impacted society. Many researchers have been working on automating the early detection/identification of fake news by employing AI techniques. In the subsequent subsections, we discuss a few of the open issues and challenges faced during fake news classification using AI techniques. We also discuss the potential future research work that can be undertaken in the field to improve the efficacy of present state-of-the-art techniques.

A. CHALLENGES AND OPEN ISSUES
This section highlights various open issues and challenges associated with fake news classification employing AI techniques.
• Dataset Quality: The research undertaken by Shu et al. [5] showed that there is no benchmark dataset available that incorporates materials to extract all the relevant features for the task of fake news classification. Thus, the field lacks a qualitative and quantitative dataset, which would be highly productive for understanding the temporal patterns in fake news dissemination and for developing a highly robust ML or DL model.
• Early Detection: Fake news spreads fast owing to the ease of Internet accessibility and social media platforms. Thus, to mitigate the dissemination and impacts of fake news, it is crucial to detect it as early as possible [85]. Many methods have been proposed for early detection of fake news [142], but developing an efficacious and robust approach remains an ongoing research problem.
• Subtle Semantic Elements: Intermittently, it may occur that the news article title or headline may consist  [124].
• Feature Oriented: There are numerous features associated with fake news articles, like the images or videos embedded in them. Various approaches have tried to incorporate the credibility of the source from which the fake news originated or which played a significant role in its dissemination [143], but these attempts have not fully captured the underlying characteristics of fake news. Highly advanced video- and photo-editing software is available in recent times, which can render high-quality manipulated visuals; this makes it more difficult to classify a video as fake or real. Developing a model that both incorporates all relevant features and interprets all the visual features is a challenging task [144]. Likewise, developing a model that can integrate textual analysis with the analysis of the associated images is future research work for researchers in the field. The above are some of the open issues and challenges in fake news classification employing AI techniques. More comprehensive research needs to be undertaken to combat these existing challenges.

B. FUTURE RESEARCH DIRECTIONS
In this section, we highlight various future research directions that will help in a deeper understanding of fake news and improve the performance of existing approaches based on the fake news characteristics and the existing state of fake news research.

1) FAKE NEWS EARLY DETECTION
It is primarily important to detect fake news early, before it becomes widespread and creates a negative impact on society; if early detection is not conducted, people start believing it [145]. Detecting fake news at an early stage requires proper information regarding current trends and news content, which often remains only loosely connected to social media when a story first surfaces. This causes various challenges:
• New events create new and unexpected knowledge that is not available in the stored existing literature and knowledge databases.
• Secondly, the features that are useful in detecting fake news might not be helpful in the future due to changes in writing styles.
• At last, less detailed information regarding the news decreases the efficiency of AI techniques in the classification of fake news. The following directions can address the above challenges in the future:
• Ground truth timeliness: Technologies for detecting fake news should maintain a proper database of all current and trending news for proper and early detection.
• Proper compatibility of features: Features should be capable of capturing the general structure of deceptive writing styles across various subjects and languages and remain compatible with the evolution of writing styles. Here, AI techniques like RNNs [109] and GANs [146] play an important role.
• Efficiency in verification: Proper identification of the contents and topics present in the news will boost the performance of existing systems.

2) IDENTIFICATION AND PRIORITIZATION OF IMPORTANT CONTENT
It is infeasible to verify every news item when fabricated news emerges and circulates through society like wildfire, so prioritizing content can improve the efficiency and performance of fake news detection. Determining whether a given piece of content is important and check-worthy is based on the following factors:
• How strongly it could potentially influence society, for example, information related to national affairs.
• Its historical likelihood of being fake news. Thus, content that is most important and creates a large impact on society can be considered check-worthy and should be prioritized.

3) EMPHASIS ON CROSS-DOMAIN FAKE NEWS
Existing fake news studies mainly depend on differentiating fake news from real news by conducting experiments. Analyzing fake news across domains, topics, and languages helps gain a deeper understanding of fake news and identify its varied characteristics, which can be utilized in the future to improve the performance of early fake news detection.

4) DEEP LEARNING FOR FAKE NEWS
Research and development in DL can potentially help in determining fake news. There are various approaches to DL, which continue to develop rapidly. One example of DL in fake news classification is the adoption of RNNs and GANs to represent sequential posts and user engagements [106], [147]. Recent approaches utilize CNNs to extract features from text and images [142]. For many years, DL has shown great strength in image, text, and speech processing [102]. One of DL's biggest advantages is that it eliminates feature engineering, which is considered the most time-consuming part of ML approaches. Another important benefit of DL is that it can adapt to a new problem easily. Thus, it is important to emphasize DL development, which is highly beneficial for fake news classification.

5) FAKE NEWS INTERVENTION
Various articles have stated that multiple websites and social media sites have developed different business models for intervening in fake news using AI approaches. These sites are now shifting their emphasis from increasing the number of users to increasing the quality of information. Blocking fake news and sites according to regulations also requires technical advancements, which is an important research task. A strategy for fake news intervention can be network-based [148]: it requires breaking the propagation paths responsible for spreading fake news. From the user's point of view, fake news intervention mainly depends on the specific roles users play in fake news activities.
• One such role is the influencer. By taking these influential people into consideration, we can intervene in fake news more efficiently.
• Another role is the corrector, who finds fake news and corrects it in comment sections or posts.
• Malicious users, who spread fake news regularly, must be penalized. In the future, these interventions should be enforced rigorously using advanced AI technologies.

VI. CONCLUSION AND DISCUSSIONS
Social media has become pervasive and more prevalent in recent years. People now prefer to read news from social media platforms rather than traditional mainstream news channels. This has led to an increase in the dissemination of fake news on social media, as it is much easier to share information there without any verification. The adverse impact of fake news is also dangerously increasing, as seen in its influence on the 2016 US election; this can harm people's lives. This paper presented a comprehensive, analytical, and evidential survey covering AI techniques, including supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and ensemble learning, for fake news detection, overcoming the limitations of the existing state-of-the-art surveys. For brevity, we implemented four state-of-the-art AI techniques for fake news classification, namely LSTM, NB, the passive-aggressive classifier, and RF. A discussion of how to optimally design the hyperparameters is also carried out for each implemented algorithm. At last, some key suggestions from the proposed model are presented, along with the challenges and future scope in this direction. Below are some proposed insights for fake news classification using the techniques discussed in this paper.
• For the detection of fake news, a subset of articles classified as fake by NB (using the TF-IDF vectorizer) can be made from the original dataset. This subset will cover almost all the fake news articles, as this technique has a very high recall (0.9786).
• Fake news can then be detected accurately within this new subset using techniques like LSTM and the passive-aggressive classifier, as they have very good precision in classifying fake news. Although much research has been done on fake news detection, research on fake news classification techniques to handle fake news is in its early stages. However, these classification techniques will significantly improve the potential to tackle fake news. We believe our timely study will shed valuable light on fake news classification techniques and motivate researchers and practitioners to add their valuable efforts to this promising area.
HARSHAL SHETHNA is currently pursuing the bachelor's degree with Nirma University, Ahmedabad, India. His research interests include computer vision, natural language processing, energy-based models, and reinforcement learning.
KEYUR PATEL is currently pursuing the bachelor's degree with Nirma University, Ahmedabad, India. His research interests include artificial intelligence, deep learning, the IoT, and network security.
URVISH THAKKER is currently pursuing the bachelor's degree with Nirma University, Ahmedabad, India. His research interests include blockchain, machine learning, and reinforcement learning.

WEI-CHIANG HONG (Senior Member, IEEE)
is currently a Professor with the Department of Information Management, Asia Eastern University of Science and Technology, New Taipei, Taiwan. His research interests mainly include computational intelligence (neural networks and evolutionary computation), the application of forecasting technology (ARIMA, support vector regression, and chaos theory), and machine learning algorithms. He has served on the program committees of various international conferences, including premium ones such as IEEE CEC, IEEE CIS, IEEE ICNSC, IEEE SMC, IEEE CASE, and IEEE SMCia. In May 2012, his article was evaluated as a Top Cited Article 2007-2011 by Elsevier Publisher (The Netherlands). In September 2012, his article was also indexed in the ISI Essential Science Indicators database as a Highly Cited Article, and in the meanwhile, he was awarded the Model Teacher Award by the Taiwan Private Education Association. He is indexed in the list of Who's Who in the World (25th-30th Editions), Who's Who in Asia (2nd Edition), and Who's Who in Science and Engineering (10th and 11th Editions). He has 5424 Google Scholar citations, an H-index of 38, and an i10-index of 64. He is a Senior Member of IIE. He is currently appointed as the Editor-in-Chief of the International Journal of Applied Evolutionary Computation; in addition, he serves as a Guest Editor for Energies and is appointed as an Associate Editor of Neurocomputing, Forecasting, and the International Journal of System Dynamics Applications.
RAVI SHARMA is currently working as a Professor with the Centre for Inter-Disciplinary Research and Innovation, University of Petroleum and Energy Studies, Dehradun, India. He is passionate in the field of business analytics and worked in various MNC's as a Leader of various software development groups. He has contributed various articles in the area of business analytics, prototype building for startup, and artificial intelligence. He is leading academic institutions as a consultant to uplift research activities in inter-disciplinary domains.