Utilizing Ensemble Learning for Detecting Multi-Modal Fake News

The spread of fake news has become a critical problem in recent years due to the extensive use of social media platforms. False stories can go viral quickly, reaching millions of people before they can be debunked, e.g., a false story claiming that a celebrity has died when he/she is still alive. Therefore, detecting fake news is essential for maintaining the integrity of information and controlling misinformation, social and political polarization, media ethics, and security threats. From this perspective, we propose an ensemble learning-based detection of multi-modal fake news. First, it exploits a publicly available dataset, Fakeddit, consisting of over 1 million samples of fake news. Next, it leverages Natural Language Processing (NLP) techniques for preprocessing the textual information of news. Then, it gauges the sentiment from the text of each news item. After that, it generates embeddings for the text and images of the corresponding news by leveraging Visual Bidirectional Encoder Representations from Transformers (V-BERT). Finally, it passes the embeddings to the deep learning ensemble model for training and testing. 10-fold cross-validation is used to check the performance of the proposed approach. The evaluation results show that the proposed approach outperforms the state-of-the-art approaches, with performance improvements of 12.57%, 9.70%, 18.15%, 12.58%, 0.10, and 3.07 in accuracy, precision, recall, F1-score, Matthews Correlation Coefficient (MCC), and Odds Ratio (OR), respectively.


I. INTRODUCTION
The concept of fake news is not new. Its roots existed long ago in our society. It refers to false information that can be disseminated to mislead or deceive the public. For example, fake news about COVID-19 vaccines could discourage people from getting vaccinated, leading to increased rates of illness and death. In the past, many distinct kinds of material were considered fake news, such as satire, conspiracies, news manipulation, and click-bait. However, fake news is now becoming jargon [1] and has a huge impact on the critical events happening in our society, e.g., the spread of fake news (false stories) on social media was very concerning in the 2016 US presidential election [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Donato Impedovo.
Fake news can spread quickly through social media and other online platforms. It can have serious consequences, such as causing panic, influencing elections, and eroding public trust in legitimate news sources. Individuals need to distinguish real news and critically evaluate sources of information before sharing or responding to them. Additionally, news organizations and social media platforms are responsible for combating the spread of fake news by fact-checking and removing false content. Surveys show that about 70% of Americans use social media as a source of news and circulating information [3]. Accessing news and information on the Internet is low-cost and convenient. However, spreading fake news on these carriers is also straightforward and effortless [4]. Fake news can lead to false assumptions that drastically affect our society. Consequently, it is critical to design an automated fake news detection system.
Many researchers are actively developing new and better methods for identifying and combating the spread of misinformation. Some of the key research areas and trends in this field include deep learning approaches, e.g., Convolutional Neural Network (CNN); linguistic features, e.g., sentiment analysis, topic modeling, and stylometric analysis; source-based approaches, e.g., analyzing the domain name, social media presence, or history of the news source; and ensemble approaches, e.g., combining linguistic, source-based, and deep learning models to create a more robust and accurate detection system. Recent research has identified the issues of the said problem and proposed different solutions, e.g., pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT) [5], OpenAI GPT [6], and ELMo [7] have shown their effectiveness in alleviating feature engineering efforts. However, the problem still requires significant performance improvement.
From this perspective, this paper proposes an ensemble learning-based detection of multi-modal fake news (ELD-FN). It first exploits a publicly available dataset, Fakeddit, a novel multi-modal dataset consisting of over 1 million samples from multiple categories of fake news. Second, it leverages Natural Language Processing (NLP) techniques for preprocessing the textual information of news. Third, it gauges the sentiment from the text of each news item. Fourth, it generates embeddings for the text and images of the corresponding news by leveraging V-BERT [8]. Finally, it passes the embeddings to the deep learning ensemble model for training and testing. 10-fold cross-validation is used to check the performance of ELD-FN. The evaluation results show that ELD-FN outperforms the state-of-the-art approaches, with performance improvements of 12.57%, 9.70%, 18.15%, 12.58%, 0.10, and 3.07 in accuracy, precision, recall, F1-score, Matthews Correlation Coefficient (MCC), and Odds Ratio (OR), respectively.
The main contributions made in this paper are as follows.
• The proposed approach integrates news sentiment as a crucial feature and employs ensemble learning to identify multi-modal fake news.
• The evaluation results show that ELD-FN outperforms the baseline approaches, with performance improvements of 12.57%, 9.70%, 18.15%, 12.58%, 0.10, and 3.07 in accuracy, precision, recall, F1-score, MCC, and OR, respectively.
The rest of the paper is organized as follows. Section II discusses the research background. Section III describes the details of ELD-FN. Section IV describes the evaluation methods for ELD-FN, the obtained results, and their threats to validity. Section V summarizes the paper and suggests future work.
Most of the state-of-the-art fake news classification approaches can be categorized as follows: 1) fake news classification approaches for single-modality and 2) fake news classification approaches for multi-modality.

A. FAKE NEWS CLASSIFICATION APPROACHES FOR SINGLE-MODALITY
The fake news classification approaches for single-modality can be further divided into two categories based on the text/image features.

1) SINGLE-MODALITY BASED CLASSIFICATION APPROACHES USING TEXTUAL FEATURES
Textual features can be divided into generic and latent categories. Traditional machine learning algorithms usually utilize generic textual features. These algorithms analyze text at linguistic levels such as lexicon, syntax, discourse, and semantics. Previous research has compiled a detailed table summarizing these features [10]. In contrast, latent textual features consist of the embeddings extracted from the textual data of news at the word, sentence, or document level. Latent vectors are constructed from the textual news data and are then used as input for classifiers, e.g., SVM.
Recurrent neural networks (RNNs) are potent in modeling and analyzing sequential data. For example, Ma et al. used RNNs to capture relevant information over time by learning hidden layer representations [11]. Meanwhile, Chen et al. proposed a CNN-based approach for the classification [12]. Moreover, a novel technique, Attention-Residual Network (ARC), was introduced to acquire long-range features. Ma et al. introduced a Generative Adversarial Network (GAN)-based model that employs a generator network based on Gated Recurrent Units (GRU) to generate contentious instances. Furthermore, a discriminator network based on RNNs is designed to identify essential features [13].
RNN-based models have proven very effective in classifying fake news detection datasets. However, RNN-based models prioritize the most recent part of the input sequence, whereas the essential features may not be located at the end of the sequence. Yu et al. proposed a CNN-based approach that resolves this issue. The proposed technique does not prioritize recent input and instead extracts features based on the relationships among the essential features [14]. Vaibhav and Hovy utilized a graph-based approach for classifying news articles [15]. For this purpose, they used Graph Neural Networks, such as Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT), to create graph embeddings for fake news detection.
Wu et al. utilized multi-task learning techniques to classify and detect fake news. In their approach, the stance classification task optimizes the shared layers concurrently, improving news representations [16]. Cheng et al. utilized an LSTM model to classify the textual news data. They used a variational autoencoder to extract essential textual features at the tweet level. Some researchers have assumed that complex and multi-dimensional news are not accessible initially. The accessibility of only text-based news depends on the popularity [17]. To address the scarcity of user reviews as an auxiliary source of information, Qian et al. developed a text-based model that generates user feedback on the text; this feedback is used along with word/sentence-level information from real articles for early detection and classification [18]. Giachanou et al. investigated the influence of emotional cues. They proposed an LSTM model that integrates emotional signals extracted from claim texts to differentiate between true and false news [19].

2) SINGLE-MODALITY BASED CLASSIFICATION APPROACHES USING IMAGE FEATURES
As multimedia becomes more prevalent in social networks, news now contains text and visual information such as images and videos that convey rich meaning. However, textual feature-based approaches face challenges in effectively capturing visual information because of the heterogeneity between text and image data. Consequently, many researchers have proposed image-based approaches for detecting fake news.
Classical image-based models utilized basic numerical features of images [20], [26], such as image count, popularity [27], and type, to identify fake news. For impaired images, complex forensics features were extracted. Furthermore, post- and user-based features were integrated to identify fake news [28]. However, basic numerical features proved inadequate to describe the complex visual information of news images.
Deep learning models such as CNNs have proven effective in capturing visual features in news images. Many studies have shown that features extracted from CNN models can be used in visual recognition tasks to generate generic image representations [29].
Building on the success of CNNs, recent studies have utilized pre-trained deep CNNs like VGG19 [30], [31] to obtain generic visual representations [32], [33]. Researchers suggested multi-domain visual neural models to capture the inherent traits of fabricated news images more effectively. These multi-domain models merged frequency and pixel domain visual data to differentiate between genuine and fabricated news based on visual characteristics [34]. Poor quality is a common trait in fake news images. The poor-quality feature and the image semantics are visible in the frequency and pixel domains, respectively. Accordingly, the quality feature is extracted by a CNN model, and the semantics of the images are extracted by a CNN-RNN model.

B. FAKE NEWS CLASSIFICATION APPROACHES FOR MULTI-MODALITY
Word-based and image-based information are both important in detecting fake news. As social networks often contain both types of information, combining them can improve performance. This section discusses the different multi-modal approaches for fake news detection, categorized based on the different perspectives they adopt.

1) PROBLEMS IN MULTI-MODALITY
Several studies have explored using visual information to complement textual information in detecting fake news. These studies typically use text-based and image-based encoders to extract textual and visual features, respectively. These feature vectors are then combined into an overall feature vector for each news item. For example, Wang et al. proposed event classification as an additional task to enhance the ability of the model to generalize to event-invariant multi-modal features [32]. Other researchers, such as Singhal et al., use a combination of text-based and image-based features. They utilize BERT and XLNet pre-trained models for encoding text-based and image-based data, respectively [35]. However, these approaches have proven limited in effectively detecting multi-modal fake news because of their limited ability to capture complex cross-modal correlations. More advanced multi-modal techniques are needed to improve the performance of fake news detection.

2) FLEXIBILITY IN MULTI-MODALITY
Some studies have recognized that irrelevant images are a common characteristic of multi-modal fake news and have focused on measuring the consistency between the text and visual components in detection. One approach by Zhou and Zafarani [36] used an image captioning model to generate sentences from images and then measured the similarity between those sentences and the original text. However, this approach was constrained by the discrepancies between the training data of the image captioning model and the real news corpus. Another approach by Xue et al. projected the visual and textual features into a shared feature space and computed the similarities between the resulting multi-modal features. However, they encountered difficulties capturing multi-modal inconsistencies because of the semantic gap between the two types of features [37].
Ghorbanpour et al. [38] proposed the Fake-News-Revealer (FNR) method, which uses a Vision Transformer [39] and BERT [5] to extract image and text features, respectively. The model extracted textual and visual features separately and determined their similarities through a loss function.

3) IMPROVEMENT IN MULTI-MODALITY
Several researchers have proposed different approaches for fake news detection using multi-modal data. Jin et al. utilized an RNN model and applied an attention mechanism to combine information extracted from textual, visual, and social context data [40]. Zhang et al. [41] used a multi-channel CNN with an attention mechanism to combine multi-modal information, while Song et al. [42] proposed the co-attention transformer to model the bidirectional enhancement between images and text. Qian et al. developed a Hierarchical Multi-modal Contextual Attention Network (HMCAN), which was designed to collectively capture multi-modal context data and the hierarchical semantics of text [43]. Wu et al. introduced the Multi-modal Co-Attention Network (MCAN) that extracts spatial-domain and frequency-domain features from the image and text, and fuses visual and textual features using multiple co-attention layers [44]. Other researchers have also utilized Graph Convolutional Networks (GCN) and entity-centric cross-modal interaction to model the relationship between word-based and image-based objects. Finally, Zhang et al. and Laura et al. proposed BERT-based multi-modal models to encode text-based and image-based information. These models capture the interplay between text and images and employ contrastive learning to enhance multi-modal representations [24], [45]. Another study integrated visual entities to enhance the comprehension of high-level semantics in news images and to model the inconsistencies and mutual enhancements of multi-modal entities [22].
In summary, when performing multi-modal fake news detection, there are three important inductive biases to consider when examining text-image correlations. Firstly, images provide additional information to the text, highlighting the need for multi-modal analysis. Secondly, inconsistencies between text and images can serve as a potential signal for detecting fake news using multiple modalities. Finally, text-based and image-based data can enhance each other and improve performance by identifying essential features.

III. METHODOLOGY

A. OVERVIEW
The overview of ELD-FN is depicted in Fig. 1. The main steps of ELD-FN are as follows.
1) First, it exploits a publicly available dataset, Fakeddit.
2) Next, it leverages NLP techniques for preprocessing the textual information of news.
3) Then, it computes the sentiment from the text of each news item.
4) After that, it generates embeddings for the text and images of the corresponding news by leveraging V-BERT.
5) Finally, it passes the embeddings to the deep learning ensemble model for training and testing.

B. PROBLEM DEFINITION
A news item $n$ from a multi-modal news dataset $N$ can be represented as follows:
$$n = \langle t, i, s \rangle \tag{1}$$
where $t$ is the textual information of $n$, $i$ is the image of $n$, and $s$ is the status assigned to $n$, i.e., whether $n$ is fake or true.
ELD-FN suggests the status of a new news item as either true or false, where true represents that the news is real and false represents that the news is fake. Consequently, the automatic classification of a new news item $n$ can be defined as a mapping $f$:
$$f : n \rightarrow c, \quad c \in \{true, false\} \tag{2}$$
where $c$ is a suggested status from the news status set {true, false}.

C. PREPROCESSING
The news may contain inappropriate and unnecessary text, e.g., English stop-words. Such information is considered an overhead for machine learning classification algorithms because of the processing time and memory it consumes. Therefore, preprocessing the news text is essential to make ELD-FN fast and memory efficient. We perform the following preprocessing steps to clean the text of news.

1) TOKENIZATION
Text tokenization breaks down a piece of text into smaller units called tokens. Tokens are individual words, phrases, or other meaningful text elements, which can be analyzed and processed further.

2) SPECIAL CHARACTER REMOVAL
The text of news may contain special characters, e.g., semicolon (;). This step removes the special characters from the list of tokens.

3) STOP-WORD REMOVAL
English text contains words, called stop-words, that carry little meaning on their own but are needed to form grammatical sentences. This step removes stop-words from the working list.

4) SPELL CORRECTION AND LOWERCASE CONVERSION
This step identifies and corrects the spelling mistakes in the working list of tokens and converts the tokens to lowercase.

5) LEMMATIZATION
The lemmatization step reduces inflected words, such as comparative and higher-degree forms, to their base form, e.g., lemmatization converts the word darker into dark.
We exploit the Python Natural Language Toolkit (NLTK) for the preprocessing of news. The preprocessed news can be represented as follows:
$$n' = \langle t', i, s \rangle \tag{3}$$
where $t' = \{t_1, t_2, \ldots, t_n\}$ are the tokens from the text of $n$ after preprocessing.
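The preprocessing steps above can be sketched as a small pipeline. The following is a minimal, dependency-free illustration and not the NLTK-based implementation used in the paper: the stop-word set and lemma lookup table are hypothetical toy stand-ins for NLTK's resources, tokenization is whitespace-based, spell correction is omitted, and lowercasing is applied before stop-word removal so that capitalized stop-words are also matched.

```python
import re

# Hypothetical toy resources for illustration only; the paper relies on NLTK instead.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "than", "to", "and"}
LEMMAS = {"darker": "dark", "skies": "sky", "vaccines": "vaccine"}


def preprocess(text: str) -> list[str]:
    """Return the cleaned token list t' for one news text t."""
    # Tokenization (simple whitespace split).
    tokens = text.split()
    # Special character removal: keep letters and digits only.
    tokens = [re.sub(r"[^A-Za-z0-9]", "", tok) for tok in tokens]
    # Lowercase conversion (spell correction is omitted in this sketch).
    tokens = [tok.lower() for tok in tokens if tok]
    # Stop-word removal.
    tokens = [tok for tok in tokens if tok not in STOP_WORDS]
    # Lemmatization via a lookup table; unknown words are kept unchanged.
    return [LEMMAS.get(tok, tok) for tok in tokens]


print(preprocess("The skies are darker; vaccines ARE safe!"))
```

In a real pipeline, the toy tables would be replaced by a proper stop-word list and lemmatizer, and a spell-correction step would be inserted before lemmatization.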

D. SENTIMENT ANALYSIS
Sentiment analysis is an NLP technique that involves identifying and extracting subjective information from text, e.g., opinions, attitudes, emotions, and sentiments towards a particular topic. It automatically classifies the polarity of a text as positive, negative, or neutral. We exploit the TextBlob API for the computation of sentiment. The news (mentioned in Eq. 3) after sentiment computation can be represented as follows:
$$n'' = \langle t', v, i, s \rangle \tag{4}$$
where $v$ is the sentiment of $n'$.
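As a hedged illustration of how a polarity score can be turned into the sentiment label attached to a news item, the sketch below maps a score in [-1, 1] to positive, negative, or neutral. The function names, the dictionary layout, and the zero threshold are assumptions made for this example; the paper does not state its thresholds. With TextBlob installed, the score would typically come from `TextBlob(text).sentiment.polarity`.

```python
def polarity_to_label(polarity: float, threshold: float = 0.0) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label.

    The threshold is a hypothetical parameter; 0.0 treats only an
    exactly zero score as neutral.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def attach_sentiment(tokens: list[str], polarity: float) -> dict:
    """Attach the sentiment v to the preprocessed tokens t' of a news item."""
    return {"tokens": tokens, "sentiment": polarity_to_label(polarity)}


# Polarity values here are made up for illustration.
print(attach_sentiment(["celebrity", "dead"], -0.2))
```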

E. FEATURE MODELING
This step passes the preprocessed text and images from the multi-modal dataset to V-BERT to generate the embeddings. V-BERT is an extension of the BERT model that combines the power of BERT with a visual grounding mechanism, allowing it to understand the relationship between the text and the visual information in an image. This is achieved by combining a region-based visual feature extractor with the BERT model, where each image region is encoded into a vector using a CNN. These visual features are concatenated with the input text, and the resulting sequence is fed into the BERT model. During training, V-BERT is optimized to minimize a joint loss function. This allows V-BERT to learn language and vision representations in a unified framework and capture the complex interactions between the two modalities. The layers/steps involved in ELD-FN for identifying fake/real news are as follows.

1) BERT SHARED LAYER
For the news text, the BERT shared layer is implemented using a pre-trained Seq2Seq model [8]. Fine-tuning is indispensable to achieve better results. To improve its efficiency, separate BERT-shared layers are adopted for model-to-model textual features. The output of the news text feature extractor $O_T^{BERT}$ can be represented as follows:
$$O_T^{BERT} = BERT_T(X_T) \tag{5}$$
where $BERT_T$ is the relevant BERT-shared layer modeling for news text and $X_T$ is the input representation of the textual data.

2) IMAGE EMBEDDING LAYER
For the news image, the Faster-RCNN model [8] is applied to extract features from the image. The detected objects may provide visual context for the whole picture and be linked to specific terms through detailed region information. We also add a position embedding feature to images by encoding the object location. The output of the image feature extractor $O_I^{BERT}$ can be represented as follows:
$$O_I^{BERT} = BERT_I(X_I) \tag{6}$$
where $BERT_I$ is the relevant BERT-shared layer modeling for images and $X_I$ is the input representation of the images.

3) PRE-FEATURE EXTRACTION
The BERT-shared layer is strong enough for feature extraction. It additionally includes a pre-feature extractor to enhance the ability of BERT to learn semantic characteristics. The pre-feature extractor consists of the Position-wise Convolution Transformation (PCT) and the Multi-Head Self-Attention (MSA) layer.

4) MULTI-MODAL FEATURE CONCATENATION
After extracting the latent features of text and image, these are concatenated together to obtain the desired multi-modal feature representations. The multi-modal concatenated features $O_f$ can be represented as follows:
$$O_f = O_T^{BERT} \oplus O_I^{BERT} \tag{7}$$
where $\oplus$ denotes concatenation.

F. ENSEMBLE MODEL
Bagging and boosting [46] are two approaches to ensemble machine learning models. We applied both approaches with CNN and LSTM models. Four different architectures of ensemble machine learning models (bagged CNN, bagged LSTM, boosted CNN, and boosted LSTM) have been experimented with, using bagging (bootstrap aggregating) and boosting, to predict fake/real news. Note that bagged CNN is the proposed ensemble model, as it yields better results than the other mentioned ensemble architectures. The predictions through the different architectures are made using Algorithm 1.
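To make the idea of feature concatenation followed by bagging concrete, the sketch below shows bootstrap aggregating with majority voting. It is an assumption-laden illustration, not Algorithm 1 of the paper: the base learner is a hypothetical one-feature threshold classifier standing in for the CNN, all function names are invented for this example, labels use 1 for fake and 0 for real, and bootstrap samples that miss a class are simply redrawn.

```python
import random
from collections import Counter


def concat_features(o_text, o_image):
    """Multi-modal concatenation: the text vector followed by the image vector."""
    return list(o_text) + list(o_image)


def train_stump(samples):
    """Hypothetical base learner (stand-in for a CNN): threshold on the
    first feature, placed halfway between the two class means."""
    fake = [x[0] for x, y in samples if y == 1]
    real = [x[0] for x, y in samples if y == 0]
    mean_fake = sum(fake) / len(fake)
    mean_real = sum(real) / len(real)
    cut = (mean_fake + mean_real) / 2
    fake_is_high = mean_fake > mean_real
    return lambda x: int((x[0] > cut) == fake_is_high)


def bagging_fit(samples, n_models=5, seed=0):
    """Train n_models base learners, each on a bootstrap sample."""
    rng = random.Random(seed)
    models = []
    while len(models) < n_models:
        # Bootstrap: draw len(samples) items with replacement.
        boot = [rng.choice(samples) for _ in samples]
        if len({y for _, y in boot}) < 2:
            continue  # redraw if one class is missing from the sample
        models.append(train_stump(boot))
    return models


def bagging_predict(models, x):
    """Aggregate the base learners by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

Using an odd number of base learners avoids ties in the vote for binary labels. Boosting differs in that learners are trained sequentially and reweighted toward previously misclassified samples, which this sketch does not cover.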

IV. EVALUATION
This section constructs the research questions to evaluate ELD-FN, explains the exploited dataset, defines the metrics and evaluation process, and reports the findings and threats to validity.

A. RESEARCH QUESTIONS (RQs)
The following research questions are investigated to evaluate ELD-FN.
• RQ1: Does ELD-FN outperform the baseline approaches in identifying fake news?
• RQ2: Does news sentiment influence the identification of fake news?
• RQ3: Does preprocessing influence the identification of fake news?
• RQ4: Does ELD-FN outperform other classifiers in identifying fake news?
RQ1 compares ELD-FN with the baseline approaches [24], [25], named FakeNED and MultiFND in the rest of this paper. These approaches are selected as baselines because both are recently proposed, closely related to our work, and exploit the same dataset.
RQ2 investigates the influence of news sentiment on detecting fake news. It evaluates whether positive news is more likely to be considered true, and vice versa.

B. DATASET
The exploited fake news dataset, Fakeddit, is described in Table 1 and is publicly available online (footnote 4). Nakamura et al. [47] collected the data from Reddit, a social news and discussion website. It consists of over 1 million pieces of news (1,063,106) from 22 subreddits. It is classified in three different ways: 2-way, 3-way, and 6-way. The dataset samples with 6-way classification are represented in Fig. 2. Out of the total samples, 59.12% (628,501) and 40.48% (527,049) are fake and real news, respectively. However, only 64.25% (682,966) of the samples are multi-modal. Note that we only use the multi-modal data samples with 2-way classification to evaluate the proposed approach. Moreover, Fig. 3 and Fig. 4 represent the word cloud (most common words in the dataset) and the frequency of the words, respectively.

C. PROCESS
This section explains the evaluation process of ELD-FN. After performing the preprocessing and feature modeling as mentioned in Section III, a 10-fold cross-validation technique is applied to train and test ELD-FN. The reason for considering 10-fold cross-validation is that it helps avoid data bias and reduces the variance in performance estimation that might be observed with a single train-test split [48]. The dataset's total multi-modal news $N$ are broken down into ten (10) slices $C_i$, where $i = 1, 2, \ldots, 10$. For each cross-validation iteration, the slices of $N$ other than $C_i$ are selected as training samples ($N_t$), and the news from $C_i$ are selected as testing samples ($N_v$).
4 https://github.com/entitize/fakeddit, accessed on 15-01-2023.
15042 VOLUME 12, 2024 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
A step-by-step evaluation process for the $i$-th cross-validation iteration is as follows:
1) all news $N_t$ from $N$ except $C_i$ are extracted and combined;
2) an ensemble deep learning classifier is trained on $N_t$;
3) a CNN classifier is trained on $N_t$;
4) an LSTM classifier is trained on $N_t$;
5) baseline classifiers are trained on $N_t$;
6) each news item from the testing samples $C_i$ is predicted as real or fake; and
7) the below-mentioned evaluation metrics are computed for each classifier.
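The slicing used in this process can be sketched as follows. This is a minimal illustration of k-fold index generation under simplifying assumptions: the function name is hypothetical, samples are assigned to slices in a round-robin fashion, and no shuffling or class stratification is performed, although either may be desirable in practice.

```python
def k_fold_splits(n_samples: int, k: int = 10):
    """Yield (train_idx, test_idx) pairs; each slice C_i is the test set once."""
    indices = list(range(n_samples))
    # Round-robin assignment gives k slices whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Training indices are all samples that are not in slice i.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


for train_idx, test_idx in k_fold_splits(25, k=10):
    # A classifier would be trained on train_idx and evaluated on test_idx here.
    pass
```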

D. METRICS
We train and test the deep learning classifiers to evaluate the performance of ELD-FN. We select the most widely accepted metrics (accuracy, precision, recall, and f1-score) for this purpose. Furthermore, we compute the MCC and OR to check the effectiveness of the classifiers. The selected metrics can be presented as follows:
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$
$$Precision = \frac{TP}{TP + FP} \tag{9}$$
$$Recall = \frac{TP}{TP + FN} \tag{10}$$
$$F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{11}$$
$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{12}$$
$$OR = \frac{TP \times TN}{FP \times FN} \tag{13}$$
where TP and TN are the numbers of news correctly predicted as real and fake, respectively. Similarly, FP and FN are the numbers of news incorrectly predicted as real and fake, respectively.
The f1-score distributions of cross-validation for ELD-FN, FakeNED, and MultiFND are presented in Fig. 6. A beanplot is a visualization that displays the distribution of a continuous variable across different groups. The beanplot compares the f1-score distributions by plotting one bean for each approach. The width of a bean represents the density of the data, with wider beans indicating higher density.
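Returning to the six metrics named at the start of this subsection, the sketch below computes them from confusion-matrix counts using their standard definitions. The function name and the example counts are hypothetical and are not taken from the paper's results; the function assumes all denominators are non-zero and does not guard against division by zero.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, F1, MCC, and odds ratio."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # MCC ranges from -1 to 1; values above 0 beat random guessing.
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        # Odds ratio above 1 indicates a useful classifier.
        "odds_ratio": (tp * tn) / (fp * fn),
    }


# Made-up counts for illustration only.
print(classification_metrics(tp=40, tn=30, fp=10, fn=20))
```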
The following observations are made from Table 2, Fig. 5, and Fig. 6.
• ELD-FN has the highest accuracy (88.83%) and the highest precision (93.54%), indicating that it has the highest percentage of correctly classified instances and true positive instances.
• ELD-FN has the highest recall (90.29%) and F1-score (91.89%), indicating that it has the highest ability to correctly identify positive instances and to achieve a balance between precision and recall.
• ELD-FN also has the highest MCC (0.49) and OR (17.02), indicating a better correlation between predicted and actual classifications and higher odds of event occurrence than FakeNED and MultiFND. The average MCC (0.49 > 0.45 > 0.39) is greater than 0 and the average OR (17.02 > 15.78 > 13.95) is greater than 1 for ELD-FN, which confirms its effectiveness.
• The minimum f1-score of ELD-FN is higher than the maximum f1-scores of FakeNED and MultiFND (shown in Fig. 6).
To validate the significant difference in the means of performance (f1-score) for all iterations of ELD-FN, FakeNED, and MultiFND, we perform a single-factor Analysis of Variance (ANOVA). ANOVA is a statistical method used to test whether there is a significant difference in the means of three or more independent groups or samples. It is conducted in Excel with its default settings and presented in Fig. 7. It suggests that $F > F_{crit}$ and p-value < α (α = 0.05) hold for f1-score, i.e., the factor (using different approaches) makes a significant difference in f1-score.
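For readers who prefer code to a spreadsheet, the F statistic of a single-factor ANOVA can be computed as in the following sketch. The function name and the three groups of numbers are made up for illustration and are not the paper's f1-scores; comparing F against the critical value or deriving a p-value would additionally require the F distribution, which is not included here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by the factor (between-group sum of squares).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variation inside each group (within-group sum of squares).
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up data: three groups with clearly different means.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w)
```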
Moreover, we utilize two re-sampling methods, over-sampling and under-sampling, to tackle the class imbalance within the dataset. Over-sampling involves generating additional samples for the minority class through RandomOverSampler, while under-sampling entails removing surplus records from the majority class using RandomUnderSampler. The findings reveal that employing under-sampling results in accuracy, precision, recall, and F1-score values of 86.12%, 92.54%, 88.76%, and 90.61%, respectively. However, it is important to note that under-sampling diminishes the number of majority class samples, leading to a loss of information. Consequently, the performance of both the majority and minority classes in the fine-tuned BERT model declines when under-sampling is applied. In contrast, utilizing the over-sampling technique yields accuracy, precision, recall, and F1-score values of 90.26%, 94.37%, 91.88%, and 93.11%, respectively. This enhancement is attributed to BERT being exposed to a larger dataset, enabling it to learn meaningful patterns more effectively.
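The two re-sampling strategies can be illustrated with a small standard-library sketch. It mimics the behavior of random over- and under-sampling rather than calling the RandomOverSampler and RandomUnderSampler classes themselves; the function names and the toy data are assumptions made for this example.

```python
import random


def _group_by_class(samples):
    """Group (features, label) pairs by label."""
    by_class = {}
    for x, y in samples:
        by_class.setdefault(y, []).append((x, y))
    return by_class


def random_over_sample(samples, seed=0):
    """Duplicate random minority samples until every class matches the largest."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples)
    target = max(len(items) for items in by_class.values())
    out = []
    for items in by_class.values():
        out.extend(items)
        out.extend(rng.choice(items) for _ in range(target - len(items)))
    return out


def random_under_sample(samples, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples)
    target = min(len(items) for items in by_class.values())
    out = []
    for items in by_class.values():
        out.extend(rng.sample(items, target))
    return out


# Toy imbalanced data: six fake (1) and two real (0) items.
toy = [([i], 1) for i in range(6)] + [([i], 0) for i in range(2)]
print(len(random_over_sample(toy)), len(random_under_sample(toy)))
```

Re-sampling should be applied to the training folds only, so that the test samples keep the original class distribution.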
The preceding analysis concluded that ELD-FN outperforms the baseline approaches in detecting fake news.
From Table 3 and Fig. 8, it is observed that disabling sentiment (i.e., using textual features only) reduces precision from 93.54% to 90.38% and f1-score from 91.89% to 90.17%. However, MCC and OR remain the same.
Table 5 represents the relationship between sentiment and news. It shows that 65.84% of real news are negative, whereas only 34.16% of real news are positive. Similarly, 73.71% of fake news are negative, whereas only 26.29% of fake news are positive. It means that, among fake news, negative items are 180.37% = (73.71% - 26.29%) / 26.29% more frequent than positive ones. For example, if a fake news article portrays a political figure negatively, it can contribute to a negative sentiment towards that figure among the public and will propagate quickly.
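The relative difference quoted above follows directly from the two percentages reported for fake news; as a worked check of the arithmetic:

```latex
\[
\frac{73.71 - 26.29}{26.29} = \frac{47.42}{26.29} \approx 1.8037 = 180.37\%
\]
```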
The preceding analysis concluded that sentiment and features are critical for detecting fake news and disabling either would significantly reduce the performance of ELD-FN.
From Table 4 and Fig. 9, it is observed that disabling preprocessing reduces accuracy from 88.83% to 88.12%, precision from 93.54% to 92.95%, recall from 90.29% to 90.11%, and f1-score from 91.89% to 90.50%. However, MCC and OR remain the same.
The preceding analysis concluded that text preprocessing and features are critical for detecting fake news and disabling either would significantly reduce the performance of ELD-FN.

4) RQ4: COMPARISON OF ELD-FN AGAINST OTHER CLASSIFIERS
We select off-the-shelf deep learning classifiers (CNN and LSTM), which are the most widely adopted and well-known. Note that the preprocessed text, its sentiment, and the feature embeddings are given as input to the selected classifiers for comparative analysis. We set the hyper-parameter values as dropout = 0.2, recurrent_dropout = 0.2, loss function = binary-crossentropy, and activation = sigmoid for ELD-FN and both baseline approaches.
• ELD-FN performs better than LSTM because LSTM is better suited to short text and performs sequential processing, which is unnecessary in our case. In contrast, CNN has proven efficient for long text and works better at extracting local invariant features.
The preceding analysis concluded that ELD-FN outperforms other classifiers in detecting fake news.

F. THREATS TO VALIDITY
The probability of incorrect labeling of news is the first threat to construct validity. This research assumes that the labels by Nakamura et al. [47] are correct. However, incorrect labeling of data may affect the performance of ELD-FN.
The choice of assessment metrics for ELD-FN is another threat to construct validity. The chosen metrics are the most widely accepted in the literature for classification tasks.
The choice of the sentiment analysis repository is the first threat to internal validity. The chosen repository (Section III-D) is public and has shown good results in computing sentiment. Exploiting other repositories may affect the performance of ELD-FN.
The implementation of ELD-FN, FakeNED, and MultiFND is the second threat to internal validity. The implementations and the produced results of ELD-FN, FakeNED, and MultiFND are verified to mitigate the threat. However, unknown errors may affect the performance of ELD-FN.
The hyper-parameter setting of ELD-FN is the third threat to internal validity. The hyper-parameter setting for ELD-FN is mentioned in Section IV-E4. A change in these settings may affect the performance of ELD-FN.

V. CONCLUSION AND FUTURE WORK
Automatic fake news detection is crucial to avoid spreading false information that can have serious consequences, ranging from reputational damage to social and political unrest. In some cases, fake news can even incite violence and lead to harm or loss of life. Therefore, the ability to automatically identify and flag false information can help mitigate the threats of fake news. From this perspective, this paper proposes an ensemble deep learning-based detection of fake news. The proposed approach leverages NLP techniques for preprocessing the textual information of news, computes the sentiment from the text of each news item, generates embeddings for the text and images of the corresponding news by leveraging V-BERT, and passes the embeddings to the deep learning ensemble model for training and testing. The evaluation results show that the proposed approach significantly outperforms the state-of-the-art approaches in identifying fake news.
In the future, we would like to investigate the need to adapt detection algorithms to new types of media. Fake news is not limited to text-based content, and algorithms must be able to detect false information in images, videos, and audio as well. Moreover, we are interested in improving the interpretability of detection algorithms. Current methods often rely on opaque deep learning models, making it difficult to understand how decisions are being made. Future work could focus on developing more transparent models or tools that help users understand how algorithms arrive at their conclusions.
In Algorithm 1, $X_{t_{T+1}}$ is the feature set at time instances of the ensembled bagged or boosted models, $\hat{y}_{t_{T+1}}$ is the output of the ensembled model, $X$ is the feature set, $Y$ is the instance of the output, $\alpha$ is the activation function, and $W_g$, $b_b$ are the weights of the bagging or boosting models.
RQ3 examines the impact of preprocessing the news text on detecting fake news. RQ4 investigates the impact of different deep learning classification algorithms on ELD-FN. We analyze ELD-FN and other deep learning approaches to evaluate the performance of ELD-FN.

FIGURE 3. Word cloud - most common words in details.

FIGURE 5. Performance of ELD-FN and baseline approaches.

TABLE 2. Performance of ELD-FN and baseline approaches.

TABLE 3. Influence of sentiment on ELD-FN.

TABLE 4. Influence of preprocessing on ELD-FN.

TABLE 5. Relation between sentiment and news.