HANN: Hybrid Attention Neural Network for Detecting Covid-19 Related Rumors

In the age of social media, the spread of rumors is becoming easier due to the proliferation of communication and information dissemination platforms. Detecting rumors is a major problem with significant consequences for the economy, democracy, and public safety. Deep learning approaches have been used to classify rumors and have yielded state-of-the-art results. Nevertheless, the majority of techniques do not attempt to explain why or how decisions are made. This paper introduces a hybrid attention neural network (HANN) to identify rumors from social media. The advantage of HANN is that it allows the end user to capture the relative and important features between different classes as well as to obtain an explanation of the model's decisions. Two deep neural networks are included in the proposal: CNNs and Bidirectional Long Short-Term Memory (Bi-LSTM) networks with attention modules. The model is trained on a benchmark dataset containing 3612 distinct tweets crawled from Twitter and covering several types of rumors related to COVID-19. Each subset of data has a balanced label distribution with 1480 rumor tweets (46.87%) and 1677 non-rumor tweets (53.12%). The experimental results demonstrate that the new approach (HANN model) performs better in terms of performance and accuracy (about 91.5%) than many contemporary models (AraBERT, MARBERT, PCNN, LSTM, LSTM-PCNN, and Attention LSTM). Moreover, a number of software engineering features such as followers, friends, and registration age are used to enhance the model's accuracy.


I. INTRODUCTION
In recent years, social media in particular has proven to be a powerful tool for disseminating information at a faster rate than traditional networks. A rumor is an unconfirmed piece of text that spreads online and decreases trust in health authorities. Today's social media explosion has resulted in the spread of rumors that can threaten cyber security and social stability. During the COVID-19 pandemic, misinformation has spread rapidly through social networks and within communities [1,2,3]. The spread of COVID-19 has left individuals unable to determine which information about the virus is trustworthy. Figure 1 shows a sample of rumors related to COVID-19 from the ArCOV dataset [4].
Discovering rumors is an interesting and significant problem, and a variety of existing work is devoted to identifying rumors from social media. Existing techniques can generally be divided into two groups: (i) classical machine learning techniques [5,6,7,8,9], and (ii) deep learning-based techniques [10,11,12]. Support Vector Machines (SVM) is one of the most popular rumor-detection algorithms. Chang et al. applied unsupervised clustering-based techniques to detect rumor tweets related to the US elections [6]. Decision trees have also been used to detect rumors [8]. Social media rumor detection using deep learning methods has been shown to be effective. Wu et al. [10] developed a sparse learning strategy for selecting discriminative characteristics and training the classifier for emerging rumors. Ma et al. [11] recently proposed recurrent neural networks for rumor classification, using sequential data to capture the temporal and textual aspects of rumor spread, which enables earlier and more accurate rumor detection.
However, the main limitations of the current methods can be summarized in three points. First, when analyzing text for text classification, not every word in a sentence has the same importance. In other words, some words are closely relevant to rumors while others are irrelevant. The existing work treats all words as equally significant when classifying texts as rumors or non-rumors. In the case of rumor detection, using attention techniques helps to identify who generates rumors and to discover them easily. This mechanism is designed to ensure that important content receives more attention by assigning higher weights to certain keywords. Second, these deep learning approaches can classify information as rumors or non-rumors without explaining why the model reached these decisions. Interpreting the reasons behind the model's decisions is critical and can help individuals understand why the model made such decisions. Third, the existing deep learning approaches only utilize textual information. Other software engineering features such as followers, friends, and registration age can help to detect rumors and enhance the model's accuracy.

FIGURE 1. SAMPLE OF ARABIC AND ENGLISH RUMORS ON TWITTER
To fill these gaps, this paper introduces a hybrid attention neural network (HANN) to identify and explain rumor detection from social media. The advantage of HANN is that it allows the end user not only to capture the relative and important features between different classes but also to obtain an explanation of the model's decisions. The proposed model combines two deep neural networks with attention mechanisms: a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory (Bi-LSTM) network. The advantages of the proposed HANN model over existing ones are summarized as follows:
• It provides a hybrid deep learning model to identify rumors from social media.
• It captures the relative and important features between different classes and provides an explanation of the model's decisions.
• The new approach achieved about 91.5% accuracy, outperforming state-of-the-art approaches (AraBERT, MARBERT, PCNN, LSTM, LSTM-PCNN, and Attention LSTM).
• A number of software engineering features such as followers, friends, and registration age were used to enhance the model's accuracy.
This study is organized as follows: Section 2 presents the literature. Sections 3 and 4 discuss the system methodology. Section 5 presents the dataset and the classification model and reports the evaluation results of the new model. Finally, Section 6 concludes and points out future directions.

II. RELATED WORK
There is substantial interest in the detection of rumors in several fields, including data mining, machine learning, and natural language processing (NLP). This section reviews existing methods for detecting rumors in text content published on social media, focusing primarily on detecting rumors in Twitter messages. In general, the current studies fall into two groups: (i) traditional machine learning techniques, and (ii) deep learning techniques. We briefly discuss the literature on rumor detection below.

A. MACHINE LEARNING METHODS
Traditional machine learning methods have been applied to rumor detection. Suchita Jain et al. (2016) [13] proposed a real-time approach to detecting rumors on Twitter by analyzing sentiment and semantics; they used verified news channel accounts to classify rumors in real time. According to Mao et al. [14], social media sentiment analysis was the most effective method for detecting rumors; to detect rumors, they combined shallow statistical features with deep statistical features and sentiment analysis. A rule-based heuristic method was proposed by Sivasangari et al. [15], which computed the sentiment polarity of each text. To distinguish a rumor from genuine content, VADER was used to determine the sentiment score of the text.
Researchers have also used Support Vector Machines and sentiment analysis to detect rumors [16,17,18]. A sentiment-based hybrid kernel SVM (SHSVM) classifier was designed by Li et al. for detecting rumors [16]; a dictionary of emotions was used to analyze sentiments in comments on social networks. Zhang et al. achieved rumor detection by using shallow and implicit features [7]. They employed traditional machine learning approaches such as Support Vector Machine (SVM) and Random Forest to classify tweets as rumors or non-rumors. Jin et al. [19] developed an approach for detecting the spread of rumors during the 2016 U.S. elections. In addition, word matching methods such as Word2Vec and Doc2Vec were used to match tweets referring to the two presidential candidates with verified rumor articles. In a study conducted by Alqurashi et al. [20], a dataset of COVID-19 fake news spread on Twitter in Arabic was examined in detail; the methods examined in that study were the random forest classifier, XGB, naive Bayes, SGD, and SVM. Alsudias and Rayson employed SVM, LR, and NB classifiers to distinguish rumor tweets from non-rumors [21]; they applied their model to an Arabic dataset related to COVID-19. The best accuracy was achieved with LR using a count vector and SVM, which posted a result of 84.03%.

B. DEEP LEARNING METHODS
The use of deep learning methods has been proven effective on a variety of classification tasks. Unlike machine learning methods, deep learning approaches learn latent representations of the input information to detect rumors on social media. Chen et al. [22] applied a BERT model with TextCNN and TextRNN models to detect rumors; the proposed model was trained on data collected from 3737 rumors related to COVID-19, and according to the results the proposed BERT model outperformed all other methods. Long Short-Term Memory (LSTM) is a popular algorithm for finding patterns in longer sequences [23]; LSTM holds the connections between the various words in these sequences. Al-Sarem et al. [24] proposed a hybrid deep learning model that uses a Long Short-Term Memory (LSTM) network and Concatenated Parallel Convolutional Neural Networks (PCNN) to detect COVID-19-related rumors from Twitter data. Chen et al. [25] introduced a hybrid model for classifying Cantonese rumors on Twitter. Bi-Directional Graph Convolutional Networks were proposed by Bian et al. to explore the characteristics of rumors [26]; they leverage a GCN to learn the patterns of rumor propagation. A CNN+RNN model has also been introduced to detect fake news, using user characteristics to create a feature vector and modeling five minutes of tweets as a time interval [27,28,29,30]. Other researchers combined convolutional neural networks (CNN) and long short-term memory (LSTM) to detect rumors based on the relationship between user and textual information [24]. CNN-LSTM models were also used to detect rumors in comments, with sentiment taken into consideration [31].
To sum up, previous researchers focused on classifying or detecting rumors versus non-rumors and did not provide explainable rumor detection. To the best of our knowledge, this is the first research to utilize an attention mechanism to detect rumors in Arabic content. Additionally, existing studies do not take other factors such as followers, friends, or registration age into account when detecting rumors on social media. To overcome these limitations, this paper suggests a hybrid attention neural network to identify and explain rumor detection from social media. Further enhancements are made to the model by leveraging software engineering features.

III. METHODOLOGY
This paper introduces a hybrid attention neural network (HANN) to identify rumors from social media. The advantage of HANN is that it captures the relative and important features between different classes as well as providing an explanation of the model's decisions. The model extracts text features using the CNN and then combines them with the Bi-LSTM and the attention mechanism for rumor detection. Below, we explain each step in detail.

A. INPUT LAYER
The pre-processing of data involves performing basic operations on the dataset before it is passed to HANN. In this step, raw data is transformed into an organized and useful representation that can be used for further analysis. In our model, the preprocessing step aims to eliminate noise and enhance rumor prediction. The main pre-processing consists of the following steps, as shown in Figure 2: • Eliminate whitespaces and repetitive words.
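A minimal sketch of this cleaning step (pure Python; the function name and the exact cleaning rules are illustrative assumptions, not the paper's implementation):

```python
import re

def clean_tweet(text: str) -> str:
    """Illustrative cleaning: strip URLs, drop hashtag/mention symbols,
    collapse whitespace, and remove immediately repeated words."""
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[#@]", " ", text)           # drop hashtag/mention symbols
    text = re.sub(r"\s+", " ", text).strip()    # collapse extra whitespace
    cleaned = []
    for word in text.split():
        # keep a word only if it does not repeat the previous one
        if not cleaned or word.lower() != cleaned[-1].lower():
            cleaned.append(word)
    return " ".join(cleaned)

print(clean_tweet("Garlic  garlic water cures   #COVID19 http://example.com"))
# Garlic water cures COVID19
```

In the real pipeline, stop-word removal and Arabic-specific normalization would be added at this stage as well.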

B. EMBEDDING LAYER
In natural language processing, many feature extraction techniques have been proposed to determine the associations and relationships between words. Bag of words and TF-IDF are statistical techniques used to determine the mathematical significance of words in documents. The TF-IDF method was used in previous studies to analyze documents and find the significance of words in them [18,32,33]. However, a number of studies have used embedding techniques and shown remarkable results; embedding techniques can capture word associations and improve prediction accuracy [24,26,29]. Word embedding is one of the most popular techniques for representing text vocabulary. This technique can detect the context of a word within a document, as well as its semantic and syntactic similarity and its relationships to other words. Various models have been proposed for learning word embeddings from raw text, such as GloVe [34], Word2Vec [35], and FastText. The embedding layer learns and stores embeddings of words that are used in the next layer. Based on our experiments, GloVe word embeddings produced the best results, so we employ a pre-trained GloVe model for learning feature representations.
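A common way to use pre-trained GloVe vectors is to build an embedding matrix indexed by the vocabulary. The sketch below assumes GloVe's plain-text format (one word followed by its vector per line) and is illustrative rather than the paper's exact pipeline:

```python
import numpy as np

def build_embedding_matrix(glove_lines, word_index, dim):
    """Map each vocabulary word (index >= 1) to its pretrained GloVe vector.
    Words missing from GloVe keep a zero vector."""
    vectors = {}
    for line in glove_lines:                       # each line: "word v1 v2 ..."
        parts = line.split()
        vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix

# Tiny in-memory example in GloVe's text format (real files are read from disk)
glove = ["virus 0.1 0.2 0.3", "cure -0.5 0.4 0.0"]
vocab = {"virus": 1, "cure": 2, "unknownword": 3}
emb = build_embedding_matrix(glove, vocab, dim=3)
print(emb.shape)  # (4, 3)
```

The resulting matrix is typically loaded as frozen weights of the embedding layer, so the network starts from the pre-trained representations.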

C. CNN LAYER
A CNN is a type of deep learning network that produces excellent results and has been applied to several text classification tasks. The goal of the CNN layer is to extract semantic features from the input text and reduce the number of dimensions. The standard CNN architecture consists of 3 convolutional layers, 3 pooling layers, and 1 fully connected layer. Each convolutional layer is composed of multiple convolution kernels that convolve the input, followed by pooling layers; its calculation is shown in equation (1). Pooling layers are used to minimize the dimensionality of the data and control over-fitting.
After applying the different convolutional layers, several features are extracted from the data, but the extracted dimensions are very high. To reduce the feature dimension, global max-pooling is applied at the end of each layer, which keeps the globally strongest responses from the entire network. The convolution is computed as

o_t = f(k_t · x_t + b_t)  (1)

where o_t is the output value after convolution, f is the ReLU activation function, mathematically expressed as f(x) = max(0, x), x_t represents the input vector, k_t represents the weight of the convolution kernel, and b_t is the bias of the convolution kernel.
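Equation (1) followed by global max-pooling can be sketched in a few lines of numpy (toy sizes; this illustrates the computation, not the paper's actual CNN configuration):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), the activation used in equation (1)
    return np.maximum(0.0, x)

def conv1d_global_max(x, kernel, bias):
    """Slide the kernel over the sequence, apply o_t = f(k . x_t + b),
    then keep only the strongest response (global max-pooling)."""
    width = kernel.shape[0]
    outputs = [
        relu(np.dot(kernel.ravel(), x[t:t + width].ravel()) + bias)
        for t in range(len(x) - width + 1)
    ]
    return max(outputs)

x = np.array([[0.2], [1.0], [-0.5], [0.7]])  # toy sequence of 1-d embeddings
k = np.array([[1.0], [1.0]])                 # one kernel spanning 2 time steps
print(conv1d_global_max(x, k, bias=0.0))
```

In practice the layer runs many kernels of several widths in parallel, yielding one pooled value per kernel.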

D. BI-LSTM LAYER
Bi-LSTMs can process input sequences of variable length using two independent LSTMs (forward and backward). In this section, we offer a quick overview of standard LSTMs and bidirectional LSTMs.
Basic LSTM: The gates and states of an LSTM cell are computed as

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
c̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t
h_t = o_t ⊙ tanh(c_t)

where h_{t−1} is the last hidden state, x_t is the current input, W_* are the weight matrices, and b_* represent the biases. σ is the sigmoid function and tanh is the hyperbolic tangent function. ⊙ refers to the element-wise product between two vectors.
Bi-LSTMs: A basic LSTM is only able to remember contextual information from previous time steps [36]. Bidirectional LSTMs (Bi-LSTMs) [37] solve this problem by using a forward layer and a backward layer to process the contextual information in both directions. The objective of bidirectional encoding is to encode the rumor data in both directions. The output of the memory cell is calculated as

h_t = →h_t ⊕ ←h_t

where →h is the forward hidden state, ←h is the backward hidden state, and ⊕ denotes the concatenation operation. The output of the layer is then obtained by concatenating →h and ←h. In this layer, the Adam optimizer was chosen, and the learning rate was set to 0.001.
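The forward/backward passes and the concatenation h_t = →h_t ⊕ ←h_t can be sketched in numpy as follows (for brevity this sketch shares one weight matrix between directions, whereas a real Bi-LSTM learns separate forward and backward parameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W stacks the input, forget, output, and candidate
    weights applied to the concatenated [h_{t-1}, x_t] vector."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = len(h_prev)
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
    c = f * c_prev + i * np.tanh(z[3*H:])      # cell state update
    return o * np.tanh(c), c                    # hidden state, cell state

def bilstm(xs, W, b, hidden):
    """Run the sequence forward and backward, then concatenate the states."""
    def run(seq):
        h, c = np.zeros(hidden), np.zeros(hidden)
        out = []
        for x_t in seq:
            h, c = lstm_step(x_t, h, c, W, b)
            out.append(h)
        return out
    fwd = run(xs)
    bwd = run(xs[::-1])[::-1]                   # backward pass, re-aligned
    return [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]

rng = np.random.default_rng(0)
H, D, T = 4, 3, 5                               # hidden size, input dim, length
W, b = rng.normal(size=(4 * H, H + D)), np.zeros(4 * H)
states = bilstm(list(rng.normal(size=(T, D))), W, b, hidden=H)
print(len(states), states[0].shape)  # 5 (8,)
```

Each output state has size 2H because the forward and backward hidden states are concatenated.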

E. ATTENTION LAYER
For text classification, understanding the relations between words is necessary so that the model can accurately label the text as "rumors" or "non-rumors". Words are obviously not equally important to a text representation; some features are more valuable than others. Instead of passing the full hidden sequence to the next layer, we capture only the relevant phrases using the attention mechanism. Employing attention mechanisms, we can interpret the significance of the different words as well as their relative importance. Indeed, attention mechanisms emphasize important features by giving specific keywords a higher weighting. The attention weights of the words are computed as

u_t = tanh(W_w h_t + b_w)
a_t = exp(u_t^T υ_w) / Σ_t exp(u_t^T υ_w)
s = Σ_t a_t h_t

where W_w is the weight matrix, b_w denotes the bias term, and υ_w^T is a transposed weight (context) vector. a_t are the attention weights normalized via the softmax function, and s is the weighted sum of the hidden representations.
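These three attention equations translate directly into numpy (the shapes and random parameters below are illustrative):

```python
import numpy as np

def attention(hidden_states, W_w, b_w, v_w):
    """Additive attention: u_t = tanh(W_w h_t + b_w),
    a_t = softmax(u_t^T v_w), s = sum_t a_t h_t."""
    H = np.stack(hidden_states)        # (T, d) hidden states from the Bi-LSTM
    u = np.tanh(H @ W_w.T + b_w)       # (T, d_a) projected states
    scores = u @ v_w                   # (T,) unnormalized importance scores
    a = np.exp(scores - scores.max())  # stable softmax
    a /= a.sum()                       # normalized attention weights
    s = a @ H                          # weighted sum of hidden states
    return a, s

rng = np.random.default_rng(1)
T, d, da = 4, 6, 3                     # sequence length, hidden dim, attn dim
a, s = attention(rng.normal(size=(T, d)),
                 rng.normal(size=(da, d)), np.zeros(da),
                 rng.normal(size=da))
print(round(a.sum(), 6), s.shape)  # 1.0 (6,)
```

The vector a is what gets visualized later: each a_t says how much word t contributed to the sentence representation s.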

F. OUTPUT LAYER
The proposed model uses binary cross-entropy as the loss function for rumor detection. The attention layer's output is passed to a sigmoid layer, and the prediction is thresholded to either 0 or 1. The loss is

L = −[y log p + (1 − y) log(1 − p)]

where p is the predicted probability and y is the classification label, with 0 representing non-rumor and 1 representing rumor.
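A sketch of the sigmoid output and the binary cross-entropy loss for a single example (values are illustrative):

```python
import math

def sigmoid(z):
    # squashes the output layer's logit into a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y, p, eps=1e-12):
    """L = -[y log p + (1 - y) log(1 - p)] for one example;
    eps guards against log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

p = sigmoid(2.0)   # model's predicted rumor probability for one tweet
label = 1          # 1 = rumor, 0 = non-rumor
print(round(binary_cross_entropy(label, p), 4))  # 0.1269
```

At inference time the probability p is simply thresholded (e.g., p >= 0.5 → rumor) to produce the 0/1 decision.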

IV. USERS' FEEDBACK TO OPTIMIZE HANN'S ACCURACY
Although HANN achieves high rumor-detection accuracy, room for higher accuracy remains through the use of users' explicit feedback (e.g., users' ratings on posts as rumors or facts). The accuracy of machine learning systems can be significantly improved by working closely with users, which leads to better understanding and trust from users [38,39,40]. Users can support the HANN model by giving explicit collective feedback on the model's classification accuracy and detection of rumors. This collective feedback is then used to better train and tune HANN's overall accuracy. For example, users could give points or ratings on the accuracy of HANN's classified social media posts. Furthermore, users can provide or receive feedback from other users on posts that the model failed to classify as rumors or facts. Users' ratings can be used to judge a post's quality, which in turn educates HANN about the difference between rumors and facts. HANN's detection of rumors will be improved as a result, and this can also help minimize the negative effects of rumors on users and enhance the quality of information shared and produced on social media platforms. It is, however, difficult to motivate users to give such feedback on a continuing basis, since the majority have little interest in doing so [41]. The concept of "gamification" is used as a behavior-change tactic to increase users' motivation towards desired behaviors (for example, giving feedback about HANN's detection of rumors) [42,43]. A common use of gamification is taking the elements of a video game (e.g., points and levels) and applying them to a non-game context (e.g., an educational setting) [44].
Gamification has been successfully applied in several different environments, such as adopting a healthy lifestyle [45], enhancing students' engagement in class [46], and increasing quality and productivity in a business environment [47]. There are four common gamification elements in a non-game context [48]:
• Points: A number of gamification elements are built on points (e.g., levels and leader boards). The quality of a user's post (or of a HANN-classified post) is determined by ratings from other users in a social network. These gained or lost points are then used to further train HANN to better distinguish rumors from facts. However, points should be used alongside other gamification elements to motivate users effectively [48].
• Digital badges: Awards with defined criteria [50,51,52] that users can display to show off "any kind of skill, knowledge, or achievement" they earn [49]. Users may collect digital badges, for example, by accumulating a predetermined number of points based on quality ratings, or by providing ratings on HANN's classifications.
• Levels: To reach levels, users need to earn points. Once they have earned a certain (predetermined) number of points, they can level up (i.e., unlock more software/game features) [53].
• Leader boards: Users can be ranked on leader boards based on their achievements or points earned, or based on their progress towards a goal [51].
According to a recent study [54], several factors influence how users perceive and respond to gamification elements in the context of detecting rumors on Twitter, including:
• Privacy: Users can choose from a variety of privacy preferences for the feature that allows them to rate each other's posts.
• Notification: Users have varying preferences on the style of notification they receive once their posts are rated or when a HANN-classified post needs to be rated. For example, some users would prefer to be notified only when a negative rating is given on their posts.
• Gamification elements for online rumors: The majority of users would prefer to always have the option to use or deactivate such gamification elements (e.g., points) and not feel pressured by them.
• Social pressure: Users' relationships can negatively affect the objectivity of their ratings on posts when they are close to each other (e.g., a family member).
This sheds light on the need to carefully collect users' explicit and collective feedback that can be used to optimize the HANN model in a manner that better suits users' preferences. Failing to do so could degrade users' feedback on others' posts (or on HANN-classified posts), potentially defeating the whole use of users' feedback to improve HANN's accuracy. To this end, we adopt the application-independent conceptual framework proposed by [54] to gamify the process of collecting users' feedback on the accuracy of HANN's classifications and on rumors that the model fails to classify as rumors or facts (see Figure 4). The framework encapsulates the previously discussed differences in users' perceptions of and needs towards gamification elements that motivate them to give quality feedback on HANN's accuracy. This will guide software engineers in encouraging users to provide explicit, collective feedback that can be used to further train HANN and potentially increase its rumor-detection accuracy.

V. EXPERIMENTAL EVALUATION
This section examines our HANN model's performance on the ArCOV dataset [4]. We first describe the dataset used to evaluate our model, then measure its performance against other approaches in terms of accuracy (Acc), sensitivity (Sen), specificity (Spe), and F1 score.

A. DATASET DESCRIPTION
The experiments in this study use a benchmark dataset called ArCOV [4]. This dataset contains 3612 distinct tweets crawled from Twitter covering the period from 27 January to 30 April 2020, and it includes several types of rumors related to COVID-19. Each subset of data has a balanced label distribution with 1480 rumor tweets (46.87%) and 1677 non-rumor tweets (53.12%), as shown in Figure 5.
Certain preprocessing steps were conducted on the dataset. Noise is anything that decreases the effectiveness of the algorithm and prevents one from getting insights from the text; stop-words, white-spaces, hashtags, and URLs are considered noisy text data. The WordCloud module was utilized to visualize the most frequent words in each class (rumors and non-rumors) (see Figure 6).

B. EVALUATION METHODS
To compare the efficiency and efficacy of various classification systems, a variety of measures can be used. The suggested model is evaluated using the following assessment metrics: accuracy (Acc), sensitivity (Sen), specificity (Spe), and F1 score. These measures are computed from the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). In our experiments, all methods are evaluated using 5-fold cross-validation, with the metrics defined as follows:

Sensitivity = TP / (TP + FN)  (14)
Specificity = TN / (TN + FP)  (15)
F1 = 2TP / (2TP + FP + FN)  (16)
Accuracy = (TP + TN) / (TP + TN + FP + FN)  (17)
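These four metrics can be computed directly from the confusion-matrix counts. A minimal sketch with illustrative counts (not the paper's actual results):

```python
def metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics for a binary rumor/non-rumor classifier."""
    sensitivity = tp / (tp + fn)                 # recall on the rumor class
    specificity = tn / (tn + fp)                 # recall on the non-rumor class
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # overall fraction correct
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, accuracy, f1

# Illustrative counts for one cross-validation fold (assumed, not from the paper)
sen, spe, acc, f1 = metrics(tp=135, tn=150, fp=10, fn=13)
print(round(acc, 3))  # 0.925
```

Under 5-fold cross-validation, these metrics are computed per fold and then averaged to produce the reported scores.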

C. METHODS FOR COMPARISON
Our approach is mainly compared with six typical rumor-detection models.
• AraBERT: A pre-trained BERT model that has recently been used for rumor detection on Arabic content [55].
• MARBERT: Another pre-trained BERT model that has been used to analyze rumors from Twitter data [56].
• PCNN: A model that has commonly been explored in previous rumor-detection tasks [24].
• LSTM-PCNN: A hybrid deep learning technique that analyzes texts from social media in order to detect rumors [26].
• LSTM: Previous studies have utilized LSTM to perform rumor detection from social media and achieved good results [25].
• Attention-LSTM: This method utilizes attention mechanisms to find deep representations of the data for efficient rumor identification [32,57].
• The proposed model (HANN): A hybrid model that utilizes attention mechanisms to identify rumors from social media. Table 1 illustrates the configuration of HANN.

D. RESULTS
To perform rumor detection, we employ attention techniques that extract key features from the textual data. Using attention helps identify the importance of different words in our model and provides an explanation of why the model reached a decision. To visualize the importance of different words in a sentence, we take the output of the attention layer and display the weights of the words with different colors; the color indicates how much a specific word contributes to the final classification decision. Figure 7 shows the visualized weight scores of different words, with darker colors indicating the most important words. As seen from the visualization, the words highlighted in a sentence are specific to their correct class. Extracting relative textual features can give some insight into the rumor classification decision; hence, our model can reveal keywords that may be interpreted as rumor or non-rumor. The results in Table 2 are listed from lowest to highest accuracy, with the highest values presented last. It can be observed that the proposed HANN method achieves the best sensitivity (Sen), specificity (Spe), and F1 score compared with the other approaches. Additionally, the proposed model achieved the best results in terms of accuracy, micro average, and weighted average, as shown in Table 3; the suggested HANN model achieved 91.5% accuracy. Figure 8 shows the confusion matrix of the HANN model. In addition, the AUC score shown in Figure 9 confirms that HANN achieves good results, which indicates that the proposed model and its attention mechanisms are superior at extracting features; this is why we use them in our model to improve the accuracy of rumor detection.
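One simple way to render such a visualization is to scale a background color by each word's attention weight. The sketch below emits HTML and illustrates the idea only; it is not the code behind the paper's figure, and the words and weights are made up:

```python
def highlight_html(words, weights):
    """Render each word with a background whose opacity is proportional
    to its attention weight; the top-weighted word is fully opaque."""
    top = max(weights)
    spans = [
        f'<span style="background: rgba(255,0,0,{w / top:.2f})">{word}</span>'
        for word, w in zip(words, weights)
    ]
    return " ".join(spans)

# Hypothetical sentence and attention weights from the attention layer
html = highlight_html(["garlic", "cures", "covid"], [0.5, 0.3, 0.2])
print("rgba(255,0,0,1.00)" in html)  # True: the most important word is darkest
```

Darker (more opaque) words are the ones the attention layer weighted most heavily, mirroring the coloring described for Figure 7.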

E. DISCUSSION
Researchers have proposed several studies on rumor detection. However, previous studies focused only on classifying or detecting rumors versus non-rumors and could not explain how they detect rumors. In addition, none of the studies has taken other factors such as the number of followers, friends, or registration age into account when detecting rumors on social media. In this paper we suggest a hybrid attention neural network to identify and explain rumor detection from social media. The attention mechanism highlights specific features that play a significant role in rumor detection in order to increase the accuracy of predictions. Further improvements are made to the model by leveraging software engineering features. This sheds light on the need to carefully collect users' explicit and collective feedback that can be used to optimize the HANN model in a manner that better suits users' preferences. Failing to do so could degrade users' feedback on others' posts (or on HANN-classified posts), potentially defeating the whole use of users' feedback to improve HANN's accuracy. To this end, we adopt the application-independent conceptual framework proposed in [54] to gamify the process of collecting users' feedback on the accuracy of HANN's classifications and on rumors that the model fails to classify as rumors or facts. In terms of performance and accuracy, the new approach (HANN model) outperformed many conventional approaches (LSTM, PCNN, LSTM-PCNN, and Attention Bi-LSTM). The suggested HANN model achieved 91.5% accuracy, as shown in Table 3. In summary, our proposed model's results provide two main findings. First, our model can successfully identify rumors from social media. Second, HANN can effectively highlight the importance of different features and provide explanations for its classification results.

VI. CONCLUSION
Rumor detection is a crucial problem with far-reaching repercussions for the economy, democracy, and public health and safety. Traditional classification and deep learning algorithms applied to rumor identification cannot explain why and how texts are classified as rumor or non-rumor. This paper introduced a hybrid attention neural network (HANN) to identify rumors from social media. The advantage of HANN is that it provides an explanation of the model's decisions in addition to capturing the relative and important features between different classes. The proposed model includes two deep neural networks: CNNs and Bidirectional Long Short-Term Memory (Bi-LSTM) networks with attention modules. According to the experimental results, the new approach (HANN model) performed better than many contemporary models in terms of performance and accuracy (91.5%). The accuracy of the model was further enhanced by software engineering features such as followers, friends, and registration age. Future work could focus on predicting personality and society rumors with semantic structures. Furthermore, a software engineering method that guides software developers in implementing the adopted feedback acquisition framework will be proposed and tested on the HANN model.

TALAL H. NOOR is an associate professor and vice dean of the Applied College (Badr branch) at Taibah University, Saudi Arabia. His research interests include services computing, security and privacy, trust management, social computing, and human-computer interaction. Noor received a PhD in computer science from the University of Adelaide, Australia. He is a member of IEEE.