MVAN: Multi-View Attention Networks for Fake News Detection on Social Media

Fake news on social media is a widespread and serious problem in today's society. Existing fake news detection methods focus on finding clues in long text content, such as original news articles and user comments. This paper addresses fake news detection in a more realistic scenario: only the source short-text tweet and its retweet users are provided, without user comments. We develop a novel neural network based model, Multi-View Attention Networks (MVAN), to detect fake news and provide explanations on social media. The MVAN model includes text semantic attention and propagation structure attention, which ensures that our model can capture information and clues from both the source tweet content and the propagation structure. In addition, the two attention mechanisms in the model can find key clue words in fake news texts and suspicious users in the propagation structure. We conduct experiments on two real-world datasets, and the results demonstrate that MVAN significantly outperforms state-of-the-art methods by 2.5% in accuracy on average and produces reasonable explanations.


I. INTRODUCTION
With the rapid development of social media platforms such as Twitter, fake news can spread rapidly on the internet and affect people's lives and judgment. On April 27, 2020, U.S. President Donald Trump said, ''Fake news, the enemy of the people!'' Those words indicate that fake news has become a serious social problem. Fake news refers to false statements and rumors on social media, including completely false information or gross misrepresentations of a real event. However, due to limitations of expertise, time or space, it is difficult for ordinary people to separate fake news from the vast amount of information available online. Therefore, it is necessary to develop automated, auxiliary methods to detect fake news at an early stage. With the development of artificial intelligence (AI), many researchers have attempted to apply AI technology to automatically detect fake news [2].
Early research on automatic detection of fake news mainly focused on designing effective features from various information sources, including text content [2]-[4], publishers' personal information [2], [5] and communication mode [6]-[8]. However, these feature-based methods are very time-consuming and labour-intensive. In addition, model performance depends heavily on the quality of the handcrafted features; thus, the performance of these methods is not ideal in most cases.
Driven by the success of deep neural networks, several recent studies [9], [10] have applied various neural network models to fake news detection. For example, a recurrent neural network (RNN) [11] was used to learn the representation of tweet texts along the posting timeline. Liu [9] modelled the propagation path as a multivariate time series and applied a combination of RNN and convolutional neural network (CNN) to capture the changes of user characteristics along the propagation path. The main limitation of these methods is that they can only process sequential data but not structured data, leading to an inability to properly model the real propagation structure.
We know that the dissemination structure of the news on social media can constitute a social network graph. Generally, tweets can be reposted by any other user. The structure of tweet propagation composed of retweet users is shown in Fig. 1. With the help of social media, a piece of Twitter news can be spread all over the world in a very short time.
To capture the information hidden in the sequential text and the structured propagation graph at the same time, an RNN and a graph neural network (GNN) were used to process these two kinds of data. Meanwhile, to give the model better learning ability and a certain interpretability, two different attention mechanisms, text semantic attention and propagation structure attention, were added to the RNN and GNN.
The main contributions of this paper are summarized as follows: (1) To the best of our knowledge, we are the first to adopt graph attention networks (GATs) to encode and represent the propagation structure of news.
(2) Experimental results on two real-world datasets show that the multi-view attention networks (MVAN) model achieves the highest accuracy and outperforms state-of-the-art models.
(3) Our model is more robust in early fake news detection, and it has some interpretability from the perspectives of both text and propagation structure.

II. RELATED WORK
The goal of fake news detection is to distinguish the authenticity of news published on social media platforms based on their relevant information (such as text content, comments, propagation structure, etc.). Related works can be divided into different categories as follows.

A. FEATURE-BASED METHODS
Some early studies focused on fake news detection based on handcrafted features. These features are mainly extracted from text content and users' profile information. Castillo et al. (2011) [2] proposed a decision tree-based model utilizing a large number of features for fake news detection on Twitter. Yang et al. (2012) [5] created two new features, client-based and location-based features, to enrich the feature sets of previous researchers and automatically detect fake news on Sina Weibo. Wu et al. (2015) [12] used a propagation structure composed of 23 features in a hybrid support vector machine (SVM); these features fall into three categories (message-based, user-based and retransmission-based features). Wu et al. (2017) [13] proposed a machine learning model based on time-series fitting of tweet volume characteristics. Ma et al. (2015) [11] proposed an SVM model that engineers each of the social context features. Rath et al. (2017) [14] extracted user information and combined it with an RNN model.

B. CONTENT-BASED METHODS
Content-based methods rely on the text content to detect the truthfulness of a news article. Ma et al. (2015) [11] and Yu et al. (2016) [15] combined text with RNNs or CNNs for fake news detection. Chen et al. (2018) [16] combined an attention mechanism with the text to detect fake news at an early stage. Liu [9] used both RNN and CNN to encode the propagation structure for early fake news detection. Ajao et al. (2018) [17] proposed a hybrid CNN-long short-term memory (LSTM) model. The authors of [18] proposed an attention-based convolutional approach for text authenticity detection. Shu et al. (2019) [19] proposed a sentence-comment co-attention sub-network that uses both news contents and user comments for fake news detection and employed an attention mechanism to provide explainability.

C. STRUCTURE-BASED METHODS
Unstructured methods cannot handle structured data. In recent years, researchers have proposed new approaches that use structural information. The authors of [21] constructed a recursive neural network to handle conversational structure; this model generates a tree structure by bottom-up or top-down propagation. Monti et al. (2019) [22] proposed propagation-based fake news detection using graph convolutional networks (GCNs). Nguyen (2019) [23] detected fake news using a multimodal social graph. Li et al. (2020) [20] combined objective information and subjective factors for rumor detection. Bian et al. (2020) [24] proposed a novel bi-directional graph neural network model, bi-directional graph convolutional networks (Bi-GCN), to explore both characteristics by operating on both top-down and bottom-up propagation of fake tweets. Lu and Li (2020) [25] developed graph-aware co-attention networks (GCAN) to detect fake news, generating an explanation by highlighting the evidence in suspicious retweeters and the words of concern. Li et al. (2020) [26] built a conversation structure from the source tweet and user comments and used a GNN to encode it. Li et al. (2020) [27] crawled user-follower information and built a friendship network based on the follower-followee relationship.
We compare our work with the most relevant studies in Table 1. The uniqueness of our work lies in targeting the source news text, requiring no user response comments, analysing model explainability and integrating multiple attention mechanisms.

III. THE BACKGROUND OF THE RELATED DEEP LEARNING TECHNIQUES
Deep learning is a significant branch of machine learning. The mainstream of deep learning is based on neural network methods, although there are also some methods based on tree models [28], [29]. Deep learning in this paper refers specifically to deep neural networks. It has achieved unprecedented success in multiple natural language tasks, such as machine translation, sentiment analysis and question answering. Deep learning is essentially a form of representation learning, different from traditional methods that extract features manually: deep learning models can automatically generate appropriate vectors to represent words, phrases and sentences. In this section, we introduce the deep learning techniques used in our model.

A. RECURRENT NEURAL NETWORKS
RNN is one of the most commonly used deep learning networks in natural language processing (NLP) tasks. An RNN is a type of neural network that can be used to model variable-length sequential information such as sentences or time series. Therefore, it has advantages in learning the nonlinear characteristics of sequences. At each time step, the RNN updates its hidden state h_t by extracting information from the hidden state of the last time step h_{t-1} and the input at this time step x_t. This process continues until all time steps have been evaluated. The algorithm iterates over the following equations:

h_t = f(W_{hh} h_{t-1} + W_{xh} x_t), o_t = g(W_{ho} h_t),

where f and g are activation functions, W_{hh}, W_{xh} and W_{ho} are parameter matrices and h_t is the hidden state of the RNN.
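The recurrence above can be sketched in a few lines of NumPy. The dimensions, random weights and tanh nonlinearity below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One basic RNN step: h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

rng = np.random.default_rng(0)
d_in, d_h, T = 4, 8, 5               # toy sizes (not the paper's)
W_xh = rng.normal(size=(d_h, d_in)) * 0.1
W_hh = rng.normal(size=(d_h, d_h)) * 0.1
b_h = np.zeros(d_h)

h = np.zeros(d_h)                    # initial hidden state
xs = rng.normal(size=(T, d_in))      # a length-T input sequence
for x_t in xs:                       # iterate over the time steps
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h.shape)                       # final hidden state, shape (8,)
```

Each step reuses the same weight matrices, which is what lets an RNN handle sequences of any length.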

B. GATED RECURRENT UNIT
In practice, because of vanishing or exploding gradients, the basic RNN [30] cannot learn long-distance temporal dependencies with gradient-based optimization (Bengio et al., 1994) [31]. One way to deal with this is to extend the unit with ''memory'' cells that store information over long periods, as in the long short-term memory (LSTM) unit (Hochreiter and Schmidhuber, 1997) [32], (Graves, 2013) [33] and the gated recurrent unit (GRU). Here, we briefly introduce the GRU. Unlike a basic RNN unit, the GRU has gating units that modulate the flow of content inside the unit. A GRU layer uses the following equations:

z_t = σ(W_z x_t + U_z h_{t-1}),
r_t = σ(W_r x_t + U_r h_{t-1}),
h̃_t = tanh(W x_t + U(r_t ⊙ h_{t-1})),
h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t,

where σ is a logistic sigmoid function, ⊙ denotes element-wise multiplication, z_t and r_t are the update gate and reset gate of the GRU, respectively, and h̃_t denotes the candidate activation of the hidden state h_t.
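A minimal NumPy sketch of one GRU step, following the standard gating equations above; the sizes and random parameters are toy assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU step with update gate z, reset gate r and candidate state."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev)            # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev)            # reset gate
    h_tilde = np.tanh(p["W"] @ x_t + p["U"] @ (r * h_prev))  # candidate
    return (1.0 - z) * h_prev + z * h_tilde                  # interpolate

rng = np.random.default_rng(1)
d_in, d_h = 4, 6                     # toy sizes
p = {k: rng.normal(size=(d_h, d_in)) * 0.1 for k in ("Wz", "Wr", "W")}
p.update({k: rng.normal(size=(d_h, d_h)) * 0.1 for k in ("Uz", "Ur", "U")})

h = np.zeros(d_h)
for x_t in rng.normal(size=(3, d_in)):
    h = gru_step(x_t, h, p)
print(h.shape)                       # (6,)
```

The gates let the unit decide how much of the previous state to keep versus overwrite, which is what mitigates the vanishing-gradient problem.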

C. WORD2VEC
Natural language is a special composition of characters produced by human beings for communication. For a computer to understand natural language, its words need to be encoded. In the early days of NLP, words were often converted into discrete individual symbols according to the order in which they appeared in a corpus. This encoding method is called one-hot encoding. However, such an encoding method does not reflect the relationships between words. To overcome this problem, Bengio et al. (2003) [34] proposed the concept of word embedding, assuming that each word in the vocabulary corresponds to a continuous feature vector. This idea has since been widely applied in various NLP models, including word2vec. In word2vec, the two most important models are the CBOW (continuous bag-of-words) model and the skip-gram model. The basic idea of CBOW and skip-gram is to make the vector represent the information contained in the word as fully as possible, while keeping the dimensionality of the vector within a manageable range (typically between 25 and 1,000 dimensions). The CBOW model learns word vectors by predicting the target word from its context. Mathematically, the CBOW model is equivalent to the embedding matrix of a bag-of-words model multiplied by a continuous embedding matrix. Conversely, the skip-gram model learns word vectors by predicting the context from the target word.
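The contrast between one-hot encoding and dense embeddings can be shown concretely. The dense vectors below are hand-picked toy values, not trained word2vec output, chosen only to illustrate that embeddings can encode similarity while one-hot vectors cannot:

```python
import numpy as np

vocab = ["king", "queen", "apple"]
idx = {w: i for i, w in enumerate(vocab)}

# One-hot encoding: every pair of distinct words is equally dissimilar.
one_hot = np.eye(len(vocab))

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(one_hot[idx["king"]], one_hot[idx["queen"]]))   # 0.0

# Dense embeddings (hand-picked toy vectors, NOT trained word2vec):
emb = {"king": np.array([0.9, 0.8]),
       "queen": np.array([0.85, 0.9]),
       "apple": np.array([-0.7, 0.1])}
# Related words can now be closer than unrelated ones.
print(cos(emb["king"], emb["queen"]) > cos(emb["king"], emb["apple"]))
```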

D. ATTENTION MECHANISM
The attention mechanism, as the name suggests, is a technique that enables models to focus on important information. It is not a complete model but a technique that can be used within deep learning models. The mechanism was first proposed in the field of visual images (Mnih et al.) [36]; it was later applied to NLP, combining translation and alignment in machine translation tasks. In NLP, different words in a sentence have different importance; consider the sentence ''I hate this movie''.
In sentiment analysis, it is obvious that the word ''hate'' plays a more important role than the other words, which means the model should concentrate on that word. The attention mechanism is now widely used in various NLP tasks based on neural network models such as RNNs and CNNs. In 2017, the Google machine translation team made extensive use of the self-attention mechanism to learn textual representations [37]. Self-attention has since become a research hotspot and has been explored in various NLP tasks.
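The ''I hate this movie'' example can be made concrete with a softmax over per-word relevance scores. The scores below are hypothetical, not learned by any model; they only illustrate how attention converts scores into normalized weights:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

words = ["I", "hate", "this", "movie"]
scores = [0.1, 2.5, 0.2, 0.8]        # hypothetical relevance scores
weights = softmax(scores)            # attention weights, sum to 1
best = words[weights.index(max(weights))]
print(best)                          # "hate" receives the largest weight
```

The weighted sum of word representations under these weights is then dominated by the most relevant word, which is exactly the behaviour attention is meant to provide.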

E. GRAPH ATTENTION NETWORKS
The graph attention network (GAT) [39] is a kind of GNN. GNNs are deep learning-based methods that operate on graph-structured, non-Euclidean data. Due to their convincing performance and high interpretability, GNNs have recently become widely applied methods for graph analysis [38].
There are several variants of GNNs, of which GAT is among the most commonly used. GATs were proposed by Velickovic et al. (2018) [39]. This kind of neural network incorporates the attention mechanism into the graph propagation step: it computes the hidden state of each node by attending over its neighbours, following a self-attention strategy. Velickovic et al. (2018) [39] proposed a single graph attentional layer and constructed arbitrary graph attention networks by stacking this layer. The layer computes the attention coefficient of the node pair (i, j) by:

a_ij = exp(LeakyReLU(a^T [W h_i || W h_j])) / Σ_{k∈N_i} exp(LeakyReLU(a^T [W h_i || W h_k])),

where a_ij is the attention coefficient of node j to node i and N_i represents the neighbourhood of the i-th node in the graph. W is the weight matrix of a shared linear transformation, which is applied to every node, and a is the weight vector of a single-layer feed-forward neural network.

IV. PROBLEM STATEMENT
Let S = {s_1, s_2, ..., s_{|S|}} be the set of source tweets (short text) and G = {g_1, g_2, ..., g_{|G|}} be the set of propagation structure graphs. Each source tweet s_i ∈ S corresponds to a propagation structure graph g_i ∈ G. When a Twitter post s_i is published, other users will retweet it. The propagation structure graph g_i ∈ G of each source tweet is composed of its retweet users u_j: g_i = {u_1, u_2, ..., u_j}, and we denote F = {f_1, f_2, ..., f_{|F|}} as the set of user features. Given a source tweet s_i, along with the corresponding propagation graph g_i containing the users u_j who retweet s_i, as well as their feature vectors F_j, the goal of our model is to classify s_i as ''true'' or ''fake''. The classifier performs learning through labelled information, i.e., C_i : s_i → y_i. In addition, we require our model to highlight the users u_j ∈ g_i who retweet s_i and the words q_i ∈ s_i that can explain why s_i is predicted as true or fake.

V. THE PROPOSED MVAN MODEL
We propose a novel neural network model, multi-view attention networks (MVAN), to detect fake news based on the source tweet and its propagation structure. As can be seen from Fig. 2, the proposed model MVAN consists of three components. The first is text semantic attention networks. Its role is to obtain the semantic representation of the source tweet text information. The second is propagation structure attention networks. It captures the hidden information in the propagation structure of a tweet. The last is the prediction module. It generates the final detection result by concatenating text semantic representation and propagation structure representation.

A. TEXT SEMANTIC ATTENTION NETWORKS
In this work, to correctly represent the information contained in the source tweet text and capture the key clue words in it, we propose text semantic attention networks to process the source tweet text. We define x_i ∈ R^d as the d-dimensional word embedding corresponding to the i-th word in the source tweet. Because the length of each source tweet differs, we perform zero-padding up to a maximum length L. Let E = [e_1, e_2, ..., e_l] ∈ R^l be the input vector of the source tweet, in which e_l is the embedding vector of the l-th word; when a position holds the pad token, its embedding vector is a vector filled with 0. We use word2vec to encode the words of the source tweet. Moreover, a deep bi-directional gated recurrent unit (BiGRU) is used to capture the relationships among words and generate the source tweet representation:

→h_t = GRU(e_t, →h_{t-1}), ←h_t = GRU(e_t, ←h_{t+1}), h_t = [→h_t || ←h_t],

where h_t ∈ R^{d_l} is the hidden state of the BiGRU at step t and e_t is the word embedding vector of the t-th word of the source tweet.
Since each word plays a different role in detecting fake news, the model should focus on the keywords and reduce the influence of irrelevant words. We use a fully connected layer to map the output vector h_t of the BiGRU to a hidden vector u_t:

u_t = tanh(W_w h_t + b_w),

where W_w ∈ R^{n×l} and b_w are the weight and bias of the attention layer, tanh is the activation function mapping values into [−1, 1], u_t ∈ R^{n_l}, and n is the number of neural units in the fully connected layer. Then we calculate the attention coefficient of each word, which is its final weight:

a_t = softmax(u_t^T u_w) = exp(u_t^T u_w) / Σ_t exp(u_t^T u_w),

where u_w is the context weight vector, ·^T represents transposition and a_t ∈ R^l. Finally, the word vectors h_t and attention coefficients a_t are weighted and summed to obtain the representation of the source tweet:

V_t = Σ_t a_t h_t, V_t ∈ R^o,

where o is the dimension of the output layer. Through the text semantic attention networks, we get a representation vector containing the text semantics and the attention weight of each word.
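The attention-pooling step described above (project each hidden state with tanh, score it against a context vector, softmax, then take the weighted sum) can be sketched in NumPy. The random matrix H stands in for the BiGRU outputs, and all sizes are toy assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(2)
L_len, d, n = 5, 8, 6                # toy sizes: 5 words, 8-dim hidden states
H = rng.normal(size=(L_len, d))      # stand-in for BiGRU outputs h_t
W_w = rng.normal(size=(n, d)) * 0.1  # attention projection weights
b_w = np.zeros(n)
u_w = rng.normal(size=n)             # context vector

U = np.tanh(H @ W_w.T + b_w)         # u_t = tanh(W_w h_t + b_w)
a = softmax(U @ u_w)                 # one attention weight per word, sums to 1
v = a @ H                            # weighted sum -> tweet representation
print(v.shape)                       # (8,)
```

The vector `a` is exactly what the interpretability analysis later inspects: the per-word weights that highlight clue words.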

B. PROPAGATION STRUCTURE ATTENTION NETWORKS
Research shows that fake news and real news have different propagation structures [8]. Therefore, we propose propagation structure attention networks to capture the clues implicit in the propagation structure of news. In this part, we discuss how to encode the propagation structure into node representations for fake news detection. Inspired by GAT [39], we apply the attention mechanism to learn a distributed representation of each user node (retweet user) in the propagation structure graph by attending over its neighbours (the set of users who retweeted that user's post). The input to the propagation structure attention layer is a set of user features, u = {u_1, u_2, ..., u_N}, u_i ∈ R^F, where N is the number of user nodes and F is the number of features of each user node. The output of the layer is a new set of user node features, p = {p_1, p_2, ..., p_N}, p_i ∈ R^{F'}. Note that F' may not be equal to F.
To obtain enough representational power to transform the original user features into higher-level features, at least one learnable linear transformation is required. For this purpose, as an initial step, a shared linear transformation, parametrized by a weight matrix W ∈ R^{F'×F}, is applied to every user node. Then a self-attention mechanism att: R^{F'} × R^{F'} → R is used on each user node to compute attention coefficients

c_ij = att(W u_i, W u_j),

which indicate the importance of user node j's features to user node i.
We inject the propagation structure into the mechanism by performing masked attention: we only compute c_ij for nodes j ∈ U_i, where U_i is the neighbourhood of user node i in the propagation structure graph. In our experiments, each user node calculates the attention coefficients of all its first-order neighbours. To prevent the information of user node i from being forgotten, we regard user node i as its own first-order neighbour. To make the attention coefficients easy to compare between different nodes, we use the softmax function to normalize them among all the choices of j:

a_ij = softmax_j(c_ij) = exp(c_ij) / Σ_{k∈U_i} exp(c_ik).

In this paper, the propagation structure attention mechanism is a fully connected layer, parametrized by a weight vector a ∈ R^{2F'}, with the LeakyReLU nonlinearity. Fully expanded, the coefficient calculated by the attention mechanism can be expressed as:

a_ij = exp(LeakyReLU(a^T [W u_i || W u_j])) / Σ_{k∈U_i} exp(LeakyReLU(a^T [W u_i || W u_k])),

where ·^T denotes transposition and || is the concatenation operation.
To stabilize the learning process of self-attention, multi-head attention is employed: H independent attention mechanisms are executed and their outputs are concatenated:

p_i = ||_{h=1}^{H} ELU(Σ_{j∈U_i} a^h_ij W^h u_j),

where || represents concatenation, a^h_ij are the normalized attention coefficients calculated by the h-th attention mechanism, W^h is the corresponding input linear transformation's weight matrix, and ELU is the activation function.
Finally, in the output layer of the propagation structure attention networks, we replace concatenation with averaging and use ReLU instead of ELU as the activation function:

p_i = ReLU((1/H) Σ_{h=1}^{H} Σ_{j∈U_i} a^h_ij W^h u_j).

Through the propagation structure attention networks, we get a representation vector of the propagation structure of the news and the attention weight of each user node with respect to its neighbour nodes.
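The masked, averaged multi-head attention of this output layer can be sketched in NumPy. The graph, feature sizes and head count below are toy assumptions (the paper uses 15 user features and 5 heads, but the values here are random), and the score computation uses the standard GAT decomposition a^T[W u_i || W u_j] = a_src·(W u_i) + a_dst·(W u_j):

```python
import numpy as np

def leaky_relu(x, slope=0.3):
    return np.where(x > 0, x, slope * x)

def gat_output_layer(U, A, Ws, attn_vecs):
    """Averaged multi-head graph attention with masked softmax.

    U: (N, F) user features; A: (N, N) adjacency with self-loops;
    Ws: one (F', F) weight matrix per head; attn_vecs: one (2F',) vector per head.
    """
    heads = []
    for W, a_vec in zip(Ws, attn_vecs):
        H = U @ W.T                          # W u_i for every node, (N, F')
        Fp = H.shape[1]
        # c_ij = LeakyReLU(a^T [W u_i || W u_j])
        c = (H @ a_vec[:Fp])[:, None] + (H @ a_vec[Fp:])[None, :]
        c = leaky_relu(c)
        c = np.where(A > 0, c, -np.inf)      # masked attention: neighbours only
        alpha = np.exp(c - c.max(axis=1, keepdims=True))
        alpha /= alpha.sum(axis=1, keepdims=True)  # softmax over j in U_i
        heads.append(alpha @ H)              # sum_j a_ij W u_j
    return np.maximum(0.0, np.mean(heads, axis=0))  # ReLU(average over heads)

rng = np.random.default_rng(3)
N, F, Fp, n_heads = 6, 15, 8, 5              # toy graph: 6 users
U = rng.normal(size=(N, F))
A = np.eye(N)                                # self-loops keep node i's own info
for i, j in [(0, 1), (0, 2), (1, 3), (2, 4), (4, 5)]:
    A[i, j] = A[j, i] = 1
Ws = [rng.normal(size=(Fp, F)) * 0.1 for _ in range(n_heads)]
avs = [rng.normal(size=2 * Fp) for _ in range(n_heads)]

P = gat_output_layer(U, A, Ws, avs)
print(P.shape)                               # (6, 8): one vector per user node
```

The `-inf` masking before the softmax is what restricts each node's attention to its first-order neighbours in the propagation graph.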

C. PREDICTION MODULE
The prediction module is a multi-layer feed-forward neural network. Based on the outputs of the text semantic attention networks and the propagation structure attention networks, we use a softmax function in the output layer to predict the label of the Twitter news:

ŷ = softmax(W_tp [V_t || V_p] + b_tp),

where || represents concatenation, W_tp and b_tp are parameters of the output layer and V_t and V_p are the output vectors of the text semantic attention networks and the propagation structure attention networks, respectively. For training, we use the cross-entropy loss function to minimize the deviation between the predicted label and the real label:

L(θ) = −[y log ŷ_t + (1 − y) log ŷ_f],

where θ denotes the model parameters to be estimated, y is the real label, and ŷ_t and ŷ_f are the predicted probabilities of the two labels: true and fake.
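The prediction step (concatenate the two view representations, apply a linear layer plus softmax, then score with cross-entropy) can be sketched as follows; the vector sizes and random parameters are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict(V_t, V_p, W_tp, b_tp):
    """y_hat = softmax(W_tp [V_t || V_p] + b_tp)."""
    v = np.concatenate([V_t, V_p])           # fuse text and propagation views
    return softmax(W_tp @ v + b_tp)

def cross_entropy(y_true, y_hat):
    """y_true = 1 for 'true' news, 0 for 'fake'; y_hat = [p_true, p_fake]."""
    return -(y_true * np.log(y_hat[0]) + (1 - y_true) * np.log(y_hat[1]))

rng = np.random.default_rng(4)
V_t = rng.normal(size=8)                     # toy text representation
V_p = rng.normal(size=8)                     # toy propagation representation
W_tp = rng.normal(size=(2, 16)) * 0.1
b_tp = np.zeros(2)

y_hat = predict(V_t, V_p, W_tp, b_tp)
print(y_hat.sum())                           # probabilities sum to 1
loss = cross_entropy(1, y_hat)               # loss for a 'true' example
```

In training, `loss` would be minimized with a gradient-based optimizer (the paper uses Adam).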

VI. EXPERIMENTS

A. DATASETS
Two well-known fake news datasets, Twitter15 and Twitter16, 1 were used to evaluate our MVAN model. The statistics of the datasets are shown in Table 2. Each dataset contains a collection of source tweets, 2 along with their corresponding sequences of retweet user IDs. We chose only the ''true'' and ''fake'' labels as the ground truth, and we balanced the instances containing the three words ''obama'', ''Paul'' and ''Sydney'', which were strongly unbalanced in the datasets. Because the original datasets did not include user profiles, we used the user IDs to crawl user feature information with the Twitter API. 3 Some users had been deleted or suspended by the time of crawling. In the Twitter15 and Twitter16 datasets, the total numbers of retweets are 190,868 and 115,036, respectively. The numbers of user profiles we collected through the API are 177,049 and 108,801, respectively, corresponding to 92.76% and 94.58%, so the missing users account for only a small part of the total. For missing user information, we fill in the mean value of the other users' features in the same propagation structure. Through the API, we crawled a total of 38 user features. Based on previous works [2], [9], [44], we selected 15 common user features that are available on Twitter, summarized in Table 3.
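The mean-imputation step for missing user profiles can be sketched with `numpy.nanmean`. The feature matrix below is a made-up toy propagation structure, not real crawled data:

```python
import numpy as np

# Toy propagation structure: rows = retweet users, columns = user features.
# A NaN row marks a user whose profile could no longer be crawled.
features = np.array([
    [100.0, 2.0, 1.0],
    [np.nan, np.nan, np.nan],   # deleted/suspended account
    [300.0, 4.0, 0.0],
])

col_means = np.nanmean(features, axis=0)     # per-feature mean over available users
missing = np.isnan(features).all(axis=1)     # rows with no crawled profile
features[missing] = col_means                # fill with the per-feature means
print(features[1])                           # [200.   3.   0.5]
```

Imputing within the same propagation structure keeps the filled values consistent with the audience of that particular tweet rather than the global user population.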

B. EXPERIMENTAL SETUP 1) MODEL COMPARISON
We compare our proposed models with other models, including some of the current state-of-the-art models.
• TextCNN [42]: a convolutional neural network model that obtains the representation of each tweet and classifies it with a softmax layer.
• CSI [43]: a state-of-the-art fake news detection model that captures the temporal pattern of user activities and scores users based on their behaviour.
• CRNN [9]: a model combining RNN and CNN, which learns text representations together with the characteristics of users along the Twitter propagation path.
• dEFEND [19]: a state-of-the-art co-attention-based fake news detection model that learns the correlation between the source article's sentences and user profiles.
• G-SAGE [26]: a state-of-the-art model that detects fake news by modelling conversation structure as a graph using GraphSAGE and BiLSTM.
• MVAN: Our new deep neural network model, which uses both text semantic attention and propagation structure graph attention to detect fake news.

2) PARAMETER SETTING
In the text processing stage, we first cleaned the text by removing useless expressions and symbols, unifying letter case, etc. We used the 300-dimensional GoogleNews pre-trained word2vec embeddings and set the maximum vocabulary size to 250,000. The hidden size of the BiGRU is 300, and the number of layers is 2. The batch size of the model was 64. The 15 user features shown in Table 3 were used as input data when training the propagation structure attention networks. We used 5-head attention in the graph attention network, and the number of graph attention layers is 2. Moreover, we used the LeakyReLU nonlinearity with a negative-input slope a = 0.3. In the training phase, we used Adam with a 0.001 learning rate to optimize the model, with the dropout rate set to 0.5.

3) EVALUATION METRICS
For a fair comparison, we adopted the same evaluation metrics used in previous work: Accuracy, Precision, Recall and F1-measure (F1), as described in the following equations:

Accuracy = (TP + TN) / (TP + TN + FP + FN),
Precision = TP / (TP + FP),
Recall = TP / (TP + FN),
F1 = 2 × Precision × Recall / (Precision + Recall),

where TP, TN, FP and FN are the true positive, true negative, false positive and false negative predictions, respectively. We followed GCAN [25] to split the datasets: 70% of the data were randomly selected for training and the remaining 30% were used for testing. The reported results are an average over ten runs.
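The four metrics follow directly from the confusion counts. The counts below are hypothetical, for illustration only:

```python
def metrics(tp, tn, fp, fn):
    """Compute Accuracy, Precision, Recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical confusion counts, not from the paper's experiments.
acc, prec, rec, f1 = metrics(tp=80, tn=75, fp=10, fn=15)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

F1 is the harmonic mean of precision and recall, so it only rises when both are reasonably high, which makes it a stricter summary than accuracy on imbalanced data.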

C. EXPERIMENTAL RESULTS
The main experimental results are shown in Table 4. The proposed MVAN model is significantly better than the state-of-the-art models under all the evaluation criteria on the two public datasets. Compared with the state-of-the-art model G-SAGE, accuracy on Twitter15 and Twitter16 is improved by about 3.06% and 2.03%, respectively. Compared with the machine learning model SVM-BOW, the accuracy of our proposed model on both the Twitter15 and Twitter16 datasets is improved by about 25%. As shown in Table 5, we performed statistical tests on all models and report the average accuracy ± standard deviation at confidence levels of 0.90, 0.95 and 0.98. The results in Table 5 show that our model achieves a significant performance improvement on the two public datasets. MVAN can better represent text and propagation structure information, thereby improving the accuracy of fake news detection.

D. ABLATION STUDY
To determine the relative importance of each module of MVAN, we conducted a series of ablation studies on key parts of the model. The models used for the ablation study are as follows:
• MVAN-TSA: Both text and propagation structure information are used, but the text semantic attention mechanism is removed from the original MVAN model, and only the BiGRU is used to encode the text.
• MVAN-PSA: Both text and propagation structure information are used but the MVAN model removes propagation structure attention mechanism and Node2vec is used to encode the propagation graph structure directly.
• TSAN: Text semantic attention network only uses text data to classify news.
• PSAN: The propagation structure attention network only uses propagation structure data to classify news.
The experimental comparison results are shown in Fig. 3. We found that when the MVAN model removed the text semantic attention mechanism or the propagation structure attention mechanism, the performance dropped by about 1%. This shows that the two attention mechanisms contribute to our model's performance. When only TSAN was used, the performance dropped by 2.9% to 3.6%, because the model loses the very important propagation structure information. In addition, if only PSAN was used, the performance dropped by about 9% on both datasets, because the model does not even read the text content of the news itself. Nevertheless, the accuracy of PSAN on the Twitter15 and Twitter16 datasets still reached 83.1% and 84.5%, respectively, which shows that numerous clues in the propagation structure can be used to detect fake news.

E. EARLY DETECTION PERFORMANCE
It is very important to detect fake news in the early stages of propagation so that preventive measures can be taken as quickly as possible. In the early detection tasks, all user information after the detection deadline is invisible during the test. The earlier the deadline, the less propagation information is available.
In addition to the previous comparison models, we add the ST-GCN [45] and DCRNN [46] models for comparison. These two are neural network models for processing temporal sequence graphs. For convenience of comparison, we replaced the propagation structure attention modules in our model with ST-GCN and DCRNN, naming the variants MVAN + ST and MVAN + DC, respectively. As can be seen in Fig. 4, our model achieves an accuracy of approximately 91% at the earliest stage. DCRNN is a combination of GCN and RNN and only supports temporal input over a fixed graph structure. Therefore, we built a separate DCRNN model for each time window for testing; but because each time window corresponds to a different graph structure, the RNN module effectively produces only one output during training, so the potential of this model is not fully exploited. ST-GCN can convolve information in both the spatial and temporal dimensions at the same time; therefore, its performance is quite good, with considerable room for further improvement. Moreover, the curve of MVAN in the line chart is very stable, indicating that our model has good robustness and high performance in early fake news detection.

F. INTERPRETABILITY ANALYSIS
The text semantic attention assigns a relative weight to each word in the source tweet (the weight value is between 0 and 1). We analyzed the weight distribution of the words in the Twitter15 and Twitter16 datasets. Regardless of whether the news is fake or true, most words had a small weight of about 0.1 (Fig. 5), implying that most words have little effect in distinguishing the truthfulness of the news. Additionally, we found that the proportion of words with an attention weight equal to 1 in true news was much higher than that in fake news. For example, in Twitter15 and Twitter16, words with an attention weight of 1 accounted for 1.3% and 3.7%, respectively, in fake news, and 10.9% and 23.5%, respectively, in true news.
To further analyze the interpretability of the model, we selected two examples from the real datasets. As shown in Fig. 7, the left one is true news and the right one is fake news. We used the text semantic attention mechanism to highlight the evidence words. We observed that if the word ''confirmed'' is included in the news, the news is more likely to be true. Conversely, if there are more ''?'' marks in the news, representing doubtful information, the news is more likely to be fake.
In addition, we used the propagation structure attention mechanism to find the most heavily weighted key retweet users in the propagation structure. We randomly selected a fake and a true source tweet and drew their weights according to the propagation structure attention, as shown in Fig. 7 (''Real case analysis of true news and fake news''): the key clue words in the source tweet are highlighted by text semantic attention weights, the horizontal direction from left to right represents the order of retweets, and darker colours refer to higher attention weights; the user with the highest weight in each retweet propagation is marked and the user's main characteristics are displayed. In the retweet propagation structure of the true news, the first user has the highest weight, whereas for the fake news the 24th user has the highest weight. The results suggest that to determine whether a piece of news is fake, one should first check the features of the users who retweet the source tweet early. In the propagation structure of fake news, user weights are more evenly distributed. The features of the key Twitter users in the propagation structures of true news and fake news were markedly different (Fig. 7). The user accounts in the true news were created earlier, with verification icons and profiles, and were followed by many users; such users are generally more authoritative official news accounts. In contrast, the user accounts in the fake news were created later, with no verification, no profile and few followers. Generally, such user accounts are more likely to spread fake news.

VII. CONCLUSION
We propose a new deep learning model for fake news detection, MVAN, which combines two attention mechanisms, text semantic attention and propagation structure attention, to simultaneously capture the important hidden clues and information in the source tweet text and the propagation structure. Evaluation results on two public datasets show that MVAN has strong performance and reasonable interpretation ability. In addition, MVAN provides early detection of fake news with satisfactory performance. We believe that MVAN can be used not only for fake news detection but also for other text classification tasks on social media, such as sentiment classification, topic classification and insult detection.
In future work, users' reply information will be added to further improve the performance of the model. A GNN combined with the attention mechanism will then be used to capture the key information hidden in the conversation structure graph composed of the source tweet and its replies. We will also continue to approach fake news detection from additional real-world perspectives.