
SECTION I

## INTRODUCTION

Nowadays, microblog systems such as Twitter and Sina Weibo have become increasingly popular because they facilitate fast dissemination and acquisition of information. Individuals can freely share information through microblogs, especially in emergency situations such as earthquakes, floods, and hurricanes [1]. Microblog systems have grown quickly, and people tend to use them as an important source for sharing and obtaining information in everyday life [2]. On the other hand, the abuse of microblogs to spread rumors (or unreliable information) has also been widely reported [1], [2], [4]. Rumors refer to information whose truth and source are unreliable; they are likely to arise in emergency situations, causing public panic, disrupting the social order, decreasing government credibility, and even endangering national security. For example, in March 2011, a few days after a powerful earthquake rocked Japan, triggering a tsunami and later a nuclear crisis, a rumor that iodized salt could protect people from nuclear radiation spread widely across China via Sina Weibo and other microblog platforms. This made people flock to stores, supermarkets, and dispensaries to buy salt. As a result, the price of iodized salt increased 5 to 10 times during that period. To limit the wide spreading of rumors, it is essential for microblog systems to detect rumors as soon as possible.

A popular approach to discriminating rumor posts from normal ones is to treat rumor identification as a binary classification problem in machine learning. This is based on the hypothesis that rumor posts are statistically similar [4], [5]; if the data fail to exhibit such statistical similarity, the learning will fail. Research has demonstrated that no single learning method is overwhelmingly superior in all scenarios, and different learning algorithms may yield similar results [6]. It is also reported that identifying the most salient features of the data usually has an enormous impact on classification performance [7]. Therefore, choosing a representative set of features is a crucial step for rumor identification. We observe that existing research efforts on rumor identification [8]–[17] have focused on using microblogs’ inherent features, such as the content-based, multimedia-based, propagation-based, and topic-based features listed in Table I. In contrast, features based on users’ behaviors are less explored, though it has been shown that there is a close correlation between users’ behaviors and the information credibility of microblogs [1], [16]. In [1], Mendoza et al. found that rumor microblogs tend to be questioned by more users than normal microblogs. In [16], Shirai et al. reported that 14.7% of people or organizations would publish rumor-correction microblogs as soon as they found the rumor microblogs. In addition, mass behaviors on microblog posts, such as the number of posts and the number of followers, can be exploited to help determine whether a microblog post is a rumor.

Table I Commonly Used Features for Rumor Identification

In this paper, we investigate the problem of rumor identification in microblog systems. Observing that rumor publishers’ behaviors may diverge from normal users’ and that a rumor post may draw different responses than a normal post, we propose a user behavior-based rumor identification scheme, in which users’ behaviors are treated as hidden clues indicating who is likely to be a rumormonger or which posts are possible rumor microblogs. Our approach consists of three phases. 1) Based on the collected microblogs and user profiles, we gather the features of users’ behaviors from each microblog post. In total, nine user behavior features are adopted in this paper, five of which have not been studied before. 2) We apply five popular machine-learning algorithms to train classifiers for rumor identification. 3) We use the classifiers trained in the second phase to predict whether a microblog post is a rumor.

Experiments are conducted on real-world data from Sina Weibo, a gigantic Chinese microblog platform with over 500 million users, to demonstrate the efficacy and efficiency of the proposed method and features. The experimental results show that the precision, recall, and F-score of our approach reach 0.8645, 0.8535, and 0.8590, respectively; these three metrics achieve 13.14%, 18.13%, and 16.68% improvements on average compared to the baseline approach.

The rest of this paper is organized as follows. In Section II, we review related work on rumor identification. In Section III, we define the rumor identification problem. In Section IV, we propose our approach for rumor identification. In Section V, we present the experimental results. Finally, we conclude this paper in Section VI.

SECTION II

## RELATED WORK

In this section, we briefly summarize the results of existing rumor identification research.

To identify the rumors spreading in microblog systems, several attempts have been made by microblog service providers. Sina Weibo maintains an official account, @WeiboPiyao [20], operated by senior journalists $24 \times 7$. It regularly publishes microblogs on new rumors, so that Weibo users who follow this account can be alerted. In addition, Sina Weibo adopts a crowdsourcing technique [32] to provide a service named Weibo Misinformation Declaration [21]. Any Weibo user can report suspicious rumors through this service; a team of journalists then judges whether the reported microblogs really are rumors and publishes the results on Weibo. Both methods curb the spreading of rumors on Weibo to some extent. In these methods, however, the credibility of the information is assessed entirely by Weibo journalists manually, which is costly and labor intensive. Moreover, these methods incur a large delay in rumor detection. From December 2010, when @WeiboPiyao posted its first microblog on a rumor, to January 2015, only 480 rumor microblogs had been published in total. It is obvious that @WeiboPiyao can catch only a small portion of the rumors on Weibo. As for Weibo Misinformation Declaration, the average delay between the time a suspicious rumor is reported and its decision time is more than 24 hours. It is therefore crucial to design and develop a system that can automatically assess the information credibility of microblog systems.

Automatic rumor identification in microblog systems is a relatively new field. There have so far been only a few works addressing this problem, and most of them primarily focus on using microblogs’ inherent features. In [9], Castillo et al. extracted 68 features from Twitter posts and categorized them into four types: 1) content-based features, which consider characteristics of the tweet content, such as the length of a message and the number of positive/negative sentiment words in a message; 2) user-based features, which consider traits of Twitter users, such as registration age, number of followers, and number of followees; 3) topic-based features, which are aggregates computed from message-based and user-based features, such as the fraction of tweets that contain URLs, the fraction of tweets with hashtags, and the fraction of positive and negative sentiment in a set; and 4) propagation-based features, which consider features related to the propagation tree of a post, such as the depth of the retweet tree or the number of initial tweets of a topic. After the work of Castillo et al., research efforts have focused on exploiting new features for rumor detection. Qazvinian et al. [10] extracted attributes related to tweet contents, features about the network, and specific memes of Twitter to build different Bayes classifiers to detect rumors spreading on Twitter. Yang et al. [11] proposed two new features, 1) a client-based feature and 2) a location-based feature, and trained a support vector machine classifier to identify the misinformation and disinformation on Sina Weibo. In [12], Sun et al. first proposed multimedia-based features for event rumor identification. Cai et al. [13] proposed text features from retweets and comments to construct a rumor classifier. Wang et al. [14] proposed graph-based features and applied them to spam bot detection. Zhang et al. [33] mined the deep information of microblog contents and extracted implicit features, such as popularity, sentiment or viewpoint of message contents, and user historical information, to detect rumors in microblogs. In [34], Wu et al. studied message propagation patterns on Sina Weibo and used them as high-order features to construct a graph-kernel-based SVM classifier for rumor identification. Table I lists the features commonly used for rumor identification in existing research.

As shown in Table I, mass user behavior has not been adequately explored in existing rumor identification research. Unlike previous studies, in this paper we treat the features of users’ behaviors as important clues indicating who is likely to be a rumormonger or which posts are possible rumor microblogs, and we propose several new user behavior-based features to predict whether a microblog post is a rumor.

SECTION III

## BACKGROUND

In this section, we introduce the background and give a general model of rumor identification. We define users’ behaviors of microblogs as a set of vectors, in which every vector ${{\bm {m}}^{\left(i \right)}} = \langle b_1^{\left(i \right)}, b_2^{\left(i \right)}, \ldots,b_n^{\left(i \right)}, {c^{\left(i \right)}}\rangle$ contains the user behavior features of microblog $i$, where $n$ is the number of user behavior features, $b_{j}^{(i)}$ represents the $j{\text{th}}$ user behavior feature of microblog $i$, and ${c^{\left(i \right)}}$ is the type (rumor or normal) of microblog $i$. Given a set of users’ behavior vectors of microblogs with known types, the problem addressed in this paper is to predict the types of microblogs whose types are unknown, based on their users’ behavior vectors.

A microblog system is a network made up of users and their relationships. Therefore, we can represent a microblog system by a directed graph $(N,\; R)$, which consists of a set of users $N = \left\{ {1,2,3, \ldots,n} \right\}$ and an $n\times n$ matrix $R = [r_{ij}]_{i,\,j \in N}$, where ${r_{ij}} \in \left\{ {0,1} \right\}$ indicates whether user $i$ follows user $j$. If ${r_{ij}} = 1$, user $i$ is a follower of user $j$ and user $j$ is a followee of user $i$.
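As an illustration, the follower matrix $R$ can be sketched as a small adjacency matrix; the four-user network and helper names below are our own example, not part of the paper’s system.

```python
import numpy as np

# Hypothetical 4-user follower graph (N, R): R[i, j] = 1 means user i follows user j.
n = 4
R = np.zeros((n, n), dtype=int)

def follow(R, i, j):
    """Record that user i follows user j (j becomes a followee of i)."""
    R[i, j] = 1

def followers_of(R, j):
    """Return the indices of all followers of user j (rows with R[i, j] = 1)."""
    return np.flatnonzero(R[:, j])

# Example: user 0 follows user 1; users 2 and 3 follow user 0.
follow(R, 0, 1)
follow(R, 2, 0)
follow(R, 3, 0)
print(followers_of(R, 0))  # -> [2 3]
```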

There are three ways to share or deliver information among users in microblog systems: 1) a user publishes an original microblog post; 2) a user republishes someone else’s post (reposting); and 3) a user reposts someone’s post and adds his or her own comments (commenting). Reposting and commenting help microblog users quickly and widely share posts with all of their followers.

Fig. 1 illustrates the relations among four users, where a hollow circle represents a user, a solid circle denotes a post, a solid line with an arrow shows the following relation between two users, and a dashed line with an arrow indicates the direction of post transmission. From Fig. 1, we can see that user A is a follower of user B, and both user C and user D are followers of user A; conversely, user B is a followee of user A, and user A is a followee of both user C and user D. In Fig. 1, user B publishes a post b2. User A, who is user B’s follower, receives and reads post b2, and then republishes it as post a1 through reposting or commenting. Although user C and user D cannot read post b2 from user B directly, they are able to read it from user A through post a1.

Fig. 1. Illustration of the relation among microblog users.
SECTION IV

## FEATURES FOR RUMOR DETECTION

As shown in [7], feature design and selection play a key role in rumor detection; detection performance depends heavily on which features are adopted. By analyzing the characteristics of microblog users’ behaviors, we extract nine user behavior features from microblog posts. The users’ behaviors considered in this paper include the behaviors of both the author and the readers of a microblog. There are big differences in usage patterns between normal authors and rumormongers. For example, very few rumormongers use authenticated accounts to publish rumor microblogs, in order to escape possible responsibilities, while many normal users use authenticated accounts to improve their reputation. Readers also respond differently when reading rumor microblogs and normal microblogs; for example, rumor microblogs tend to be questioned more than normal microblogs. In this section, we give a detailed description of the user behavior features used to represent a microblog.

### A. Behavior Features Based on Microblog’s Authors

Behavior features based on a microblog’s author refer to the features extracted from the behaviors of the author who publishes the microblog post, including verified user or not, number of followers, average number of followees per day, average number of posts per day, and number of possible microblog sources. Among these features, verified user or not and number of followers have been studied in previous works [8]–[17], while we propose three new features: 1) average number of followees per day; 2) average number of posts per day; and 3) number of possible microblog sources.

#### 1) Verified User or Not

This feature indicates whether a user is verified by the microblog service provider. To improve their reputation and social influence, some microblog users apply to the service provider for identity authentication. If a user’s identity is verified, a verified tag is shown after the user name, and other users can judge whether an account is authenticated by this tag. Strictly speaking, the feature of verified user or not belongs to features based on the user’s profile. However, it can also be described as a behavior of microblog authors, since few rumormongers will post rumor microblogs from an authenticated account, in order to escape possible responsibilities. Therefore, we use the feature of verified user or not to describe a choice behavior of the microblog’s author.

#### 2) Number of Followers

A follower is a person who follows or subscribes to an account. The microblogs published by that account appear on the followers’ home timelines, which display a stream of posts from the accounts they have chosen to follow. The more followers a user has, the more people will receive his or her posts. Therefore, to spread their rumors widely and rapidly, rumormongers usually publish rumor microblogs only after their number of followers reaches a high value.

#### 3) Average Number of Followees Per Day

A followee is someone whose microblogs are subscribed to and followed by other people. Unlike in other social networks, such as Facebook and WeChat, a microblog user can follow any other user without permission. Generally speaking, the more people a person follows, the more followers he or she will get. To attract more followers rapidly, rumormongers follow many people in a very short time; therefore, the value of this feature for rumormongers is usually higher than that for normal users. The average number of followees per day is calculated by dividing the number of followees by the number of days since the user registered, both of which can be extracted directly from the user’s profile.

#### 4) Average Number of Posts Per Day

Average number of posts per day refers to how many microblogs a user posts per day on average. Unlike normal users, who share information with their friends, rumormongers use microblogs only to spread their fabricated information. To escape possible responsibilities, rumormongers will rarely or never log in to the same account again once they have posted rumor microblogs. Thus, the average number of posts per day for rumormongers is probably far smaller than that for normal users.

#### 5) Number of Possible Microblog Sources

In this paper, the number of possible microblog sources refers to the number of persons who post a specific microblog or similar microblogs, rather than forwarding it. Rumor microblogs usually originate from one person or a small number of people, while authentic microblogs can be witnessed and originated by a large number of unrelated individuals. Therefore, there will be only one information source of a rumor in the network if the rumor microblog is initiated by one person, and the possible information sources are no more than the size of the group if the rumor is initiated by a small colluding group. Conversely, if the content of a microblog is authentic, there are probably many information sources [18]. The value of this feature can be obtained from a microblog post by applying the following four steps.

1. Step 1) For a given microblog post, represent it as a set of keywords by using the Tf-Idf method [19].
2. Step 2) Construct the search keywords from the keywords generated in step 1) and collect the same or similar original microblogs using the search function provided by the microblog service. (The contents of forwarded microblogs usually contain keywords such as “Re,” so forwarded microblogs can be discarded by checking whether such keywords appear in their contents.)
3. Step 3) Compute the similarity between the given microblog and every collected similar microblog based on the Jaccard coefficient, as shown in (1), and remove from the collected set the irrelevant microblogs whose similarity values are below the threshold. In this paper, if the similarity value is greater than 0.75, we consider two microblogs similar $${\text{sim}}({t_i},{t_j}) = \frac{{\left\vert {{t_i}.{\text{keywords}} \cap {t_j}.{\text{keywords}}} \right\vert}}{{\left\vert {{t_i}.{\text{keywords}} \cup {t_j}.{\text{keywords}}} \right\vert}}$$ where $\left\vert\cdot\right\vert$ is the number of elements in a set, and ${t_i}.{\text{keywords}}$ is the set of keywords extracted from the text of microblog $i$.
4. Step 4) Count the number of elements in the resulting set of similar microblogs and assign it to the number of possible microblog sources of the given microblog.
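The steps above can be sketched in code as follows, assuming the keyword sets have already been extracted with the Tf-Idf method; the function names and threshold handling are our own illustration.

```python
def jaccard(keywords_a, keywords_b):
    """Jaccard coefficient between two keyword sets, as in (1)."""
    a, b = set(keywords_a), set(keywords_b)
    if not (a | b):
        return 0.0
    return len(a & b) / len(a | b)

def count_possible_sources(target_keywords, candidate_keyword_sets, threshold=0.75):
    """Steps 3) and 4): keep only candidate original microblogs whose similarity
    to the target exceeds the threshold (0.75 in this paper) and count them."""
    return sum(1 for kws in candidate_keyword_sets
               if jaccard(target_keywords, kws) > threshold)
```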

### B. Behavior Features Based on Microblog’s Readers

In this paper, features based on a microblog’s readers mainly refer to the features extracted from users’ behaviors after they read the microblog, including number of reposts, number of comments, ratio of questioned comments, and number of corrections. Number of reposts and number of comments have been studied in previous investigations [8]–[17], while ratio of questioned comments and number of corrections are proposed in this paper.

#### 1) Numbers of Reposts and Comments

Almost all microblog services allow their users to repost and comment on the posts they have read. Both reposting and commenting can be seen as response behaviors that reflect a kind of judgment on microblogs. Number of reposts indicates how many people repost a microblog, and number of comments describes how many people express their opinions and attitudes toward a microblog post. These two features are usually used to evaluate the popularity of a post: the larger their values, the more popular the post is. As for rumor posts, although their truth and sources are unreliable, their contents are usually hot topics on microblogs because they describe events of interest to people beyond the friends of the author [1]. Therefore, the values of these two features are usually far larger than those of normal microblogs.

#### 2) Ratio of Questioned Comments

Rumor microblogs are prone to be challenged during their dissemination, because their truth and sources are unreliable. Mendoza et al. found that information which turned out to be false was questioned much more than information which ended up being true [1]. Currently, almost all microblog platforms provide a comment service through which users can express their views on any microblog post. Following Mendoza et al., we can conclude that a post with many questioned comments has a high likelihood of being a rumor. Therefore, we use the ratio of questioned comments to describe the questioning behavior of readers. The ratio of questioned comments is defined as $$r({m_i}) = \frac{{\left\vert {{\text{questioned comments of }}{m_i}} \right\vert}}{{\left\vert {{\text{comments of }}{m_i}} \right\vert}}$$ where ${{\left\vert {{\text{comments of }}{m_i}} \right\vert}}$ is the number of comments on microblog ${m_i}$, and ${{\left\vert {{\text{questioned comments of }}{m_i}} \right\vert}}$ is the number of comments which question microblog $m_i$.

To calculate the value of $r({m_i})$, we first need to judge whether a comment is a questioned comment. In this paper, we use a Bayesian method [25] for this task. The details can be summarized as follows.

1. Step 1) Collect a set of comments and label them manually as questioned or not.
2. Step 2) Extract keywords from the collected comments and calculate the posterior probability of every keyword ${w_i}$ for each class as $$\Pr\!\left({w_i} \mid c \right) = \frac{\sum\limits_{j = 1}^{{n_c}} u({w_i},{m_j})}{{n_c}}$$ where $c$ represents the class (questioned comment or not), $u({w_i},{m_j})$ is a function whose value is 1 if comment ${m_j}$ of class $c$ contains keyword ${w_i}$ and 0 otherwise, and ${n_c}$ is the number of comments of class $c$.
3. Step 3) For a given unlabeled comment $m$, calculate its likelihood for each class based on the probabilities computed in step 2) and choose the class that maximizes this likelihood as the target class. The calculation is defined as $${C_{{\text{map}}}} = \mathop {{\text{arg}}\;{\text{max}}}\limits_{c \in C} \Pr\!\left(c \right)\prod\limits_{i = 1}^{k} \Pr\!\left({w_i} \mid c \right)$$ where $C = \{ \text{questioned comment}, \text{normal comment} \}$, $k$ is the number of keywords extracted from comment $m$, $\Pr({w_i} \mid c)$ are the conditional probabilities calculated in step 2), and $\Pr(c)$ is the prior probability of class $c$, i.e., the fraction of the collected comments whose class is $c$. In (4), the value of $\Pr({w_i} \mid c)$ may be problematic, since it would be 0 for comments with unknown keywords. To eliminate zeroes, we use Laplace smoothing [26] to calculate the conditional probabilities for unknown keywords: $$\Pr\!\left({w_i} \mid c \right) = \frac{1}{{{n_c} + \left\vert v \right\vert + 1}}$$ where $\left\vert v \right\vert$ is the number of keywords extracted from the collected comment set in step 1) and ${n_c}$ is the number of comments of class $c$.
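The three steps can be sketched as a minimal classifier over comments represented as keyword sets; the class name and the toy data in the test are our own illustration, not the paper’s implementation.

```python
from collections import defaultdict

class QuestionedCommentClassifier:
    """Minimal sketch of the Bayes classifier in steps 1)-3)."""

    def fit(self, keyword_sets, labels):
        # Step 1): keyword_sets are manually labeled comments (as keyword sets).
        self.classes = sorted(set(labels))
        self.n = {c: labels.count(c) for c in self.classes}   # n_c per class
        self.vocab = set().union(*keyword_sets)
        # Step 2): Pr(w | c) = fraction of class-c comments containing w, as in (3).
        self.pw = defaultdict(dict)
        for c in self.classes:
            docs = [kws for kws, lab in zip(keyword_sets, labels) if lab == c]
            for w in self.vocab:
                self.pw[c][w] = sum(w in d for d in docs) / len(docs)
        self.prior = {c: self.n[c] / len(labels) for c in self.classes}
        return self

    def predict(self, keywords):
        # Step 3): choose the class maximizing Pr(c) * prod_i Pr(w_i | c), as in (4).
        def score(c):
            p = self.prior[c]
            for w in keywords:
                # Zero or unknown keyword probabilities are smoothed as in (5).
                p *= self.pw[c].get(w) or 1.0 / (self.n[c] + len(self.vocab) + 1)
            return p
        return max(self.classes, key=score)
```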

#### 3) Number of Corrections

A number of microblog posts try to correct misinformation and disinformation; these posts are called corrections. Shirai et al. [16] reported that 14.7% of people or organizations would publish a rumor correction as soon as they found the rumor microblogs. Obviously, rumor microblogs attract more corrections than common microblogs. The detailed steps of extracting this feature can be summarized as follows.

1. Step 1) Construct a correction keyword dictionary containing keywords such as “rumor” and “refute.”
2. Step 2) For a given microblog, obtain its keyword vector by using the Tf-Idf method.
3. Step 3) Combine the keywords generated in step 2) with keywords from the correction keyword dictionary to construct the search keywords, and issue a search request to the search service provided by the microblog system. The number of returned results is the value of this feature.
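Steps 1)–3) amount to building a search query string; a hypothetical sketch is given below. The query format and the correction keywords are illustrative, not Sina Weibo’s actual search syntax.

```python
# Step 1): a (hypothetical) correction keyword dictionary.
CORRECTION_KEYWORDS = ["rumor", "refute"]

def build_correction_query(microblog_keywords, max_keywords=4):
    """Steps 2)-3): combine the top Tf-Idf keywords of a microblog with the
    correction keywords to form a search query; the number of results
    returned by the search service is then the feature value."""
    terms = list(microblog_keywords)[:max_keywords] + CORRECTION_KEYWORDS
    return " ".join(f'"{t}"' for t in terms)
```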
SECTION V

## EXPERIMENT

In this section, we describe the collection of the experimental dataset, the evaluation of the proposed user behavior-based features, and the experimental results.

### A. Dataset

We collect microblog data from Sina Weibo, China’s leading microblog service provider, to test the performance of the proposed method. Sina Weibo provides two Web services, 1) @WeiboPiyao [20] and 2) Weibo Misinformation Declaration [21], which publish information on confirmed rumor microblogs. To construct the training set accurately and efficiently, we use this published rumor information to label the rumor microblogs, rather than labeling them manually.

The rumor announcement instance pages of @WeiboPiyao and Weibo Misinformation Declaration are shown in Figs. 2 and 3, respectively. To construct the dataset correctly, we need to map a rumor announcement to its original rumor microblogs, and then obtain the user behaviors from the original microblogs. However, the URL of the original microblog is not given directly on these announcement pages, so we need a way to obtain the URLs of rumor microblog posts from the rumor announcement pages.

We noticed that the rumor announcement pages usually follow a fixed format, so we can obtain the information needed to acquire the URL of the original rumor microblog through text structure analysis of the announcement pages.

Fig. 2 illustrates a rumor announcement page of @WeiboPiyao. As shown in Fig. 2, the value of the user name appears immediately after the keyword meaning “user,” and the rumor’s content is summarized between the beginning words meaning “post microblog and say” and the ending words meaning “after inspection.” After extracting the user name and keywords from the rumor announcement page, we can construct a request URL, http://s.weibo.com/wb/”香格里拉”+”强拆”+”儿童”+”死亡”&xsort=time&userscope=custom:”Mickel”&Refer=g, and send it to the Sina Weibo server, which issues a search request to Sina Weibo and returns the original rumor page. In the request URL generated above, “香格里拉” (a place in China), “强拆” (meaning to demolish forcibly), “儿童” (meaning children), and “死亡” (meaning death) are keywords extracted from the rumor contents, and “Mickel” is the user name of the rumormonger.

Fig. 3 shows a microblog instance of the Weibo Misinformation Declaration Web service. Different from @WeiboPiyao, it provides a hyperlink (meaning “original text”) that links to the original rumor microblog. Therefore, we can extract the URL of the original rumor microblog from the HTML text of a page published in Weibo Misinformation Declaration.

In our experiment, we collect the microblogs published by @WeiboPiyao and Weibo Misinformation Declaration from December 18, 2010 to December 24, 2014. Using a crawler program, we extract the profiles of the rumormongers, collect the microblogs posted by their followers and followees, and build their profiles. The collected microblogs belong to two categories: 1) labeled rumor microblogs obtained from @WeiboPiyao and Weibo Misinformation Declaration and 2) unlabeled microblogs gathered from the rumormongers’ followers and followees. For the unlabeled microblogs, we ask two annotators to label them, and we use Cohen’s kappa coefficient [22] to measure the consistency between the two annotators. Cohen’s kappa coefficient is defined by $$\kappa = \frac{{{p_{{\text{observed}}}} - {p_{{\text{chanced}}}}}}{{1 - {p_{{\text{chanced}}}}}}$$ where ${p_{{\text{observed}}}} = \frac{{\left\vert {A \cap B} \right\vert + \left\vert {C \cap D} \right\vert}}{{\left\vert E \right\vert}}$ and ${p_{{\text{chanced}}}} = \frac{{\left\vert{A}\right\vert\times\left\vert{B}\right\vert}}{{\left\vert E \right\vert}^2}+\frac{{\left\vert{C}\right\vert\times\left\vert{D}\right\vert}}{{\left\vert E \right\vert}^2}$. Here, $A$ is the set of microblogs labeled as rumors by the first annotator, $B$ is the set labeled as rumors by the second annotator, $C$ is the set of microblogs the first annotator cannot decide on, $D$ is the set the second annotator cannot decide on, $E$ is the set of all collected microblogs, and $\left\vert\cdot\right\vert$ denotes the size of a set. In our case, Cohen’s kappa coefficient is $\kappa = 0.9635$, which demonstrates that the two annotators reach high agreement in the data annotation. The final dataset contains 9199 microblogs, including 1608 rumor microblogs and 7591 normal microblogs.
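Cohen’s kappa in (6) can also be computed directly from the two annotators’ label sequences; the snippet below uses the standard marginal-frequency form of $p_{\text{chanced}}$, with hypothetical labels.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa as in (6): observed agreement corrected by the
    agreement expected by chance from each annotator's label frequencies."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_chanced = sum(ca[lab] * cb[lab] for lab in ca.keys() | cb.keys()) / n ** 2
    return (p_observed - p_chanced) / (1 - p_chanced)

# Hypothetical annotations over four microblogs.
a = ["rumor", "normal", "rumor", "undecided"]
b = ["rumor", "normal", "normal", "undecided"]
print(round(cohens_kappa(a, b), 4))
```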

### B. Evaluation Metrics

To assess the performance of our approach, we use the conventional precision, recall, and F-score [31] as evaluation metrics. The precision $Pr$ is the fraction of correctly predicted rumor microblogs among all microblogs identified as rumors. The recall $Re$ is the fraction of correctly predicted rumor microblogs among all actual rumor microblogs. The F-score is the harmonic mean of precision and recall. Precision, recall, and F-score are defined in (7), (8), and (9), respectively \begin{align}Pr & = \frac{{\left\vert {{\text{correctly predicted rumor microblogs}}} \right\vert}}{{\left\vert {{\text{rumor microblogs identified}}} \right\vert}} \\ Re & = \frac{{\left\vert {{\text{correctly predicted rumor microblogs}}} \right\vert}}{{\left\vert {{\text{rumor microblogs}}} \right\vert}} \\ F & = \frac{{2 \times Pr \times Re}}{{Pr + Re}} \end{align} where $\left\vert \cdot \right\vert$ is the number of elements in a set.
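Equations (7)–(9) can be sketched as a small helper operating on sets of microblog ids (an illustrative representation of our own):

```python
def precision_recall_f(predicted_rumors, true_rumors):
    """Precision (7), recall (8), and F-score (9), computed from the set of
    microblogs identified as rumors and the set of actual rumor microblogs."""
    correct = len(predicted_rumors & true_rumors)
    pr = correct / len(predicted_rumors) if predicted_rumors else 0.0
    re = correct / len(true_rumors) if true_rumors else 0.0
    f = 2 * pr * re / (pr + re) if pr + re > 0 else 0.0
    return pr, re, f
```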

### C. Experiment Results

In this section, we present the experimental results in two aspects. 1) We analyze the value distributions of the users’ behavior features to illustrate the discriminative capacity of each feature. 2) We conduct a comparative experiment against the baseline approach to test the performance of our approach.

#### 1) Discriminative Capacity of the Features

To test the discriminative capacity of the features based on users’ behavior, we analyze the distribution of feature values in Fig. 4. It can be observed from Fig. 4 that there is a significant difference between rumor and normal microblogs for features such as number of followers, average number of followees per day, average number of posts per day, number of reposts, number of comments, and ratio of questioned comments. For features such as user type, number of information sources, and number of corrections, there is no significant difference in terms of the median. However, this does not mean that these three features are ineffective for rumor identification, since there are significant differences between normal microblogs and rumors in the upper 50% of those feature values. To achieve good performance with these three features, we usually apply other features first to filter out noisy values, and then apply these three features. For example, many microblog posts simply describe a user’s mood or serve to communicate with friends; the number of information sources for such microblogs is similar to that of rumor microblogs in the lower half of the feature values. If we use the number of reposts and the number of comments to filter out these types of microblogs, the number of information sources shows promising discriminative capacity in rumor identification.

Fig. 4. Value distribution of users’ behavior features between rumor microblogs and normal ones.

#### 2) Rumor Identification Evaluation

Since there are differences in users' behaviors when they publish or read normal and rumor microblogs, we represent a microblog with the corresponding behavior features of its author and readers, and identify whether the microblog is a rumor based on these features. In order to test the effectiveness and general applicability of the proposed user behavior features, we train five classifiers: 1) logistic regression [24]; 2) SVM with an RBF kernel function [25]; 3) Naïve Bayes [26]; 4) decision tree [27]; and 5) K-nearest neighbors [28], using a tenfold cross-validation strategy with the open-source machine learning library Scikit-learn [29]. We compare the previously proposed features [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] with the user behavior features and employ the feature-selection methods given in [29] and [30] to choose the best eleven features from those listed in Table II. They are: 1) number of sentiment words; 2) number of URLs; 3) user type; 4) number of comments; 5) registration age; 6) number of followers; 7) number of posts; 8) number of reposts; 9) number of followees; 10) user name type; and 11) is reposted?.
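The training setup described above can be sketched with Scikit-learn as follows. The synthetic data generated here merely stands in for the real behavior-feature matrix, which is not part of this excerpt; the hyperparameters match those reported in the figure captions ($C=1$ for the SVM, $k=30$ for KNN) where given.

```python
# Sketch of the setup: five classifiers evaluated with tenfold
# cross-validation, as described in the text. Synthetic data stands in
# for the real Sina Weibo behavior-feature matrix.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=9, random_state=0)

classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM (RBF kernel)": SVC(kernel="rbf", C=1),
    "naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "K-nearest neighbors": KNeighborsClassifier(n_neighbors=30),
}

for name, clf in classifiers.items():
    # Tenfold cross-validation, scoring by F-score on the positive class.
    scores = cross_val_score(clf, X, y, cv=10, scoring="f1")
    print(f"{name}: mean F-score = {scores.mean():.3f}")
```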

Table II Selected Best 11 Features Based on Data From Sina Weibo

These eleven features altogether serve as the baseline for comparison with the proposed user behavior features.
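Selecting the eleven baseline features can be reproduced in spirit with Scikit-learn's univariate feature selection. The exact scoring function of [29] and [30] is not specified in this excerpt, so the use of `f_classif` below is an assumption for illustration:

```python
# Sketch of choosing the best eleven features from a larger candidate set.
# f_classif (ANOVA F-value) is an assumed scoring function; the paper's
# actual feature-selection method is given in [29] and [30].
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the candidate feature matrix of Table II.
X, y = make_classification(n_samples=200, n_features=20, n_informative=11,
                           random_state=0)

selector = SelectKBest(score_func=f_classif, k=11)
X_best = selector.fit_transform(X, y)  # keep the 11 highest-scoring features
print(X_best.shape)  # (200, 11)
```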

#### 3) Comparison With the Baseline Approach

Fig. 5 illustrates the experiment results of the logistic regression algorithm. The precision, recall, and F-score of the rumor classifier using the users' behavior features are 0.8333, 0.6, and 0.6977, respectively, while those using the selected best eleven features are 0.7143, 0.6, and 0.6521. The precision is thus improved by 11.9%.

Fig. 5. Comparison between our approach and baseline approach based on logistic regression algorithm.

Fig. 6 shows the results of the SVM algorithm. In this case, the precision, recall, and F-score of the rumor classifier built on users' behavior features are 0.8333, 0.7, and 0.7687, respectively, while those based on the selected best eleven features are 0.8182, 0.6, and 0.6923. The precision, recall, and F-score increase by 1.51%, 10%, and 7.64%, respectively, which means our approach and feature set not only improve the prediction precision but also detect more rumors than the baseline approach.

Fig. 6. Comparison between our approach and baseline approach based on SVM algorithm ($C=1$, $\gamma=0$).

The experiment results of the Naïve Bayes algorithm are shown in Fig. 7. The precision, recall, and F-score of the rumor classifier using the selected best eleven features are 0.4, 0.4, and 0.4, while those using the users' behavior features are improved to 0.7143, 0.8532, and 0.7776. The precision, recall, and F-score increase by 0.3143, 0.4532, and 0.3776, respectively.

Fig. 7. Comparison between our approach and baseline approach based on Naive Bayes algorithm.

Fig. 8 shows the experiment results of the decision tree algorithm. As Fig. 8 indicates, the rumor classifier trained with the decision tree performs best among the five rumor classifiers constructed in this paper: its precision, recall, and F-score reach 0.8645, 0.8535, and 0.8590, respectively, while those using the selected best eleven features are 0.6667, 0.6, and 0.6316. The precision, recall, and F-score increase by 0.1978, 0.2535, and 0.2274, respectively.

Fig. 8. Comparison between our approach and baseline approach based on decision tree algorithm.

Fig. 9 shows the experiment results of the K-nearest neighbors algorithm. The precision, recall, and F-score of the rumor classifier using users' behavior features reach 0.9, 0.4, and 0.5538, while those using the best eleven features are 0.8889, 0.3, and 0.4485, respectively. Although the precision of both classifiers constructed with the KNN algorithm is high, this does not mean their performance is the best among the ten rumor classifiers trained with the five algorithms, because the recall of both classifiers is low: many rumor posts cannot be identified by them.

Fig. 9. Comparison between our approach and baseline approach based on K-nearest neighbors algorithm ($k=30$).

From Figs. 5–9, we can observe that the performance of the rumor classifier using users' behavior features is better than that of the baseline approach. Compared with the baseline, the precision, recall, and F-score of our approach increase by 13.14%, 18.13%, and 16.68% on average, which demonstrates the effectiveness of our method and the proposed features for rumor identification.
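The averages above can be checked against the per-classifier values reported with Figs. 5–9; the recomputed averages agree with the quoted figures to within rounding of the individual scores:

```python
# Sanity-check of the average gains: per-classifier (precision, recall,
# F-score) values as reported in the text for Figs. 5-9.
baseline = {
    "LR":  (0.7143, 0.6,    0.6521),
    "SVM": (0.8182, 0.6,    0.6923),
    "NB":  (0.4,    0.4,    0.4),
    "DT":  (0.6667, 0.6,    0.6316),
    "KNN": (0.8889, 0.3,    0.4485),
}
ours = {
    "LR":  (0.8333, 0.6,    0.6977),
    "SVM": (0.8333, 0.7,    0.7687),
    "NB":  (0.7143, 0.8532, 0.7776),
    "DT":  (0.8645, 0.8535, 0.8590),
    "KNN": (0.9,    0.4,    0.5538),
}

avg_gain = {}
for i, name in enumerate(("precision", "recall", "F-score")):
    avg_gain[name] = sum(ours[k][i] - baseline[k][i] for k in ours) / len(ours)
    print(f"average {name} gain: {avg_gain[name] * 100:.2f}%")
```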

SECTION VI

## CONCLUSION

Microblog systems have become a new platform for information sharing, but they can also easily be exploited to spread rumors. It is therefore of great importance to develop automatic tools for assessing the credibility of information spreading on microblogs.

In this paper, we investigate the rumor identification problem in microblog systems. We propose a user behavior-based rumor identification scheme, in which users' behaviors are treated as hidden clues indicating who is likely to be a rumormonger and which posts are likely rumor microblogs. The experiment results on real-world data from Sina Weibo demonstrate the efficacy of our method and the features proposed in this paper. The precision, recall, and F-score of our approach increase by 19.24%, 18.3%, and 19.1% on average compared with the baseline result. The proposed new features enrich the rumor identification feature set, and benefit the design of automatic rumor identification systems.

## Footnotes

This work was supported in part by the National Natural Science Foundation of China under Grant 61373091, Grant 91338107, and Grant 11102124; by the Ph.D. Program Foundation of Ministry of Education of China under Grant 20130181110095; by the Provincial Key Science, Technology Research and Development Program of Sichuan, China under Grant 2013SZ0002 and Grant 2014SZ0109; by the Sichuan Provincial Department of Science and Technology Project (No. 2014JY0036); and by the Scientific Research Fund of Sichuan Provincial Education Department (No. 13TD0014).
