COVID-19 Rumor Detection Using Psycho-Linguistic Features

During the onset of the COVID-19 pandemic, social media was flooded with misinformation. Irrespective of its type, such content played a significant role in increasing confusion among people in the middle of an ongoing crisis. The purpose of the study is to investigate the nature of a specific type of misinformation, i.e., rumors, surrounding COVID-19. The study utilizes a publicly available and labelled Twitter dataset and proposes a novel feature space, which can detect rumor instances with high accuracy. The proposed feature space includes not only content-based features but also psycho-linguistic features, to further study the characteristics of the content from the perspectives of linguistics and psychology. Psycho-linguistic features have been utilised to understand certain dramatisation of text in the domains of conspiracy propagation and fake news detection. However, such a dramatisation detection approach has never been used for the purposes of rumor detection. Our study first outlines the differences between these categories of misinformation propagation and clarifies where rumor fits in under the broader umbrella of misinformation. It further outlines how the use of psycho-linguistic features can also improve the detection accuracy of rumors on social media. The study demonstrates through multiple experimental setups that psycho-linguistic features improve the detection accuracy and associated performance measures, such as precision and recall, for COVID-19 rumors on Twitter. The observed improvements are consistent across multiple machine learning models.


I. INTRODUCTION
COVID-19 changed the world as we knew it. It is not only responsible for taking millions of human lives, but also for reshaping the world's political, economic, and social landscape to a significant degree. The impact of the pandemic on social lives world-wide is significant as it forced online social networks (OSN) to be the main platform of social interactions for a substantial period of time. People around the world were already relying on OSNs for maintaining a social life as well as getting their information in general. However, COVID-19 has significantly raised this dependency to a whole new level.
The associate editor coordinating the review of this manuscript and approving it for publication was Mahdi Zareei.
According to Google News [1], the world-wide COVID-19 related death toll is 6,434,754 as of August 2022, with countries such as the United States, Brazil, India, Russia, and Mexico topping the list in terms of number of deaths. During the early stages of the pandemic, people around the world started to feel anxious and panicked. On one hand, people were desperate for more information about the disease and its symptoms, and on the other hand, it was not possible to socialise physically or reach out to non-virtual news outlets. Therefore, people turned to online platforms to get information, as under the circumstances, they were the most accessible source. Among other platforms, Twitter has seen a significant increase in activity around the world during the period of the pandemic. According to their blog [2], between March 2020 and August 2021, there were more than 22.2 million COVID-19 related tweets in Australia, where the tweets initiated conversations related to health statistics, lockdown restrictions, government initiatives, and so on.
World-wide, the number of Twitter users has increased by approximately 34% during the pandemic [3].
Although OSN platforms such as Twitter provided a large group of scared and confused people the means to stay connected, the increased activity also presented fraudsters with the opportunity to spread misinformation using these platforms. The purposes of these fraudsters can vary from personal gain to simply misleading people to promote chaos. Irrespective of the purpose, the misinformation itself can be damaging and obstructive for healthcare workers and governments around the world. Therefore, it is critical to have implementable solutions that can automatically detect specific categories of misinformation on online platforms such as Twitter, which will reduce the negative impact this misinformation can cause. Consequently, it is imperative to study the nature of online misinformation and propose a set of features that can contribute to higher detection accuracy.

A. THE CONTRIBUTIONS
In this paper, we investigate the nature of rumors surrounding COVID-19 on Twitter. Our aim was to investigate whether certain linguistic markers that can identify dramatisation of text can improve the detection accuracy of these rumors when included in the feature space. Our investigation demonstrates that these features indeed improve the detection accuracy of rumor tweets. The contributions of the study are briefly summarised below:
• We propose a hybrid feature space for the automated detection of COVID-19 rumors on Twitter, which includes tweet content-specific features, Twitter-specific contextual features, and psycho-linguistic features.
• We include psycho-linguistic features in the detection of rumor type of misinformation, to capture text dramatisation.
• We demonstrate how the psycho-linguistic features can improve the detection accuracy of COVID-19 rumors posted on Twitter using a publicly available dataset.

B. THE OUTLINE OF PAPER
The rest of the paper is organized as follows. Section II lays down the background of misinformation in general, including the definitions of specific types of misinformation, and outlines where rumor fits in within the wider context. Section III discusses related work in the domain and draws a comparative analysis. Sections IV, V, and VI present our research methodology (including the details of the dataset), our research findings, and a concluding discussion, respectively.

II. BACKGROUND AND MOTIVATION
In this section, we outline the background of misinformation and explain different terms that are associated with misinformation. We then attempt to highlight the devastating role a rumor can play during a world-wide health crisis. Finally, we present the scope of our research and list down the research questions.

A. DEFINITION OF MISINFORMATION
Within the context of OSN, the term 'Misinformation' refers to false or inaccurate information, which is disseminated through any OSN platform. However, there is more to it when it comes to specific types of inaccurate information, as a range of terms are often associated with misinformation on OSN.
According to the information disorder framework [4], inaccuracy or disorder in information can be classified into three broad categories: misinformation, disinformation, and malinformation. The framework defines them as follows:
1) Misinformation is information that is false, but not created with the intention of causing harm.
2) Disinformation is information that is false and deliberately created to harm a person, social group, organization, or country.
3) Malinformation is information that is based on reality, used to inflict harm on a person, organization, or country. This category of information disorder is not within our interest, as far as COVID-19 rumor detection is concerned.
Under these broad categories, there are several types of information disorder, which are generally referred to as misinformation on OSN. Some of the types that are often studied and addressed in research in the domains of social network analysis and data science are defined below. Parts of the classification of these misinformation types and their definitions have been addressed in previous research works [5], [6], [7]. The list below synthesizes and summarises them and also aligns them with the broader categories of the information disorder framework. However, the list does not include information disorders such as hate speech, cyberbullying, defamation, etc., since they fall under the broader category of malinformation, which is outside the scope of this research.
• Fake news: Fake news is a news article that is false and spread intentionally. It is a means of spreading specific propaganda. The intention behind the spread of fake news may include the intention of causing harm. Therefore, an instance of fake news can be misinformation or disinformation.
• Urban legend: An urban legend is an intentionally spread fictional story. It falls in the category of misinformation, since urban legends are created for entertainment purposes and usually do not include an intention of causing harm.
• Rumor: A rumor is a piece of information whose truthfulness is doubtful or uncertain. Often, rumors are spread on OSNs with the intention of causing deliberate chaos, which results in harm. Similar to fake news, a rumor can be misinformation or disinformation.
• Conspiracy: Conspiracy theories are alternative and often dramatic explanations of events, as opposed to the actual explanation. People who spread conspiracy theories often do so with the intention of causing harm, and thus, they are regarded as disinformation.
VOLUME 10, 2022
• Astroturfing: Astroturfing is the process of masking or faking grass-root opinion. Astroturfers mask the sponsors or organisations while spreading an opinion about an entity (product, person, service, etc.) to make it appear as though the opinion is coming from grass-root participants of the society. Astroturfing is regarded as disinformation, since there is an intention of causing financial, political, or social harm.
• Crowdturfing: Crowdturfing is also a popular term, which refers to astroturfing that is crowd-sourced. Similar to astroturfing, crowdturfing can be categorised as disinformation.
Figure 1 illustrates the different types of information disorder and where they fall within the context of misinformation and disinformation. The purpose of outlining the definitions and categories of some of the misinformation terms in this sub-section is twofold. Firstly, the definitions help us establish the scope of our research, and secondly, they help us isolate some of the specific terms that are relevant for inaccurate information surrounding COVID-19. For an emerging health-related crisis such as COVID-19, the most relevant type of inaccurate or false information is rumor. The closest type of false information, which has a grey boundary with rumors, is the conspiracy theory. However, conspiracy theories are more relevant for political and social events. For the purposes of this study, we will not differentiate between these two types (rumor and conspiracy theory) and refer to all information of doubtful veracity related to COVID-19 as rumors.

B. ROLE OF RUMORS DURING A HEALTH CRISIS
During a health crisis, the importance of accurate information is critical. The policy makers and healthcare professionals are presented with the challenge of containing and managing the health crisis. In addition to that, people become anxious about the consequence of a disease, and worried about their friends and family. Under these circumstances, inaccurate information can have severe negative impact on the stability of the society. The impact becomes more severe when the crisis is as big as a pandemic.
As described in a Time news article [8], the rumors on social media during the outbreak of Ebola virus in 2014, created unnecessary panic, causing enormous waste of resources in the United States. The rumors included inaccurate information on how the disease spreads and how prevalent the disease was at that stage in a particular geographical area.
The impacts of rumors were far more severe at the early stages of the world-wide COVID-19 outbreak. Studies [9], [10] reported how, based on rumors on social media, the use of cow dung and urine as a treatment for COVID-19 spread across India.
An article [11] published in the Australian edition of 'The Conversation' outlines how based on internet rumors, people were calling the New South Wales Poison Information Centre to enquire about the benefits of inhaling hydrogen peroxide, gargling or swallowing antiseptics, bathing in bleach or disinfectant, and spraying face masks with disinfectants, in order to fight COVID-19. All of the above actions can have severe consequences on human health, including death.
There are several research works [12], [13], [14], [15], [16] that also report rumors and their impacts surrounding the treatment, spread patterns, do's and don'ts, potential home remedies, medicines, etc., related to COVID-19, where the rumors are essentially spread through online platforms and can cause severe damage to human health or hinder the efforts of healthcare professionals to tackle the pandemic around the world.
More general forms of rumors and conspiracy theories are also reported to have destructive and disruptive impacts on the community. Research work on the detection of COVID-19 conspiracy theories by Shahsavari et al. [17] reports incidents including the destruction of cell phone towers, racially fuelled attacks against Asians, demonstrations to defy public health orders, etc., initiated by rumors and conspiracy theories propagated through OSN platforms. Therefore, it is critical to develop a mechanism that can detect such rumors and prevent them from spreading. To that end, in this study, we propose a model with a hybrid feature space that makes use of psycho-linguistic features, in addition to content-based and contextual features, to improve the detection accuracy of COVID-19 rumors.

C. OUR SCOPE AND RESEARCH QUESTIONS
As discussed in the previous sub-sections, we focus on automated detection of COVID-19 rumors on Twitter. We treat all information where the truthfulness or veracity of the information is doubtful, as rumors. We also identify whether certain psycho-linguistic features, which may indicate dramatisation of written text, can improve the detection accuracy of such rumors.
In general, the purpose of this paper is to answer the following research questions:
• RQ1. Is there a pattern of text dramatisation present in the COVID-19 rumor tweets?
• RQ2. What are the psycho-linguistic features that can identify such dramatisation of text?
• RQ3. Can these psycho-linguistic features, in addition to the content-based and contextual features, improve the detection accuracy of COVID-19 rumors on Twitter?
In this paper, we address these research questions and make novel contributions. Firstly, we propose a hybrid feature space, which includes tweet content-specific features, Twitter-specific contextual features, and psycho-linguistic features. We also demonstrate how the psycho-linguistic features can improve the detection accuracy of COVID-19 rumors posted on Twitter using a publicly available dataset. The next section outlines the related work in the relevant domains of misinformation detection and draws a comparative analysis among works that are highly relevant to our research.

III. RELATED WORK
Since there are several areas of information disorder, as outlined in the previous section, we organise the discussion of related work based on these areas and focus on similar approaches to feature engineering. More specifically, we want to highlight the types of features that different research approaches have used in the area of inaccurate information detection. We first present related approaches in the domain of fake news detection, followed by astroturfing and crowdturfing detection techniques. We then discuss works on general rumor detection. Finally, the section outlines related works on misinformation/disinformation detection specific to COVID-19, with a comparative analysis.

A. RELATED WORK IN THE DOMAIN OF FAKE NEWS DETECTION
The detection of fake news has been a trending research topic in recent years. Several studies have utilised data mining and feature engineering techniques to tackle the spread of online fake news. As outlined in several studies [18], [19], [20], [21], the features analysed by research in the area include content-specific features, linguistic features, visual features, user-specific features, network-based features, source-based features, environmental features, etc.
The research work by Chen et al. [22] utilizes user sentiment and spread patterns of tweets on Twitter to detect fake news. Network-based spread patterns were also studied in several other research works [23], [24], [25] concentrated on fake news detection. The research work by Rashkin et al. [26] includes linguistic features to identify dramatisation of text to detect fake news. Similar research works [27], [28], [29] have also investigated textual or linguistic features of the news content itself for fake news detection. The research work by Shu et al. [30] demonstrated how user profile specific features, including political bias, personality, etc., can play an important role in the detection of fake news. Additional user profile information, such as user activity and social graph, the user's content creation pattern, etc., has been utilised by similar research works [31], [32] in the domain of fake news detection.

B. RELATED WORK IN THE DOMAIN OF ASTROTURFING AND CROWDTURFING DETECTION
As outlined in a previous study [33], approaches in the domain of astroturfing and crowdturfing detection can be very diverse. While approaches including authorship attribution [34], [35], [36] and analysis of network flow features [37], [38], [39] have been utilised in the area of astroturfing and crowdturfing detection, the majority of automated detection approaches focus on identifying content-based or user-based features.
The research work by Cheng et al. [40] investigated comments on news portals to detect corporate astroturfing, where the authors considered similarity measures among comments, including features related to user activity and interaction time with contents. Analysis of content and user-based features in microblog environments, has also been carried out by other research works [41], [42], [43], [44], [45].
Features surrounding individuals or features applicable to a group of people have been studied for astroturfing and crowdturfing detection as well. Research [46], [47] demonstrates that group-level features, such as group time window, group deviation, group consent similarity, and the tweeting habits of a group (e.g., posting original tweets, retweeting tweets of someone who is not a friend, etc.), reveal interesting information about crowdturfing groups in microblog environments such as Twitter. Twitter-specific features, such as user profile and activity features (e.g., longevity of account, tweet steadiness, sparseness), network features (e.g., number of friends and followers), and personality features (e.g., tweet emotion), were also demonstrated to be very effective for crowdturfing detection [48].

C. RELATED WORK IN THE DOMAIN OF RUMOR DETECTION
The domain of rumor detection in general also includes heterogeneous approaches. In their research work, Ma et al. [49] developed two recursive neural networks. The top-down and the bottom-up tree-structured neural networks were proposed to track the propagation of general rumors in a microblog environment such as Twitter. The authors demonstrated that their tree-structured neural network is able to detect rumors at an early stage of propagation. Similar research works [50], [51] also propose models that utilise multiple neural networks and investigate temporal, content, and propagation features to detect rumors on microblog platforms. The research work by Yang et al. [52] proposes a graph adversarial learning method to detect various strategies taken by perpetrators to camouflage rumors to bypass propagation-based detection methodologies. The work by Liu et al. [53] implements a structure-aware retweeting graph neural network, where the authors propose re-structuring the retweet graph to align with a binary tree structure, without losing any propagation information. The authors also propose the integration of content-based, user-based, and pattern-based features for improved rumor detection. The usefulness of time-series features, in addition to content-based, user-based, and lexical features, was demonstrated in similar research work by Shelke and Attar [54].

D. RELATED WORK IN THE DOMAIN OF COVID-19 MISINFORMATION OR DISINFORMATION DETECTION
Since the outbreak of COVID-19, several studies have concentrated their efforts on the automated detection of COVID-19 related information disorder on OSNs. The majority of the work focused on investigating content on microblog environments such as Twitter, since the propagation speed and reach of such environments are higher than those of other classes of OSN platforms.
Research work by Moffitt et al. [55] addresses COVID-19 related conspiracy theory detection on Twitter, based on analysis of user-identities, their countries of origin, patterns of bot activities, and content of the tweet, such as, hashtag and URL analysis. The research work by Al-Rakhami and Al-Amri [56] utilises tweet content-based, tweet-based and user-based features and proposes an ensemble-learning-based framework for detection of non-credible tweets on Twitter.
The research work by Elhadad et al. [57] investigates the contents from reputable news sources and fact-checking authorities to generate ground truth data. The authors then investigate content-based features and apply multiple machine learning models for the detection of COVID-19 misleading information. The authors implement TF-IDF as the feature extraction technique on a bag-of-words (BOW) model that includes words of certain part-of-speech tags, metadata including location-based, user-based, and time-based features, etc. A similar research work by Al-Ahmad et al. [58] also uses TF, TF-IDF, and BOW models for feature extraction. The authors also proposed to reduce the number of symmetrical features by implementing wrapper feature selection for evolutionary classification using particle swarm optimization (PSO), the genetic algorithm (GA), and the salp swarm algorithm (SSA). The authors utilised these features for the detection of misleading information about COVID-19 using a publicly available dataset consisting of 6,000 news articles, generated by a thesis [59].
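As a rough illustration of the TF-IDF weighting over a bag-of-words model mentioned above, the following minimal sketch computes TF-IDF vectors for a few toy sentences. It is not the implementation used in [57] or [58]; the whitespace tokenisation, the smoothed IDF variant, the function name `tf_idf`, and the example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    # Naive whitespace tokenisation (an assumption; real pipelines do more).
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vocabulary = sorted(doc_freq)
    # Smoothed IDF, one common variant: log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Term frequency (relative count) multiplied by IDF.
        vectors.append([counts[t] / total * idf[t] for t in vocabulary])
    return vocabulary, vectors


# Hypothetical toy corpus, not taken from any of the cited datasets.
docs = [
    "garlic cures covid",
    "vaccine trial for covid begins",
    "garlic soup recipe",
]
vocab, vectors = tf_idf(docs)
```

In this toy corpus, a term that occurs in only one document (such as "cures") receives a higher weight than an equally frequent term shared by several documents (such as "covid"), which is the property that makes TF-IDF useful for separating distinctive wording from common vocabulary.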
The work by Hossain et al. [60] uses a corpus of Wikipedia misconceptions related to COVID-19 and classifies tweets based on the support, deny, or neutral stance the tweet expresses in relation to the misconceptions. The research includes identification of the misconception instance that is related to a tweet, and then identification of the specific stance of the tweet towards the misconception. The research includes analysis of the contents of the tweets and misconceptions using several NLP techniques. A similar research work by Vijjali et al. [61] analyses the contents of claims related to COVID-19 and computes the textual entailment between the claim and the true facts retrieved from a manually labelled COVID-19 dataset.
The work by Li et al. [62] proposes a multi-lingual and multi-dimensional COVID-19 fake news data repository. The authors collected 3,981 pieces of fake news content and 7,192 pieces of trustworthy information in 6 different languages, i.e., English, Spanish, Portuguese, Hindi, French, and Italian. The authors also demonstrate the reliability and robustness of the dataset by analyzing several features, including social-interaction-based, tweet-based, and user-based features.
Another research work by Memon and Carley [63] offers yet another COVID-19 misinformation dataset, including 4,573 annotated tweets. The authors also offer interesting insights into the characteristics of informed and misinformed groups of people. The authors performed network analysis, analysis of bots, and analysis of socio-linguistic features to differentiate between the two groups. Similar research work by Heidari et al. [64] applied Bidirectional Encoder Representations from Transformers (BERT) on a publicly available dataset. The authors used content-based, tweet-based, and user-based features to classify whether a tweet is generated by a bot or not. Their research concludes that COVID-19 fake news is usually generated by human accounts, not bot accounts.
The research work by Cui and Lee [65] proposes a COVID-19 healthcare misinformation dataset that contains 4,251 news articles and 296,000 related user engagements. The authors also apply several machine learning models using different features, including content-based (text and image), sentiment-based, user-based, etc., to provide further insight into the classification of misinformation using the dataset. Related research work by Zhou et al. [66] offers another repository of COVID-19 misinformation, including fake news and conspiracies. The authors collected 2,029 news articles and 140,820 tweets that circulated these news articles on Twitter. The authors also performed data analysis using textual, visual, temporal, and network features to provide a baseline model for future research. The dataset offered by Shahi and Nandini [67] includes 5,182 fact-checked news articles for COVID-19, across multiple languages and countries. The news instances were crawled from fact-checking websites, i.e., Snopes and Poynter. The authors also applied NLP techniques for a brief analysis of the data repository.
The research work by Paka, et al. [68], offers another public dataset for COVID-19 fake news detection. The authors propose a semi-supervised model, i.e., Cross-SEAN (cross-stitch based semi-supervised end-to-end neural attention model), and also an extension for the Chrome browser, i.e., Chrome-SEAN, which can automatically flag COVID-19 related fake news on Twitter. The authors utilised several textual and linguistic features (e.g., number of hashtags, number of user mentions, media count in the tweet, sentiment of the tweet text, counts of various part-of-speech tags, etc.), tweet-specific (e.g., number of hashtags, number of favourites, number of retweets, retweet status, etc.), and user-specific features (e.g., verified status, follower count, favourites count, number of tweets, recent tweets per week, etc.).
The research work by Cheng et al. [12] also offers a comprehensive dataset for COVID-19 rumor detection. The dataset contains 4,129 news records and 2,705 tweets. The dataset is manually labelled and contains information about the veracity of the content, including stance and sentiment. The authors also performed deep learning based rumor classification on both the news and twitter dataset. The Twitter dataset contains contextual features related to the tweet, including Reply/Retweet/Like (RRL) numbers. Our research on COVID-19 rumor detection is based on the Twitter dataset offered by this particular research work.
Besides conventional feature engineering and machine learning approaches, there are other approaches as well for the detection of COVID-19 mis/disinformation. For example, the work by Shahsavari et al. [17], which was inspired by narrative theory, was also demonstrated to be effective in understanding the nature of COVID-19 rumors and conspiracy theories. The authors attempt to understand the narration patterns of these conspiracy theories and identify different clusters of communities responsible for the propagation of these theories.
Finally, to put things into perspective, we present some of the highly relevant works in the domain of COVID-19 mis/disinformation detection in Table 1. Among other things, the table highlights the differences between the classes of features that have been used in these research works and the classes of features included in the feature space of our proposed model. The table refers to all features that are not content-specific (for example, tweet-specific features such as retweet numbers and like numbers, network-based features, user-based features, etc.) as 'contextual features', since they add additional context to the content of the mis/disinformation. In addition, the table lists 'Unspecified' under the focus area for the research works that do not specify a type and refer to mis/disinformation related to COVID-19 as 'misinformation', 'misleading content', 'non-credible content', etc. As Table 1 outlines, our proposed feature space includes analysis of psycho-linguistic features, which makes the feature space a novel one. In the next section, we outline our research methodology, including the description of the dataset used, the feature space design, the rationale behind including psycho-linguistic features, and the experimental design.

IV. PROPOSED METHODOLOGY
In this section, we first outline the details of the COVID-19 rumor dataset. We then present the methodology of our feature space design. Finally, we conclude the section with the details of our experimental setup.

A. THE COVID-19 RUMOR DATASET
The COVID-19 rumor dataset is publicly available on GitHub. The dataset was compiled and released by Cheng et al. [12]. It contains two corpora, one for news-based rumors and one for tweet-based rumors. The first corpus contains 4,129 news records published in several news outlets and was generated using the Google search engine. The second corpus contains 2,705 tweets from Twitter. The Twitter data was crawled using COVID-19 related tags, for example, 'COVID-19', 'coronavirus', 'COVID', etc. For the Twitter dataset, the authors [12] collected not only the tweets themselves but also the associated comments and their metadata. The entire dataset was labelled by multiple human annotators, and each instance of news or tweet was labelled True (T), False (F), or Unverified (U), based on its veracity. In addition to veracity, the dataset was also labelled based on the stance and sentiment of the tweet. For the purposes of our research, we only focused on the corpus generated from Twitter, since our focus area is microblog environments. Table 2 outlines all attributes of the main Twitter dataset, including an example cell value, the possible values for discrete features, and a description of each attribute. For the sentiment label, the authors identified the sentiment of a tweet manually. The authors also cross-checked the labelled sentiment with an online sentiment analysis tool, MonkeyLearn (https://monkeylearn.com/), and reported that the manual sentiment labels are more accurate, given the context of the tweets.
All tweets in the Twitter dataset with these attributes are listed in a CSV file named 'twitter.csv'. Furthermore, for each tweet in the 'twitter.csv' file, the dataset also contains all comments and their associated information, including the stance label, in a separate CSV file. The specific CSV file containing the comments of a tweet can be traced using the source attribute value of that tweet. Table 3 lists the attributes in a comment file, including a description of each attribute. The stance value of each comment was labelled manually by the authors. The authors used the classical rumor stance classification, where the stance can be support, deny, comment, or query [69].
Given our task in this paper is to classify an instance of tweet from this rumor dataset into either a true or a false piece of information, we removed the tweets with label 'U', which are unverified rumors, where the veracity is unknown.
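The filtering step described above can be sketched as follows. This is only a minimal illustration: the column names `content` and `label`, the function name `load_verified_tweets`, and the in-memory sample rows are assumptions standing in for the real 'twitter.csv' file, whose exact schema is given in Table 2.

```python
import csv
import io

# Hypothetical sample standing in for 'twitter.csv'; the column names
# ("content", "label") are assumptions, not the dataset's documented schema.
SAMPLE_CSV = """content,label
Drinking hot water cures the virus,F
Masks reduce droplet transmission,T
A new strain was found in the city,U
"""


def load_verified_tweets(handle):
    """Return rows labelled T or F, dropping unverified (U) rows."""
    reader = csv.DictReader(handle)
    return [row for row in reader if row["label"].strip().upper() in {"T", "F"}]


rows = load_verified_tweets(io.StringIO(SAMPLE_CSV))
# Binary target: 1 for a false piece of information, 0 for a true one.
labels = [1 if row["label"].strip().upper() == "F" else 0 for row in rows]
```

With the real dataset, the `io.StringIO` sample would be replaced by an opened file handle for 'twitter.csv', with the column names adjusted to match the actual attributes.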

B. FEATURE SPACE DESIGN
In this sub-section, we focus on our feature selection approach and the process of generating feature vectors from the COVID-19 rumor dataset. The feature vectors are then fed to multiple machine learning algorithms to train them for the rumor classification task.

1) RATIONALE FOR SELECTING CONTENT-BASED AND CONTEXTUAL FEATURES
The first two categories of features that are crucial for identifying rumors in a microblog environment are the content-based and contextual features.
Contextual features, on the other hand, are features that add more context information to the content in these microblog environments. Examples include tweet-specific features such as the popularity of a tweet (number of likes, retweets, and replies) and the acceptance, rejection, or indifference of an opinion expressed in a comment on a tweet. These tweet-specific contextual features have been used consistently in several past research works [31], [32], [62], [63], [64], [65], [66] for the identification of mis/disinformation in microblog environments. All content-based and contextual features used in our research are listed in section IV-B4.

2) RATIONALE FOR SELECTING PSYCHO-LINGUISTIC FEATURES
Psycho-linguistic features are features generated by analysing the language used in the content, with the goal of identifying the psychological and emotional profile of the user who created the content. Using a set of psycho-linguistic features, we can understand the mindset of a certain group of people, which can be critical for identifying groups that are responsible for the propagation of misleading information on OSN platforms.
For example, previous research work by Ott et al. [71] demonstrates that the use of first-person and second-person pronouns is an indicative feature of imaginative writing, where a person writes something, with or without harmful intentions, that is far from the facts. The authors [71] also reported the use of superlative and comparative words in imaginative writing. These characteristics of imaginative writing have been utilised and corroborated in research work by Rashkin et al. [26], where the authors used several linguistic features for the detection of fake news. The authors demonstrated that subjective words are often used to dramatize or sensationalize a news story. The authors also associate the use of action adverbs and manner adverbs with the dramatisation of written text for the purpose of attracting readers. The research [26] also reports that fake news articles use more swear words, subjective words, superlatives, and modal adverbs, which are used to exaggerate a piece of news. Several research works [72], [73], [74] report the usage of hedge words for the identification of vagueness and uncertainty in written language, which are also found to be relevant in the detection of fake news [26].
The use of psycho-linguistic features was also demonstrated to be effective for the detection of conspiracy propagators [6], where the authors investigate the personality, sentiment, emotions, and linguistic patterns of users to identify the propagation of conspiracy theories. The authors report that anti-conspiracy propagators express more emotions in their tweets than conspiracy propagators. Similar research [75] also demonstrates how the analysis of emotions can help with the identification of fake news in social media. Moreover, dramatic events, such as natural disasters, mass outbreaks of diseases, etc., tend to start waves of online discussions, which have certain characteristics. Research [76] demonstrates that discussions on an online platform following a dramatic event exhibit signs of emotional shock, increased language complexity, and simultaneous expressions of certainty and doubtfulness, which provide insight into how the spread of conspiracy theories can be identified during the escalation of an event.
Based on these research works, it is evident that a certain degree of text dramatization and exaggeration is directly associated with the spread of mis/disinformation on OSN. Furthermore, the study of the sentiments and emotions expressed in such mis/disinformation can also add value to detection strategies. Therefore, our research aims to address whether these text dramatization, sentiment, and emotion markers are also relevant for COVID-19 rumor detection, and hence studies a set of psycho-linguistic features.

3) THE GENERATION OF PSYCHO-LINGUISTIC FEATURES
For the generation of the psycho-linguistic features, we used several lexicons. The swear words dictionary was generated from the Noswearing website. The dictionaries of modal adverbs, action adverbs, manner adverbs, comparatives, and superlatives were downloaded from a public repository made available by Rashkin et al. [26]. The authors compiled these dictionaries from Wiktionary word lists. The dictionary of hedge words was compiled manually from previous research work by Yüksel and Kavanoz [77]. Using these dictionaries, we generate a list of binary psycho-linguistic features based on the contents of the tweets from the Twitter corpus of the COVID-19 rumor dataset.
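A dictionary lookup of this kind can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: the tiny word sets, the feature names, and the simple regular-expression tokenizer are all hypothetical stand-ins for the actual lexicons and preprocessing.

```python
import re

# Tiny illustrative word sets (hypothetical); the real lexicons are larger.
LEXICONS = {
    "has_hedge": {"maybe", "possibly", "reportedly", "allegedly"},
    "has_superlative": {"best", "worst", "deadliest", "biggest"},
    "has_comparative": {"better", "worse", "deadlier", "bigger"},
    "has_first_person": {"i", "we", "my", "our", "me", "us"},
    "has_second_person": {"you", "your", "yours"},
}

# Assumed tokenizer: lowercase alphabetic tokens, apostrophes allowed.
TOKEN_RE = re.compile(r"[a-z']+")

def lexicon_features(text):
    """Return one binary feature per lexicon: 1 if any word of it occurs."""
    tokens = set(TOKEN_RE.findall(text.lower()))
    return {name: int(bool(tokens & words)) for name, words in LEXICONS.items()}

features = lexicon_features("Reportedly this is the deadliest virus we have seen")
print(features)
```

Each feature only records presence or absence of a word category, which matches the binary encoding described above; counts or frequencies would be a different design choice.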
We also include expressed sentiment and emotion as part of the psycho-linguistic feature set. The sentiment information was extracted directly from the Twitter corpus of the COVID-19 rumor dataset. For the analysis of emotion, we used the emotion detection model, EMOTEX, proposed by Hasan et al. [78]. According to this model, emotions can be categorised into four general classes, each of which groups several granular emotions. We analyse the hashtags in the tweets themselves in order to associate a tweet with one of these four classes. Figure 2 illustrates our process of psycho-linguistic feature generation graphically. A complete list of the psycho-linguistic features used in our research is given in the next section (IV-B4).
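The hashtag-based assignment can be sketched as below. This is a hedged illustration rather than the EMOTEX model itself: the hashtag-to-class table is hypothetical, the class name `unhappy-inactive` is an assumption (the other three class names appear in this paper), and picking the most frequent class among matched hashtags is an assumed rule.

```python
import re
from collections import Counter

# Hypothetical hashtag-to-class table; the class 'unhappy-inactive' and all
# hashtags other than the unhappy-active ones are assumptions.
HASHTAG_TO_CLASS = {
    "tense": "unhappy-active",
    "angry": "unhappy-active",
    "afraid": "unhappy-active",
    "annoyed": "unhappy-active",
    "distressed": "unhappy-active",
    "excited": "happy-active",
    "calm": "happy-inactive",
    "sad": "unhappy-inactive",
}

HASHTAG_RE = re.compile(r"#(\w+)")

def emotion_class(tweet):
    """Return the most frequent emotion class among known hashtags, or None."""
    tags = [tag.lower() for tag in HASHTAG_RE.findall(tweet)]
    votes = Counter(HASHTAG_TO_CLASS[t] for t in tags if t in HASHTAG_TO_CLASS)
    if not votes:
        return None  # no emotion-bearing hashtag in this tweet
    return votes.most_common(1)[0][0]

print(emotion_class("Hospitals are full #afraid #angry #calm"))
```

A tweet without any known hashtag receives no emotion class in this sketch, which reflects the reliance of the approach on hashtags rather than on the tweet text.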

4) SUMMARY OF FEATURE SPACE
As discussed in the previous sections, our feature space contains three categories of features:
• Content-based features, which are based on the content of the tweets
• Contextual features, which are tweet-specific and provide additional contextual information about the tweets
• Psycho-linguistic features, which are linguistic features that help to better understand the psychology and emotions of the person responsible for creating the microblog content
Table 4 lists the categories of features, including the individual features in each category, in a summarised manner.

C. THE EXPERIMENTAL SETUP
Our experimental setup focuses on answering our research questions. We want to investigate how efficiently a false rumor can be detected using:
• Only psycho-linguistic features
• Only content-based and contextual features
• The combination of psycho-linguistic, content-based, and contextual features
Our aim is to see the differences in the performance measures of different machine learning algorithms for different feature space setups. Table 5 lists the experimentation labels and the corresponding feature space design. We assign these labels to be able to differentiate, refer to, and discuss the different feature classes more clearly.
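The three feature-space setups can be expressed as column selections over a single feature table, as in the sketch below. The individual feature names are hypothetical placeholders, and the assignment of label A to the psycho-linguistic-only setup is inferred from the order of the list above (the text states the contents of experiments B and C explicitly).

```python
# Hypothetical feature names standing in for the three feature categories.
CONTENT = ["word_count", "has_url"]
CONTEXTUAL = ["retweets", "likes"]
PSYCHO = ["has_hedge", "sentiment"]

# Experiment labels mapped to the features each setup may use.
SETUPS = {
    "A": PSYCHO,                         # psycho-linguistic only (inferred)
    "B": CONTENT + CONTEXTUAL,           # content-based and contextual
    "C": CONTENT + CONTEXTUAL + PSYCHO,  # all three categories
}

def project(instance, setup):
    """Build the feature vector of one instance for the given setup."""
    return [instance[name] for name in SETUPS[setup]]

instance = {"word_count": 18, "has_url": 1, "retweets": 42, "likes": 7,
            "has_hedge": 1, "sentiment": "positive"}
for label in SETUPS:
    print(label, project(instance, label))
```

Keeping one table and projecting it per setup ensures that experiments B and C differ only in the psycho-linguistic columns.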
For machine learning purposes, we use WEKA [79], a software suite that supports data analysis and mining using multiple machine learning algorithms. For our research, we used the J48 Decision Tree, Random Forest, Naive Bayes, and JRip rule-based classifiers. The purpose of using multiple machine learning algorithms was to check the consistency of the acquired knowledge.
We conducted all three of the above experiments (A, B, and C) using each of the four classifiers. In the next section, we present our experimental findings for all combinations of experiment label and classifier, and discuss them.

V. RESULTS AND DISCUSSION
In this section, we outline our experimental findings and discuss a few aspects of the observed results. The analysis of the findings is divided into several subsections to facilitate the discussion. First, we present our experimental findings for experiments A, B, and C, including a discussion on the observed performance measures. We then analyze the psycho-linguistic features further and offer additional insights into their distributions and prevalence in the dataset.
A. EXPERIMENTAL RESULTS ACROSS DIFFERENT SETUPS
Table 6 outlines our experimental results. For each combination of a machine learning algorithm and a labelled experimental setup, we recorded the accuracy, precision, recall, and f-measure as the performance metrics for the model.
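For reference, the four measures can be computed from a binary confusion matrix as sketched below, using the standard definitions. The confusion-matrix counts are invented for illustration, and whether the figures reported in this section are per-class values or averages over classes is not stated here.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and f-measure for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall, f_measure(precision, recall)

# Invented counts, only to show the call.
print(metrics(tp=80, fp=20, fn=20, tn=80))

# Consistency check against the J48 figures reported in this section for
# experiment C (precision 0.768, recall 0.800, f-measure 0.784).
print(round(f_measure(0.768, 0.800), 3))
```

The reported f-measure values agree with the harmonic mean of the reported precision and recall, for example 0.784 for J48 in experiment C and 0.763 in experiment B.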
As outlined in Table 6, the trend of differences in the observed performance measures for the three experiment labels is quite consistent across all four machine learning algorithms. The Naive Bayes classifier performs the worst, whereas the Random Forest classifier performs the best. However, irrespective of their differences in performance, all measures across the different classifiers consistently support the trend of improved performance when psycho-linguistic features are included in the feature space, in addition to content-based and contextual features. The measures also demonstrate that the psycho-linguistic features alone cannot provide the same level of performance as the feature set of content-based and contextual features. This is expected behavior, since the set of psycho-linguistic features focuses only on one aspect of the content, i.e., the linguistic features that reveal information about the psychology or emotion of the content creator. However, as demonstrated in the related work section, all previous works in the domain (summarised in Table 1) have used the content-based features as the base feature set, since they convey the most information about how false information differs from accurate information. Therefore, without considering the content-based or contextual features, it is not feasible to accurately classify an instance of information as false or true.
Nevertheless, for the purposes of our discussion, we would like to focus on the difference in performance measures between experiment B and experiment C. The only difference between experiments B and C is the psycho-linguistic feature set. Experiment B was conducted using the content-based and contextual features, whereas experiment C was conducted using content-based, contextual, and psycho-linguistic features. As we can see from the observed results in Table 6, the performance measures for all four classifiers show consistent improvement in experiment C, compared to the measures in experiment B. For example, the accuracy of the J48 decision tree classifier increased to 77.80% in experiment C, from 76.34% in experiment B. The values of precision, recall, and f-measure also increased to 0.768, 0.800, and 0.784, respectively, in experiment C, from the observed values of 0.767, 0.760, and 0.763, respectively, in experiment B. Even for the classifier with the least performance (Naive Bayes), in experiment C, the accuracy, precision, recall, and f-measure values

B. THE DISTRIBUTIONS OF PSYCHO-LINGUISTIC FEATURES
In order to gain more insight into the distributions and prevalence of the psycho-linguistic features within the dataset, we conducted further analysis of the feature space. We illustrate a summary of our findings in Figure 3. The first sub-illustration (a) shows that, among the false rumors, the majority express a positive sentiment, and that the sentiments expressed among false rumors are neutral, positive, and very positive. This distribution of sentiments among false rumors aligns with our initial rationale for selecting sentiment as part of the psycho-linguistic feature set, where we discussed the tendency of content creators to sensationalize or enliven a story during the spread of false rumors. A strongly positive expression of sentiment (positive and very positive) in the majority of false rumors affirms our initial understanding.
The second sub-illustration (b) in Figure 3 shows a different side of rumors. The distribution of emotions among false rumor instances shows that the majority of emotions expressed through hashtags in the false rumor tweets represent the emotion class of unhappy-active. This particular emotion class is an overarching representation of granular emotions such as 'Tense', 'Angry', 'Afraid', 'Annoyed', and 'Distressed'. This distribution is logical on its own, since the most prominent emotions expressed in tweets related to COVID-19 should reflect fear, anger, distress, etc. The other emotion classes, which are observed about equally often in false rumors, are happy-active and happy-inactive, which represent tones similar to the positive sentiments observed in the sentiment distribution. It is worth noting here that the EMOTEX [78] model for emotion analysis relies on the existence of hashtags associated with a tweet. The model clusters a group of hashtags under one specific emotion class and does not analyse the tweet content, whereas the sentiment analysis in the COVID-19 rumor dataset was performed by analysing the contents of the tweets. Therefore, the distribution of sentiments and the distribution of emotions may not align categorically. In a future extension, we plan to look into more advanced approaches to emotion analysis, such as the NRC emotion lexicon [80].
The third sub-illustration (c) in Figure 3 shows the percentages of false rumors containing different categories of words. From the bar chart, we can see that 1 in every 4 false rumor instances contains a hedge word, 1 in every 10 rumor instances contains a superlative, and 1 in every 10 rumor instances contains a comparative word. The other linguistic markers observed among false rumors include first-person pronouns, second-person pronouns, manner adverbs, and modal adverbs. The observations confirm the presence of linguistic markers that can represent dramatisation or vagueness of text among COVID-19 false rumors, and further justify the design of our feature space.
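Percentages of this kind follow directly from the binary features. The sketch below computes, for each marker, the share of instances in which it is present; the four sample instances and the feature names are invented for illustration and do not reproduce the dataset or Figure 3.

```python
def prevalence(instances, feature_names):
    """Percentage of instances in which each binary feature equals 1."""
    n = len(instances)
    if n == 0:
        return {name: 0.0 for name in feature_names}
    return {
        name: 100.0 * sum(inst[name] for inst in instances) / n
        for name in feature_names
    }

# Invented binary feature rows for four hypothetical false rumor instances.
false_rumors = [
    {"has_hedge": 1, "has_superlative": 0},
    {"has_hedge": 0, "has_superlative": 0},
    {"has_hedge": 0, "has_superlative": 1},
    {"has_hedge": 0, "has_superlative": 0},
]

shares = prevalence(false_rumors, ["has_hedge", "has_superlative"])
print(shares)
```

In this toy sample, a hedge word appears in 1 of 4 instances, which corresponds to a prevalence of 25%.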

C. A SUMMARY OF OBSERVATIONS
Based on the discussion in the previous two subsections, we now draw a conclusive summary of observations and associate the observations with our framed research questions.
• The observed results across all experimental setups clearly demonstrate how inclusion of the psycho-linguistic features can improve the detection performance for COVID-19 rumors in a real-life dataset. The observed measures in experiment C, which included the psycho-linguistic features in the feature space, are consistently higher than the observed measures in experiment B, which did not include the psycho-linguistic features. This inference answers our third research question (RQ3).
• As outlined in our rationale for selecting psycho-linguistic features in section IV-B2, there are certain linguistic markers that are often observed in dramatized texts. Our analysis of the psycho-linguistic features reveals that these linguistic markers are indeed present in the COVID-19 rumor tweets. The presence of these markers corroborates the fact that the false rumor instances in the COVID-19 rumor dataset exhibit a certain degree of text dramatisation for enlivening the content or attracting attention of the readers. This deduction answers our first research question (RQ1).
• Our analysis of the psycho-linguistic features further offers insight into the distributions and prevalence of specific features within the feature space. From the analysis, we can observe that, among the set of linguistic markers, certain features (such as the presence of hedge words) are more prevalent in rumors. We can also observe clear patterns of sentiment and emotion distributions among rumors. These observations provide insight into the features that can identify the dramatisation in texts more clearly, and answer our second research question (RQ2).

VI. CONCLUSION AND FUTURE DIRECTIONS
The aftermath of COVID-19 will continue to have an impact on the world for many years to come. The success of ongoing management and recovery efforts relies heavily on the progress of research in the domain of health science. However, part of that success also relies on the accuracy of the information that is available online. Online mis/disinformation related to the pandemic can have devastating effects on the health of patients, the well-being of vulnerable groups in society, and the efforts of healthcare and government professionals. In this paper, we address this concerning issue surrounding the spread of COVID-19 related mis/disinformation on OSN. We propose a novel feature space for the detection of COVID-19 rumors and demonstrate the effectiveness of the proposed feature space using a real-life dataset. Our research demonstrates that a set of psycho-linguistic features that reveal interesting information about the psychology and emotion of the content creator can also provide insight into the veracity of the content. Our proposed model can be used in the back-end of an OSN platform for the detection and prevention of the spread of such mis/disinformation. The model can also be used for the detection of rumors related to any social topic, as the features themselves are not specific to the COVID-19 pandemic. A future extension of this work may include further study of the behavioural and psychological characteristics of OSN users, which may help to expand the psycho-linguistic feature set and, consequently, increase the performance measures of the detection model. Additional linguistic features that can identify text dramatization or vagueness more accurately can be included in a future model, which requires further study of linguistics in general. Future studies can apply more NLP techniques to further investigate the patterns of linguistic markers in rumors.
Future studies may also include detection strategies suitable for other languages, since a large number of OSN users do not use English as the language of written communication. There is also scope for improvement in the model, by including a medical knowledge base as part of the model, built around medical facts, which may help debunk some of the false rumors straight away. However, such directions must include collaboration across multiple disciplines.