Using a Hybrid-Classification Method to Analyze Twitter Data During Critical Events

I. INTRODUCTION
For the past few decades, social media has been an essential platform for its users to share their thoughts, views, and feelings, helping create a virtual bond between users. People contact each other and build relationships through messages, comments, posts, and likes. Social media platforms are used for sharing information, advertising, social events (including crises), and political purposes. Twitter is among the online social media platforms where users can read, comment on, and post short messages known as tweets. Twitter data is increasingly being used for numerous purposes, such as forecasting financial exchange prices, predicting box office revenue for movies, and identifying clients with negative sentiments [1]. During critical events, the Twitter platform is used by journalists to gather information posted by users and re-tweet valuable messages. The output of textual documents on social media increases exponentially. (The associate editor coordinating the review of this manuscript and approving it for publication was Oğuzhan Urhan.)
Scientific research on the semantic content of such tweets is known as sentiment analysis [2]. A survey conducted in 2018 estimated that Twitter had 326 million active users posting over 500 million tweets per day [3]. Sentiment analysis helps determine the polarity of a given text. It is used for extracting people's opinions, feelings, and thoughts using Natural Language Processing (NLP) [4]. The aim is to identify positive and negative values according to a standard categorization. As a document classification task, sentiment analysis is often referred to as opinion mining. Its objective is to analyze the attitudes of various individuals toward a particular subject [5]. Extracting sentiment information is a challenging job due to the volume and variety of the data [5], [6]. Recently, sentiment analysis using machine learning has become a significant research field. It is treated as a (machine learning) classification problem in which the user can choose the classification algorithm [7]. In moments of crisis or critical incidents (health crises or natural disasters) where communities are disturbed, online social platforms including Twitter become significant platforms for information sharing.
Sentiment analysis of data from Twitter during critical events is considered a complicated task. Earlier, lexicon-based approaches (unsupervised methods) were used to anticipate the polarity of a given text: text data is organized into a predefined set of sentiment classes, and sentiment lexicons are used to determine the sentiment score of the text [8]. However, lexicon-based methods are not practical, since the polarity of text data can often differ from that recorded in the lexicon. Instead, many scholars tend to use context-based lexicons to avoid issues with term polarity. In context-based lexicons, word order and the syntactic relationships between words are considered.
A critical case is a situation in which a certain threshold is violated, either by an entire country or by a single individual, resulting in a new and uncertain condition that elicits mixed and ambiguous emotional reactions [9]. The recent coronavirus pandemic has put much strain on many countries worldwide since its severity was officially acknowledged in early January 2020 [10]. The majority of the inbound and outbound market sources (China, USA, Italy, France, Spain) are severely affected, significantly impacting each country's economy. Many countries have imposed strict restrictions on leaving houses, airline operations, and hotel bookings. Most mega-events, including Expo-2020, initially expected to open in October 2020, have already been delayed because of safety precautions, and discussions about the impact and expected dates abound on social media. Given this, the motivation behind this research is twofold: first, to evaluate the Twitter data of these two significant events for sentiment analysis, and second, to use the classification algorithm to assess the usefulness of the proposed methodology. Neutral emotions, which are difficult to detect, were also analyzed in [10].
Critical events can be categorized into three relevant phases: the early phase of the event, the critical situation itself, and the post-crisis recovery [11]. In the early phase, events such as earthquakes and climate changes are detected and reported promptly by monitoring the Twitter platform. During the core of the crisis, Twitter provides information about people's opinions on the nature of the critical event, which can help improve the calamity recovery system. In the reconstruction phase, Twitter communication can help recovery from critical events and disasters. Seeing the expressed emotions, more people may work and donate for critical events, or help develop the mental health needed to confront the psychological impacts of the crisis. Part of the government's and local officials' attempts to manage the crisis rely on social media, and upon posting a social media request, people may expect a quick response [12].
This paper provides sentiment analysis of Twitter data using a hybrid classification (SVM and BFTAN) methodology. The relationships among words are identified using this classifier. We analyze the performance of this hybrid classifier using two Twitter datasets: the COVID-19 dataset and the Expo2020 dataset. Data extracted from Twitter are pre-processed and then classified to compute accuracy, recall, and precision. Our hybrid-based approach aims to address the following challenges: improving accuracy, identifying the polarity of comparative sentences, distinguishing the intensity of opinion words, considering negative comments, and handling sarcasm. Results demonstrate the efficacy of the suggested approach based on the accuracy and class distribution of each dataset. Briefly, the work presented here makes four main contributions: a) hybrid classification techniques are thoroughly explored for sentiment analysis; b) a novel hybrid classification approach based on BFTAN is proposed for sentiment analysis; c) a new Twitter dataset related to a recent event (COVID-19) is provided that can be used in future research; d) it is empirically shown that the hybrid-classification approach achieves comparable performance in improving accuracy, identifying the polarity of comparative sentences, distinguishing the intensity of opinion words, considering negative words, and handling sarcasm. It also demonstrates that more than 60% of tweets are negative. The remainder of the paper is organized as follows: a literature review on the use of classifiers for sentiment analysis is given in Section II. Section III presents the various challenges of sentiment analysis. Section IV outlines the methodology used, data collection, and data analysis. Section V describes the study findings and interpretation of results, and some limitations and future directions are elaborated in Section VI. Finally, the paper is concluded in Section VII.

II. LITERATURE REVIEW
This section provides a brief overview of sentiment analysis, followed by previous approaches used for sentiment analysis of Twitter data at important events. Sentiment analysis, or opinion mining, helps assess the polarity of a review (text) as either positive or negative. Different critical classification levels are utilized in sentiment analysis, and various sentiment analysis approaches that apply classifiers to Twitter data have been proposed. A sentiment analysis method stated in [3] investigates public sentiments regarding the Syrian refugee crisis. They gathered English and Turkish tweets using the Twitter package; a total of 2,381,297 tweets were collected for analysis. The sentiment analysis of Turkish tweets showed that more tweets were classified into the positive category than into the negative and neutral categories. Among English tweets, the neutral category was larger than the other categories, at 35% of all tweets versus 12%. In [13], public opinions were extracted from Twitter related to one of the most challenging international affairs, the Iran Deal, concluded in July 2015 between Iran and other parties (China, the United Kingdom, the United States, Russia, and the European Union). After analysis, 41% of tweets were classified as positive, 35% as negative, and 24% as neutral. They then applied five classification algorithms (Neural Networks, SVM, Decision Trees (DT), Naive Bayes (NB), and K-NN) to the tweets. SVM gave the most accurate predictions, with an accuracy rate of 80.95%.
In [14], a novel approach for Twitter sentiment classification is presented with a non-iterative Deep Random Vectorial Functional Link (D-RVFL) that consists of four hidden layers having the same number of neurons and activation functions at each layer. Its effectiveness was tested using precision, recall, accuracy, and F1 score, and compared with Random Vector Functional Link (RVFL), SVM, and Random Forest (RF); the proposed approach surpassed the other methods on F1 score. An automatic text classification system proposed in [15] classifies tweets according to various general requirements in times of crisis, such as food, shelter, water, electricity, medical emergency, collapsed structures, and trapped people. The best features were chosen utilizing term frequency and Chi-square, and classification is performed using SVM and NB. The results showed that SVM outperforms the NB algorithm. Disaster risk management priorities can thus be determined, with public feedback used to make local communities better prepared in times of disaster. In [16], a Bidirectional Recurrent Neural Network (BRNN) analyzes the collected textual and sequential data. They divided the data corpus, using 85% of the data for training and 15% for testing, to construct a BRNN model that achieved an 81.67% accuracy rate. A machine learning-based approach for sentiment analysis combined with character n-gram language model features is used in [17] for SemEval-2013 Task 2 (Task B), ''Message Polarity Classification''. This system combined the SVM classifier results with the n-gram language model to make a final prediction, addressing the high lexical variation in Twitter data and achieving good performance.
In [18], deep learning sentiment analysis was used in conjunction with ensemble methods to enhance algorithm performance. Seven public datasets collected from the microblogging and movie-review domains were used to test the approach, which outperformed other baseline approaches on F1 score. In this work, performance was enhanced by combining existing ensemble classifiers with sentiment-trained word embeddings and manually developed features. The strategy suggested in [19], encoding topic information through word embedding, was used for Twitter sentiment classification. In this approach, the topic information of tweets was generated using Latent Dirichlet Allocation (LDA). A topic-enhanced recursive autoencoder was then used to encode the topic information, and an SVM classifier was used to evaluate the word representation. Finally, a topic-enhanced word embedding model was integrated with the traditional model for greater accuracy. Using topic-enhanced word embedding as a feature, this method gave a 78.57% macro-F measure in predicting the positive/negative polarity of tweets; the integrated model gave an improved macro-F measure of 81.02%.
A novel sentiment analysis approach was presented in [20] to detect seven sentiment classes in micro-posts, such as tweets, for situational awareness. Different machine learning algorithms were compared, and the various features used to classify micro-posts were analyzed. The classifier results were analyzed to detect highly crisis-relevant information. The sentiment values of tweets in times of disaster were accurately detected in [21]. They used several methods to test the flexibility of the proposed system, comparing four approaches: SentiWordNet, emoticons, AFINN, and Bayesian Networks. Bayesian Networks with SentiWordNet provided the best precision and recall and improved sentiment detection. An algorithm was proposed in [22] to monitor real-time tweets and detect target events, such as earthquakes, where each Twitter user is regarded as a sensor. They first applied semantic analysis to extract tweets on the target event; the NB algorithm was then applied to the training data to obtain a probabilistic model that can detect the target event.
In [23], a hybrid approach was used in which a lexicon/supervised learning approach and two supervised machine-learning approaches were evaluated for multi-way sentiment analysis. Information from the lexicon was paired with information from an NB classifier with a set of features. The supervised learning methods used were an NB classifier with feature selection and an MCST classifier based on hierarchical SVMs considering label similarity. The results showed that the mix of lexicon and corpus-based information is superior to other state-of-the-art systems. The work presented in [7] used Bayesian Network classifiers for sentiment analysis on two Spanish datasets: the Chilean earthquake (2010) and the Catalan independence referendum (2017). Five different classifiers are presented, one being a variant of the Tree Augmented Naive Bayes (TAN) model. In terms of model accuracy, SVM achieved an 80% accuracy rate on both datasets. These models find it hard to perceive the relationships between features at classification time, a drawback dealt with using the Bayesian Network classifier. A downside of the suggested method is that it is heavily event- and time-dependent.
In [24], a hybrid model named Symbiotic Gated Recurrent Unit (SGRU) is proposed, combining a Gated Recurrent Unit (GRU) with Symbiotic Organisms Search (SOS). Text mining and natural language processing techniques are used to pre-process construction-site accident data, and the proposed approach is evaluated against other classification techniques based on the average weighted F1-score. A hybrid model is presented in [25] to detect epileptic seizures, using GA and Particle Swarm Optimization (PSO) to optimize the parameters of SVM. Experimental results show that the performance of PSO-SVM and GA-SVM is comparable, and both outperform plain SVM. In [26], a deep transfer learning model is presented to detect people who are not wearing face masks with the help of surveillance cameras, to control the transmission of COVID-19. A deep transfer learning model known as ResNet50 is combined with machine learning classification techniques using decision trees, SVM, and ensemble methods. Comparisons are performed to find the most suitable approach, yielding the highest accuracy in the least time.
In [27], the US Presidential Elections 2012 and Karnataka State Elections 2013 datasets were analyzed using SVM with PCA. PCA helps reduce dimensions and achieve better accuracy; however, the approach is not very efficient and does not provide consistent output. It showed an accuracy of 88%. In [28], sentiment analysis of microblogging services using Naive Bayes, Maximum Entropy, and SVM was analyzed. It achieves higher accuracy through pre-processing steps, but emoticons and tweets in other languages are not considered. The method showed an accuracy of 80%.
In [29], Sina microblog data is analyzed using SVM. The method saves time and shows good performance. In [30], manually labeled data of 300 positive, negative, and neutral tweets were analyzed using SVM. Prediction accuracy is good compared to the keyword-based approach, but the method does not handle dialects or domain-specific issues. It showed an accuracy of 86.89%. In [31], the model analyzed 9,195 tweets and 2,181 hashtags collected over a one-week period using SVM. The result showed an accuracy of 84.13%; however, classification of hashtags is not employed. In [32], NB, RF, and SVM approaches are used to monitor consumer opinion about products. The work, however, did not consider neutral tweets. The Internet Movie Database is considered in [33]: they used NB and obtained an improved accuracy rate of 81.42%. Meanwhile, French movie reviews were analyzed in [34] using SVM, showing improved classification performance with an accuracy of 93.25%. Three different movie review datasets were analyzed in [35] using NB and SVM, achieving impressive accuracy levels.
A Chinese sentiment corpus of 1,021 documents was analyzed in [36] using a centroid classifier, K-nearest neighbor, a winnow classifier, Naive Bayes, and SVM. SVM exhibited the best performance for sentiment classification, with a reported information gain of 0.90. In [37], microblogging data were analyzed using Naive Bayes, RF, and SVM; Naive Bayes showed consistently accurate results. 2,000 movie reviews were analyzed in [38] using NB and SVM. The method is language-independent but computationally expensive; the results showed an accuracy of 86.35%. Table 1 presents a comparison of the most common machine learning methods for sentiment analysis.

III. CHALLENGES IN SENTIMENT ANALYSIS
The literature review in the previous section shows that sentiment analysis is a difficult task and that many challenges must be considered when performing it. A few of these challenges are highlighted below:
• Applying sentiment analysis to non-English languages is challenging, as most of the available online sentiment dictionaries are in English, so translation problems may occur and impact the system's output.
• Identifying the context and sense of a word is very challenging. For example, the term 'small', referring to size, is sometimes considered a negative adjective for cars but a positive adjective for computer devices.
• Identifying the polarity of comparative sentences is very difficult. For example, ''i7 core is faster than i5 core'' indicates that the word ''faster'' is associated with the i7 core.
• Negation words are mostly not handled appropriately in sentiment analysis, which may lead to incorrect results. For example, ''the weather is not good today'' contains positive polarity for the word ''good'', but due to the negation word ''not'', the meaning of the sentence is changed completely.
• Distinguishing the intensity and strength of the opinion word (positive vs extremely positive) is also a challenging task.
• Developing algorithms and techniques that help improve the system's accuracy is also a challenging task as there is no clear way to identify such algorithms.
• Real-time opinion mining and real-time data collection are required: given the colossal growth of social networking websites such as Facebook and Twitter, it is crucial to have an automated system.
• Identifying sarcasm is also challenging. Some writers include ironic comments so that the sentence reads as positive while the intended meaning is negative, or vice versa.
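To make the negation challenge concrete, the following toy sketch (the tiny word lists and the flip rule are illustrative assumptions, not a method from this paper) shows how a negation word can invert the polarity of the opinion word that follows it:

```python
# Hypothetical, minimal lexicons for illustration only.
NEGATIONS = {"not", "no", "never", "n't"}
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "terrible"}

def naive_polarity(sentence: str) -> str:
    """Toy rule: a negation word flips the polarity of the next opinion word."""
    score = 0
    negate = False
    for word in sentence.lower().split():
        if word in NEGATIONS:
            negate = True
            continue
        if word in POSITIVE:
            score += -1 if negate else 1
        elif word in NEGATIVE:
            score += 1 if negate else -1
        negate = False  # negation scope ends after one word
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(naive_polarity("the weather is not good today"))
```

A real system needs a wider negation scope and a full lexicon, but even this sketch shows why ''not good'' must not be scored as positive.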
Different techniques have addressed different challenges of sentiment analysis. Some techniques focused on increasing the accuracy of the system using a hybrid approach but did not address the other challenges of intensity, comparative words, negation, sarcasm, etc. Limited work has been done on handling sarcasm in text, and a new approach is required that can solve all these challenges in real time. The literature on hybrid approaches is very limited, and such approaches are mostly developed to solve the limitations of previous approaches. In this paper, we use two supervised machine learning techniques to create a hybrid classification framework in a novel manner for Twitter sentiment analysis. The proposed hybrid approach uses available resources, a set of rules, and a supervised learning model, SVM, combined with the Bayes Factor Tree Augmented Naive Bayes technique to accurately classify the input tweet while keeping in mind these different challenges of sentiment analysis. Although we also investigated traditional methods when designing a better approach for Twitter sentiment analysis, these techniques remain popular among the research community and have been investigated as recently as 2020 [7]. The novelty of our proposed approach stems from the facts that a) BFTAN is combined for the first time with another supervised approach for Twitter sentiment analysis, and b) the combination of classification techniques achieves performance comparable to widely studied classification techniques. The performance of the proposed approach is compared with other classifiers, and the algorithms are evaluated in terms of accuracy, precision, and recall.
Our hybrid-based approach is proposed with the aim of addressing the following challenges: improving accuracy, identifying the polarity of comparative sentences, distinguishing the intensity of opinion words, considering negative words, and handling sarcasm. A hybrid approach that uses the supervised learning model SVM combined with the Bayes Factor Tree Augmented Naive Bayes (BFTAN) technique, called SVM-BFTAN, is proposed. The proposed SVM-BFTAN approach processes the input tweets in four phases: (i) data collection, (ii) pre-processing of the tweets, (iii) feature extraction, and (iv) the proposed hybrid classification approach. The flow chart of the proposed approach is presented in Figure 1, and the details of each phase are discussed in the following section.

IV. PROPOSED SVM-BFTAN METHOD

A. DATA COLLECTION
The efficiency of the proposed approach is measured using two sets of Twitter data: the COVID-19 and Expo2020 datasets. The tweets are collected to measure the users' positivity and negativity ratio/response to these two events. The dataset is collected by implementing the following steps:
• Twitter search strategy: all tweets were extracted and collected using the Twitter search strategy.
• Hashtags selection: the hashtags that are related to the chosen events are listed in Table 2.
• Tweets collection: Tweets are collected using the listed hashtags. The collected tweets are directly made available for further processing. The dataset structure is shown in Table 2.

B. PRE-PROCESSING
A tweet can express people's views in a variety of ways. The input data obtained from Twitter is redundant and inconsistent, so pre-processing is a fundamental task. This step is applied to the collected tweets to remove redundant data, reshape them, and make them available for the feature extraction step. The main objective is to remove unwanted parts (URLs, hashtags, punctuation, numbers, and stop words) and to highlight only the significant parts that will make further processing easy and accurate. In our approach, pre-processing is done in two phases implemented sequentially, i.e., the output of one phase is the input of the next.
Phase 1: This phase eliminates unwanted noise/elements from the Twitter dataset:
• Eliminate all URLs, email addresses, etc. using regular expression matching.
• Remove all hashtags (#) and the word that follows each of them.
• Remove Twitter terms that start with the (@) symbol, which is used to tag an entity.
• Remove all symbols, parenthesis, backward slashes, forward slashes, numbers, punctuations and dashes from tweets using regular expression matching.
• Substitute a single white space for multiple white spaces.
• Remove all non-English letters using regular expression matching.
• Detect and separate emoticons.
Phase 2: Two dictionaries, a stop-word dictionary and an acronym dictionary, are used to enhance the accuracy and precision of the Twitter dataset processed in Phase 1. The following steps are involved in this phase:
• All tweets are changed to lower case for data uniformity.
• Common stop words like ''a'', ''the'', ''is'', etc. are omitted by comparison with the stop-word dictionary.
• Duplicate tweets are removed from the same user ID.
• Negation words are detected and separated.
As an example, each step of pre-processing is illustrated in Table 3.
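The two pre-processing phases can be sketched roughly as below; the regular expressions and the tiny stop-word set are illustrative assumptions rather than the exact rules used here (emoticon separation, the acronym dictionary, and duplicate removal are omitted for brevity):

```python
import re

# Hypothetical, abbreviated stop-word list; the paper uses a full dictionary.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of"}

def preprocess_tweet(tweet: str) -> str:
    """Rough sketch of the two sequential cleaning phases."""
    t = tweet
    # Phase 1: strip URLs, email addresses, hashtags (with the word), mentions.
    t = re.sub(r"(https?://\S+|www\.\S+)", " ", t)
    t = re.sub(r"\S+@\S+", " ", t)
    t = re.sub(r"#\w+", " ", t)
    t = re.sub(r"@\w+", " ", t)
    # Remove symbols, numbers, punctuation, and non-English letters.
    t = re.sub(r"[^A-Za-z\s]", " ", t)
    # Substitute a single white space for multiple white spaces.
    t = re.sub(r"\s+", " ", t).strip()
    # Phase 2: lower-case for uniformity and drop stop words.
    words = [w for w in t.lower().split() if w not in STOP_WORDS]
    return " ".join(words)

print(preprocess_tweet("Check https://t.co/x #covid @user The weather is bad!!"))
```

The order matters: URLs must be removed before the generic symbol pass, or their punctuation fragments would leave stray letters behind.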

C. FEATURE EXTRACTION
From the pre-processed information, the proposed framework recognizes and separates the feature sets. A feature set is a part or attribute of an object on which the planned assessment is to be performed. These feature sets are utilized to characterize and classify the data. The involved features address practically every aspect of the tweets, targeting the objectives of handling negative words, detecting sarcasm, and improving accuracy. Our proposed system makes use of two feature extraction techniques, Word2Vec and sentiment score, which are used to obtain better results and reduce the effect of real neutrals [39]. The set of parameters considered while constructing the model is identified and described in Table 4.

1) FEATURE EXTRACTION WITH THE HELP OF SENTIMENT SCORE
We took a list of positive and negative English opinion (sentiment) words, and each tweet was labeled as negative or positive based on these opinion and sentiment words. First, we calculate the sentiment polarity [40] of every tweet using (1); this equation gives the direction and strength of the sentiment [41], where positive represents the positive word count and negative the negative word count in a tweet. The sentiment class C is represented by two discrete values; this variable C is used to capture sentiment values and the distances between them. Sometimes the emotionality degree of a tweet cannot be decided because the polarity measure fails (sentiment score = 0) or the positives and negatives cancel each other out, so the tweet cannot be declared positive, negative, or neutral. Hence, we use the definition from [7]. After the polarity of the tweet is calculated using (3), and according to the values of the involved features, the tweet is classified into one of the following categories, with the respective scores presented in Table 5:
• If polarity is greater than or equal to 0.1, then the tweet is classified as Very Positive.
• If polarity is less than 0.1 and greater than 0.0, then the tweet is classified as Positive.
• If polarity is equal to 0.0, then the tweet is classified as Neutral.
• If polarity is less than 0.0 and greater than −0.1, then the tweet is classified as Negative.
• If polarity is less than or equal to −0.1, then the tweet is classified as Very Negative.
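Assuming a common count-based definition of polarity (the paper's exact equations (1) and (3) may differ), the scoring and the five-way thresholding above can be sketched as:

```python
def polarity(positive: int, negative: int) -> float:
    """Assumed ratio-style polarity from word counts; equation (1) may differ."""
    total = positive + negative
    return 0.0 if total == 0 else (positive - negative) / total

def sentiment_category(p: float) -> str:
    """Map a polarity score to the five classes listed above (Table 5)."""
    if p >= 0.1:
        return "Very Positive"
    if p > 0.0:
        return "Positive"
    if p == 0.0:
        return "Neutral"
    if p > -0.1:
        return "Negative"
    return "Very Negative"

print(sentiment_category(polarity(3, 1)))
```

Note that a tweet with equal positive and negative counts lands exactly on 0.0, the undecidable case discussed above.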

2) FEATURE EXTRACTION WITH THE HELP OF Word2Vec
Word2Vec is an unsupervised learning algorithm. Its main idea is to supply the word sequence with weights for each word, making the embedding/input layer a vector representation of the texts [42]. Word2Vec is one of the most popular techniques for learning word embeddings using a shallow neural network, and word embedding is one of the most popular representations of document vocabulary: it can capture the context of a word in a document, semantic and syntactic similarity, relations with other words, etc. We have used CBOW (continuous bag of words), where the model predicts the word under consideration given the context words within a specific window. The number of dimensions in which the current word must be expressed at the output layer is defined by the hidden layer. By traversing the dataset, Word2Vec vectors are created for each instance in the dataset; we obtain the word embedding vector for each word in a review by simply applying the model to that word, and we average over all the word vectors in a sentence to represent a sentence from our dataset. These Word2Vec vectors are then used for classification in the next phase. We used a simple equation to allocate polarity to the words in the vocabulary, with a single positive seed word and a single negative seed word, as in [43], where P is the positive seed word for the domain, represented by its corresponding word vector, N is the negative seed word represented by its word vector, and sim is the cosine distance between word vectors. The polarity of a word w is positive if the value is greater than or equal to zero, and negative otherwise.
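The seed-word polarity rule described above can be sketched with toy two-dimensional vectors; in practice w, P, and N would be Word2Vec embeddings learned from the corpus:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def seed_polarity(w, pos_seed, neg_seed):
    """Positive if w is at least as close to the positive seed as to the negative one."""
    score = cosine(w, pos_seed) - cosine(w, neg_seed)
    return "positive" if score >= 0 else "negative"

# Toy 2-d vectors standing in for learned embeddings.
print(seed_polarity([0.9, 0.2], [1.0, 0.0], [0.0, 1.0]))
```

The same comparison, applied to every vocabulary word, propagates the polarity of the two seed words across the whole embedding space.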

D. PROPOSED HYBRID CLASSIFICATION APPROACH
In our proposed hybrid approach, we use different classifiers to deal with different problems so that the accuracy of the system can be improved. The proposed approach is a hybrid of SVM and Bayes Factor Tree Augmented Naive Bayes (SVM & BFTAN). Support Vector Machine is a supervised learning model that finds the maximum-margin hyper-plane separating the classes [44]. SVM can also be formulated as a constrained optimization problem, where the class cj ∈ {1, −1} marks the positive and negative training samples of document dj and the solution is represented by a vector w, found by solving a dual optimization problem; the documents dj whose coefficients exceed zero are the support vectors, and they are the sole vectors that contribute to w. Classifying an instance consists of finding the side of the hyper-plane on which it falls. SVM performs well with small datasets and provides a non-linear solution through a kernel function that maps the input variables into a high-dimensional space. The aim of SVM is to split the dataset into classes by finding the maximum marginal hyper-plane (MMH). It works well with high-dimensional spaces and offers good accuracy, although it does not work well with overlapping classes. A linear splines kernel is used to deal with the data vectors, and the input data is normalized so that the features are on the same scale and compatible (the SVM parameters are searched over C ∈ [0.1, 1000] and γ ∈ [0.001, 1]).
A Bayesian network is a graphical model that encodes the joint probability distribution of a set of discrete random variables [45]. Given an input data point, a Bayesian network can be combined with the Bayesian theorem to acquire the posterior probability of the class variable C. Given a set of n discrete random variables X1, X2, ..., Xn and class values ci ∈ {1, ..., k}, an incoming data point [g1, g2, ..., gn] is classified as:
C = arg max_c P(C = c | X1 = g1, ..., Xn = gn). (6)
Using the Bayesian theorem, the posterior probability can be obtained as:
P(C = c | X1 = g1, ..., Xn = gn) ∝ P(C = c) · P(X1 = g1, ..., Xn = gn | C = c). (7)
The first term on the r.h.s. of equation (7) is the prior probability and the second term is the likelihood. This joint probability is hard to calculate, so other strategies are used, the simplest of which is Naive Bayes, which makes strong independence assumptions between features. In other words, the Naive Bayes classifier presumes that the inclusion or exclusion of any distinct attribute in a class is unrelated to the inclusion or exclusion of any other attribute. The Naive Bayes classifier is represented as:
P(C = c | X1 = g1, ..., Xn = gn) ∝ P(C = c) ∏i P(Xi = gi | πi), (8)
where πi represents the set of parent nodes of Xi and, for Naive Bayes, πi = {C}. An alternative model, in which each node is allowed one additional parent node alongside the class variable node, is named Tree Augmented Naive Bayes (TAN). In some cases there is too little information supporting the edges in a tree, which affects the generalization power; to overcome this, a mid-way structure between TAN and NB is defined [7]. Given the decomposability of the Bayesian network and the Bayesian model selection factor, the metric h in equation (9) captures the impact of adding an extra edge (from Xq to Xp) to the NB classifier, where a negative value of h shows that there is sufficient data to support the extra edge. The cumulative value H_e = Σ h_i, where h_i indicates the h value for the ith edge, is used to show whether there is sufficient data to support e edges compared to 0 edges in Naive Bayes; edges are continuously added as long as the condition H_e < 0 holds. If e = n − 1, there is enough information to support the full tree structure and the resulting structure is a TAN classifier; if the condition fails earlier, the structure obtained is a forest.
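As a minimal illustration of equations (6)-(8), the sketch below implements a plain Naive Bayes classifier over discrete features, with Laplace smoothing added (an assumption of this sketch) to avoid zero likelihoods; the full BFTAN model additionally adds Bayes-factor-supported edges between features:

```python
import math
from collections import defaultdict

class NaiveBayes:
    """Plain Naive Bayes over discrete features: argmax_c P(c) * prod_i P(x_i | c)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = {c: y.count(c) / len(y) for c in self.classes}  # P(C = c)
        self.counts = defaultdict(lambda: defaultdict(int))  # per-class (feature, value) counts
        self.totals = defaultdict(int)                       # per-class sample counts
        for xs, c in zip(X, y):
            for i, v in enumerate(xs):
                self.counts[c][(i, v)] += 1
            self.totals[c] += 1
        return self

    def predict(self, xs):
        best, best_lp = None, -math.inf
        for c in self.classes:
            lp = math.log(self.prior[c])
            for i, v in enumerate(xs):
                # Laplace smoothing (assumes binary feature values).
                lp += math.log((self.counts[c][(i, v)] + 1) / (self.totals[c] + 2))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

nb = NaiveBayes().fit([(1, 0), (1, 1), (0, 0), (0, 1)], ["pos", "pos", "neg", "neg"])
print(nb.predict((1, 0)))
```

The log-space sum mirrors the product in equation (8); TAN/BFTAN would replace P(Xi | C) with P(Xi | C, parent(Xi)) for the edges whose Bayes factor supports them.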

E. SARCASM DETECTION
People do not always express their opinions in the same way; each individual's opinion is shaped by their perspective. Some individuals express themselves sarcastically: a comment that appears positive when read may actually be negative, meaning the opposite of what is literally said. Such comments are usually made to offend someone or to criticize something in a comical manner. Example: My flight is delayed. . . that's amazing. . . ! [35]. These kinds of expressions are very difficult to predict, so we evaluate a set of classifiers and use the best one for testing and operating the model. Sarcasm detection in the proposed model is performed using classifiers such as Decision Tree, Random Forest, Gradient Boosting, Adaptive Boosting, Logistic Regression, and Gaussian Naive Bayes. The steps for sarcasm extraction are as follows: • Classification of the prepared data using the candidate classifiers.
• Choosing the classifier with the best accuracy. • Training the model with the best classifier. • Testing the model with the best classifier.
• Real-time testing of tweets.
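The selection step above amounts to evaluating each candidate on held-out data and keeping the most accurate model. A minimal sketch with stand-in rule-based classifiers (simple functions, not the actual Decision Tree or Gradient Boosting models) and a hypothetical validation set:

```python
# Sketch of "choose the classifier with the best accuracy": score each
# candidate on a held-out set and keep the top scorer. The candidates
# here are toy rules over a single feature (has_positive_word).

def accuracy(clf, data):
    """Fraction of (x, y) pairs the classifier predicts correctly."""
    return sum(clf(x) == y for x, y in data) / len(data)

# Hypothetical validation set: (has_positive_word, label) pairs,
# label +1 = not sarcastic, -1 = sarcastic.
val = [(1, 1), (1, 1), (0, -1), (0, -1), (1, -1)]

candidates = {
    "always_positive": lambda x: 1,
    "word_rule": lambda x: 1 if x else -1,
}

best_name = max(candidates, key=lambda n: accuracy(candidates[n], val))
print(best_name)
```

The winning candidate would then be retrained on the full training set and used for real-time testing, as in the steps listed above.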

V. EXPERIMENTAL RESULTS
A sentiment analysis was performed using the proposed approach on two Twitter datasets. A brief description of the considered datasets is given in Table 2.
1) COVID-19 dataset: The Twitter API is connected to RapidMiner, and the ''Search Twitter'' operator is used to search recent tweets on the topic. 120 K tweets are collected between 11-05-2020 and 03-10-2020 using #coronavirus, #Covid-19, #corona and #virus as keywords. The raw data is pre-processed to label tweets as positive or negative based on the sentiment and opinion words present in each tweet.
2) Expo-2020 dataset: The Twitter API is again connected to RapidMiner, and the ''Search Twitter'' operator is used to search tweets specific to the ''United Arab Emirates''. 5000 tweets are collected between 14-05-2020 and 16-05-2020 using #Expo2020, #delayExpo, #postponedexpo2020 and others as keywords. The raw data is then pre-processed to label tweets as positive or negative based on the sentiment and opinion words present in each tweet.
The feature vectors gathered from the prior step are given as input to the proposed approach (SVM-BFTAN), i.e., a classification module that uses SVM and BFTAN to classify the data. We randomly split each dataset into 70% training data and 30% test data. We first train our classifier on the training set, and the confusion matrix is then computed on the test set, where:
• True Positives (TP): number of positive tweets that are correctly predicted as positive.
• True Negatives (TN): number of negative tweets that are correctly predicted as negative.
• False Positives (FP): number of negative tweets that are incorrectly predicted as positive.
• False Negatives (FN): number of positive tweets that are incorrectly predicted as negative.
Subsequently, the following performance measures are computed: accuracy = (TP + TN)/(TP + TN + FP + FN), precision = TP/(TP + FP), and recall = TP/(TP + FN). The splitting procedure is executed 7 to 10 times for both datasets (70% training samples, 30% test samples). For each split, the classification performance is measured on the test set and the average of each measure is reported. A class-imbalance problem occurs when the class sizes differ; in most cases, one of the classifiers becomes biased towards the majority class.
TABLE 6. Comparison of the proposed method with the existing methods in terms of accuracy, precision and recall using dataset 1.

TABLE 7. Comparison of the proposed method with the existing methods in terms of accuracy, precision and recall using dataset 2.
The class distribution of the datasets shows that 60% of the tweets are negative.
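The confusion-matrix counts translate into the reported measures as follows; a small sketch with hypothetical counts (the function and numbers are illustrative, not taken from the paper's tables):

```python
# Standard performance measures computed from confusion-matrix counts.
# The counts below are hypothetical, for illustration only.

def metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

acc, prec, rec = metrics(tp=45, tn=40, fp=5, fn=10)
print(round(acc, 3), round(prec, 3), round(rec, 3))
```

Averaging these three measures over the 7 to 10 random splits gives the values reported in Tables 6 and 7.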
The performances of five classification algorithms (BFTAN, TAN, NB, SVM and RF) are compared. Table 6 shows the comparative results of the suggested approach and the existing approaches for the above three parameters with sentiment-score feature extraction on dataset 1. The results of our empirical study suggest that the accuracy of the proposed approach is comparable to that of the other classifiers. Table 7 displays the comparative results of the suggested approach and the existing approaches for the above three parameters with sentiment-score feature extraction on dataset 2. SVMBFTAN achieves accuracy, precision and recall values of 90.84%, 91.22% and 90.08%, respectively. The recall of SVMBFTAN is the highest among the compared classifiers, while RF shows the highest accuracy on dataset 2, followed by SVMBFTAN. These results suggest that improved performance can be achieved by combining different techniques in a systematic manner.
Table 8 shows the comparative results of the proposed approach and the existing approaches for the above three parameters with Word2Vec feature extraction on dataset 1 and dataset 2. The results demonstrate that the proposed classifier has the highest accuracy, precision and recall values, showing that comparable performance can be achieved by combining two supervised classification techniques. SVM outperforms the other techniques in terms of time efficiency. As expected, our hybrid classification approach is time-consuming (mean time across all features ≈ 980 s), as it combines two individual techniques; the total time elapsed by each technique adds up in the hybrid classification.
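As a sketch of Word2Vec-style feature extraction, a tweet can be mapped to a fixed-length vector by averaging per-word embeddings. The tiny 3-dimensional embeddings below are hypothetical stand-ins, not trained Word2Vec vectors:

```python
# Sketch of building a tweet-level feature vector by averaging
# per-word embeddings (the usual way Word2Vec features feed a
# classifier). The embedding table is hypothetical.

EMBED = {
    "flight": [0.1, 0.4, -0.2],
    "delayed": [-0.6, 0.2, 0.1],
    "amazing": [0.7, -0.1, 0.3],
}

def tweet_vector(tokens, embed, dim=3):
    """Average the embeddings of known tokens; zero vector if none known."""
    vecs = [embed[t] for t in tokens if t in embed]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

print(tweet_vector(["my", "flight", "delayed"], EMBED))
```

The resulting fixed-length vectors are what the SVM-BFTAN classification module consumes in place of the sentiment-score features.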
We compare five classification algorithms on two different events against the results of Ruz et al. [7]. We use the same sentiment-analysis approach [7] with the same data size but on two different events to evaluate the performance of the algorithms. Comparison results are shown in Figure 2. The results show that our values are less accurate than those of the benchmark paper on dataset 1. Figures 3 and 4 show the precision and recall values of the five classifiers on dataset 1; our values are slightly lower than those of the benchmark paper on the COVID-19 dataset. Figure 5 shows the accuracy values of the five classifiers on dataset 2. The results show that our values are more accurate than those of the benchmark paper on dataset 2. Figures 6 and 7 show the precision and recall values of the five classifiers on dataset 2; our results are better than those of the benchmark paper on the Expo2020 dataset.
Looking at the results, the performance of the five classifiers on dataset 1 does not match the Ruz et al. approach, while the performance on dataset 2 is better, so we apply an ensemble of Bayesian Boosting (BB) to the classifiers to improve our results on the COVID-19 dataset. Table 9 shows the improved version of the five classifiers after adding the BB ensemble in terms of accuracy, precision and recall. Figure 10 reveals an irregular pattern in which the SVM ensemble with BB shows a marginally lower recall value relative to the Ruz et al. approach.

VI. DISCUSSIONS AND FUTURE DIRECTION
In this section, we present some observations related to the strengths and limitations of the proposed approach that arose during this study and are worth discussing. Our results suggest that SVMBFTAN is the only classifier that achieves above 80% accuracy on both datasets, and our proposed hybrid technique offers better-quality classification than the other traditional techniques. In dataset 1, there is enough data to support the SVMBFTAN model; therefore, the accuracy values of SVMBFTAN are better than those of BFTAN and the other Naive Bayes classifiers, while in dataset 2 there were not enough examples to support the SVMBFTAN model. In this case, the generalization performance of SVMBFTAN is competitive with RF, and RF offers better time efficiency. If output quality and time efficiency are considered together, the question arises of how to trade off between the two: an approach that gives better quality compromises time efficiency, so choosing which to prefer is challenging. As a suggestion, output quality should be weighted more heavily than time consumption, because in the end it is the quality that matters most, and our approach gives better-quality results. However, it would be interesting to see how the performance of our proposed approach changes, in terms of both solution quality and time, when other techniques are combined in different ways or when high-performance computing (HPC) devices are used. Compared to the traditional methods of the social sciences (in-depth interviews, focus groups, questionnaires), big-data analysis of social media, including Twitter, shows us a broader and more dynamic image of critical events. However, the difficulties in coding and classifying Twitter data may remain in the future as well.
During critical events, the numbers of tweets and re-tweets increase day by day, so it is challenging to identify relevant and meaningful communication patterns. In some cases, when the Twitter information does not match the ground information obtained through technical devices at natural locations, it may lead to wrong decisions by the authorities. Several challenges need to be addressed in the future, such as approaches to minimize error while collecting information on critical events, new techniques that are time efficient, and improved geo-location techniques. As our analysis shows, the classification of Twitter data has improved our understanding of communication dynamics in critical events. A statistical debate on the representativeness and relevance of Twitter data has been an important research subject over the past few years. In general, there are two questions: whether these tweets reflect the real society, and whether these Twitter samples portray Twitter communications. Our focus in this study is on the second question. Twitter data is accessed in two ways. The most common is the Streaming API, which is partitioned into two sources: the Filter API (in which we can search using parameters, i.e., keywords, hashtags, user accounts and geographic locations) and the Sample API (which provides one percent of all tweets with no parameters). The other source is the Representational State Transfer (REST) API, which gives the timelines of specific users and is confined to the 3200 most recent tweets. Twitter does not share information on its sampling techniques, so it is challenging to determine whether the one percent of Twitter data that is public is a true reflection of Twitter communication. This paper focuses on using a hybrid classifier for sentiment analysis during critical events. The performance of this methodology depends largely on the quality of the training examples; in some cases, the data may be biased or noisy.
In this case, the sentiment polarity of hashtags is used as a feature in the classification process [46]: the numbers of positive and negative hashtags are taken as input features.
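A minimal sketch of this hashtag feature, assuming a hypothetical polarity lexicon (the tag lists below are made up for illustration):

```python
# Sketch of the hashtag feature: count the hashtags in a tweet whose
# polarity is known and return the two counts as classifier inputs.
# The polarity lexicon is hypothetical.

POSITIVE = {"#hope", "#recovered", "#staystrong"}
NEGATIVE = {"#delayexpo", "#postponedexpo2020", "#crisis"}

def hashtag_features(tweet):
    """Return (positive_count, negative_count) over the tweet's hashtags."""
    tags = [w.lower() for w in tweet.split() if w.startswith("#")]
    pos = sum(t in POSITIVE for t in tags)
    neg = sum(t in NEGATIVE for t in tags)
    return pos, neg

print(hashtag_features("So sad... #delayExpo #crisis but #StayStrong"))
```

The two counts would simply be appended to the feature vector before classification.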
For future work, we will explore hashtag classification at the initial level (hashtag-level sentiment classification) for datasets such as the COVID-19 dataset, the recent conflict in October 2019 [41], [46], or a natural disaster such as an earthquake. Hashtags related to positive feelings as well as negative impacts could be analyzed, and tweet-level sentiment analysis could be performed in a following stage. Further approaches to integrating hashtag-level sentiment classification with tweet-level sentiment classification to achieve more reliable and robust findings can also be explored. The suggested approach to sentiment analysis is time-consuming and case dependent, which means that its relevance and predictive capability are limited to the short time period of the event. For example, if we train our classifier for a potential event of the same nature, new hashtags will arise that can create complications when people use additional terms to express their thoughts and emotions; a classifier not trained on the new event is of limited use. However, the design has a network architecture that interlinks words and how they classify the event, ensuring the conceptual validity of the model. If the predictive model changes unexpectedly, either slowly or rapidly, for future events in a setting that was initially modeled, an online learning approach can be adopted. Alternatively, an unsupervised learning approach can be applied to the new event and its words; we can subsequently couple this unsupervised network with a fine-tuned version of the original Bayesian network to obtain the words that appear in both events. Techniques for handling missing data and its flexibility make machine learning a promising area for sentiment analysis in critical situations. In the future, other qualitative approaches, such as grounded theory, can be combined with the proposed classifier to obtain better results.
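The online-learning idea mentioned above can be sketched as a perceptron-style update applied one tweet at a time as a new event unfolds. This is a generic stand-in for an adaptive scheme, not the authors' implementation; the feature vectors and learning rate are hypothetical:

```python
# Sketch of online learning: a linear model is updated one example at
# a time (perceptron rule), so it can adapt when tweets from a new
# event start to arrive. Features and labels are hypothetical.

def online_update(w, b, x, y, lr=0.1):
    """One perceptron step: adjust weights only on a misclassified example."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if y * score <= 0:  # misclassified: adapt to the new example
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
        b = b + lr * y
    return w, b

w, b = [0.0, 0.0], 0.0
stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 1.0], 1)]
for x, y in stream:
    w, b = online_update(w, b, x, y)
print(w, b)
```

In practice the same idea applies to the classifiers used here: the model trained on the original event is incrementally corrected as labeled examples from the new event become available.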
This may inspire researchers and scientists to prioritize machine learning techniques in their studies and to recognize Twitter as a good platform for reflecting emotions and social facts.

VII. CONCLUSION
This paper introduced a novel hybrid classification approach to analyze the sentiment of tweets using the SVM and BFTAN methods. The method was tested on two Twitter datasets, the COVID-19 and the Expo2020 datasets. The approach processes the input tweets in four phases: (i) data collection, (ii) pre-processing of the tweets, (iii) feature extraction, and (iv) the proposed hybrid classification. Our hybrid approach is proposed to address the following challenges: improving accuracy, identifying the polarity of comparative sentences, distinguishing the intensity of opinion words, considering negative comments, and handling sarcasm. The results demonstrate the efficacy of the suggested approach based on the accuracy and class distribution of each dataset. The approach was compared with other classifiers (BFTAN, TAN, NB, SVM and RF), and accuracy, precision and recall were computed for all the considered datasets. When enough data is available to support the training examples, the Bayesian classifier performs effectively. Dataset 1 provides sufficient data to support the training samples, so in this case SVMBFTAN shows more accurate results than the others, whereas on dataset 2 RF shows competitive accuracy values. Our method demonstrates higher accuracy values than previous methods, although an improvement in time is desired. It would be interesting to see how the performance of our proposed approach changes, in terms of both solution quality and time, when other techniques are combined in different ways or when high-performance computing (HPC) devices are used. Future work will therefore explore more accurate and flexible techniques by introducing feature-selection methods and data pre-processing techniques and by dealing with hashtag-level sentiment analysis for the classification of tweets.