Inquest of Current Situation in Afghanistan Under Taliban Rule Using Sentiment Analysis and Volume Analysis

Microblogging websites and social media platforms serve as a potential source for mining public opinions and sentiments on a variety of subjects, including the prevailing situations in war-afflicted countries. In particular, Twitter has a large number of geotagged tweets that make it possible to analyze sentiments across time and space. This study performs volume analysis and sentiment analysis using LDA (Latent Dirichlet Allocation) and text mining over two datasets collected for different periods. To increase the adequacy and efficacy of the sentiment analysis, a hybrid feature engineering approach is proposed that elevates the performance of machine learning models. Volume analysis of geotagged tweets indicates that the highest numbers of tweets originate from India, the US, the UK, Pakistan, and Afghanistan. Analysis of positive and negative tweets reveals that negative tweets mostly originate from India and the US, whereas positive tweets come from Pakistan and Afghanistan. LDA is used for topic modeling on two datasets containing tweets about the current situation after the Taliban took control of Afghanistan. Topics extracted through LDA suggest that the majority of Afghan people seem satisfied with the Taliban's takeover, while the topics from negative tweets reveal that the issues discussed in them relate to US concerns in Afghanistan. Sentiment analysis over the two datasets indicates that the sentiment trend shifted positively over three weeks.


I. INTRODUCTION
THE situation in Afghanistan has become critical since President Joe Biden announced the end of the war in Afghanistan and the withdrawal of allied military forces [1]. The Taliban, which had continued to capture and contest territory across the country despite ongoing peace talks with the Afghan government, ramped up attacks on ANDSF (Afghan National Defense and Security Forces) bases and outposts and began to rapidly seize more territory [2]. The US withdrew its army in August 2021, and the Taliban took control of 90% of Afghanistan [1]. On 15 August 2021, the Taliban entered and captured Kabul, the capital of Afghanistan, and took full control of it. After this victory, they announced the formation of a new government comprising representatives from all Taliban groups. Although a Taliban takeover was expected sooner or later, the world was surprised at the pace at which the ANDSF lost control of major cities. Several speculations and theories have been discussed in the news and on social media over the past several days [3], [4]. In addition, human rights activists and organizations have made different conjectures about changes in social norms, especially regarding the status of women in Taliban-controlled Afghanistan. Different opinions, reviews, and impressions about the current situation in Afghanistan are posted online, both by common people and by governments, politicians, and religious leaders. Such posts offer significant potential for analyzing public opinion about the current situation in Afghanistan. The opinions of Afghan people, both those living in Afghanistan and those abroad, are particularly important and reflect a mixed attitude with both positive and negative comments.
Social networks have become an integral part of daily life and hold an indispensable place in the daily activities of many people [5]. People freely express their opinions and feelings on social media platforms without restrictions of race, color, ethnicity, or location. In particular, Twitter has become one of the most widely used social media platforms, where people express their thoughts and opinions on different topics, ideas, and personalities [6]. Such opinions and thoughts are posted as tweets, which are further shared on Facebook and other similar platforms. These tweets are useful for extracting people's sentiments about Afghanistan's current situation. Besides personal opinions, such tweets point to several problems related to the social and cultural situation. Sentiment analysis can analyze these tweets to determine people's sentiments, and topic modeling can extract the most discussed topics and issues from them. Once tweets are categorized as positive or negative, issues can be obtained from the negative tweets and opportunities from the positive ones. Topic modeling is an attractive approach for extracting important topics from the given document(s). It helps determine the importance of a topic from the frequency of its occurrence and its relation to other topics. This study investigates one of the challenging topics in current international affairs, Taliban rule in Afghanistan after the US withdrawal, through public opinion expressed in tweets. To this end, it proposes an approach for sentiment analysis using machine learning techniques on Twitter data containing tweets related to Afghanistan. The following contributions are made in this study:
• Two datasets containing tweets related to Afghanistan's current situation are collected and used for sentiment analysis. The datasets are collected for two time spans to perform a prospective analysis.
• For initial sentiment analysis, an HFE (Hybrid Feature Engineering) approach is devised to increase the accuracy of sentiment analysis. To avoid the labor of manual data labeling, TextBlob is used for automatic annotation. Several machine learning models are used with TF-IDF (Term Frequency-Inverse Document Frequency), BoW (Bag of Words), and HFE features for sentiment analysis.
• Topic extraction is performed using LDA (Latent Dirichlet Allocation). The change in sentiments over time is analyzed across the collected datasets. Extracted topics are analyzed in terms of their importance, and sentiments are discussed with respect to geolocation.
The rest of the paper is structured as follows. Section II discusses important research works related to the current study. The proposed research methodology and its related contents are presented in Section III. Section IV provides the discussion of results. In the end, the conclusion and future work are given in Section V and Section VI, respectively.

II. RELATED WORK
Over the last few years, sentiment analysis has received wide interest due to the popularity of social media platforms such as Twitter and Facebook. The availability of large amounts of data in the form of tweets, reviews, and comments has further accelerated its pace. Consequently, a large body of work exists on sentiment analysis and topic modeling. Study [7] uses statistical approaches to analyze the effectiveness of fake review systems in the sentiment analysis domain. Different machine learning models are evaluated to investigate their accuracy for the task at hand, and the proposed system detects fake news based on sentiments such as positive and negative reviews. NB (Naive Bayes), KNN (K Nearest Neighbors), K-means clustering, SVM (Support Vector Machines), and DT-J48 algorithms are evaluated, and the results indicate that SVM performs best among the investigated models.
Given the potential of topic modeling and sentiment analysis to extract important aspects of specific topics, several works combine topic modeling approaches such as LDA, LSA (Latent Semantic Analysis), and PLSA (Probabilistic Latent Semantic Analysis) with sentiment analysis. For example, [8] uses LDA with VADER (Valence Aware Dictionary and sEntiment Reasoner) to analyze important topics in global climate change tweets. Diverse topics on global climate change are extracted and analyzed across time and space, and the influence of policy changes towards climate change is investigated using LDA and sentiment analysis. Similarly, the authors of [9] perform topic modeling and sentiment analysis using big data analysis to identify the needs of customers who travel with Asian airlines. Owing to the growing market size of Asian airlines, such analysis aims to increase customer satisfaction by meeting their in-flight needs. Online customer reviews are used to analyze several important topics such as in-flight meals, entertainment, seat class, seat comfort, and staff service. Feature extraction and feature selection techniques are also an important part of sentiment analysis. Feature selection plays a prominent role in enhancing text classification accuracy, and many research works investigate the efficiency of various feature extraction approaches. For example, study [10] proposed a summarization-based approach for feature reduction to improve model performance, and the selected features are used to train a k nearest neighbor (KNN) classifier. Another study [11] analyzed feature extraction, feature reduction, and feature selection to improve model performance for text classification. It also used a KNN classifier, which performs well with the information gain and mutual information feature selection techniques.
Topic modeling and sentiment analysis are important both from the perspective of individual customers and for analyzing public opinion about ongoing political conflicts at local and international levels. For example, [12] conducts sentiment analysis of tweets during the 2016 primary debates to uncover people's emotions about Donald Trump. Among the tweets labeled as positive or negative, joy is found to be the dominant emotion in the tweets that supported Trump. Unlike sentiment analysis for local governments, where the scope is narrow, analysis of tweets about international conflicts and events has a large scope and depicts the thoughts of people worldwide. For example, the scope of tweets about the US banning Huawei, the Iran-US nuclear deal, and peace talks between North Korea and the US is much broader than that of sentiment analysis of products and services. Several research works in the literature focus on such international conflicts and events. For example, Wael et al. [13] use sentiment analysis to detect bias in Western media coverage of the Palestinian-Israeli conflict. This process involves finding misleading terms, vocabulary, and expressions used to shape opinion when disseminating news about the Israel-Palestine war. Similarly, [14] performs sentiment analysis on tweets about post-conflict Colombia using a dataset crawled from Twitter. The analysis uses tweets from Colombians residing in Colombia as well as from foreigners, whose tweets are considered to study public perception. The results indicate that foreigners tend to show more positive attitudes and feelings than local people.
The authors of [15] investigate public opinion about the Iran-US nuclear deal. Under the deal, Iran agreed to restrict its nuclear program, and in exchange the US, European countries, and China agreed to lift economic sanctions. A lexicon-based approach is combined with machine learning models to analyze opinions about the deal from people residing in different countries, and the majority of the sentiments in the collected tweets are found to be positive. Similarly, [16] investigates public sentiment about the US ban on Huawei through tweets posted during the ban period. The lexicon-based analysis indicates that the majority of people remain neutral about the US-China trade war, followed by those who oppose it. Besides the US and China, the countries involved in the trade war, large volumes of tweets are generated from Canada, the United Kingdom, India, Pakistan, and South Africa. Another work analyzing sentiments about the China-US trade war is [17], which presents a real-time monitoring and reporting framework for public opinion. The data are obtained from 52 economic leaders, and the China-US trade war is used as a case study to evaluate the performance of the proposed framework. The results indicate that a CNN (Convolutional Neural Network) tends to produce better results for real-time monitoring.
Topic modeling and sentiment analysis can play an important role in finding the prevalent issues and challenges triggered by local and international events, helping authorities take corrective actions to mitigate their influence. For example, [18] discusses the problem of Syrian refugees by analyzing tweets in Turkish and English. A comparative analysis indicates that the Turkish tweets contain slightly more positive sentiments towards Syrian refugees than negative and neutral ones. On the other hand, the majority of the English tweets carry neutral sentiments, followed by negative sentiments. This study aims to discover and expose sentiments about the Taliban taking control of Kabul (and the majority of Afghanistan) through topic modeling, volume analysis, and sentiment analysis.
VOLUME 4, 2016
Study [19] applied topic modeling and sentiment analysis to Japanese newspaper articles to analyze news about Middle Eastern countries. The study first finds crucial topics in the news articles and then performs sentiment analysis on the selected topics using machine learning. The topic modeling results show that most of the Japanese news articles cover the refugee crisis, the killing of Qasem Soleimani, the Iran nuclear deal, and Islamic State issues. The sentiment analysis results show that Saudi Arabia- and Trump-related topics carry highly negative sentiments. In a similar fashion, [20] conducts sentiment analysis on a tweet dataset related to heritage destruction by ISIS in the Middle East, collected between 2015 and 2016. The study provides insights to help the international community protect heritage and tackle terrorism. Another study [21] performs sentiment analysis of ISIS-related tweets to explore people's sentiments on terrorism-related activities, using the TF-IDF technique.
Since August 2021, the Taliban's takeover of Afghanistan has been the world's most discussed topic on social media. Mixed reviews, both favoring and opposing Taliban control, have been posted on social media about the newly started Taliban regime. While many people support Taliban control of Afghanistan, others oppose it owing to concerns about human rights, especially women's rights. The topic is an important current international affair that requires investigation, as no recent study has addressed it. This study fills the gap by analyzing the situation in Afghanistan after the withdrawal of US forces. The Taliban's rapid takeover of Afghanistan took the world by surprise, so different sentiments have been expressed by individual leaders, celebrities, and politicians, as well as by governments. This study explores people's opinions about the Taliban government using tweets and tries to find the change in overall sentiments over time. For this purpose, it uses data visualization, volume analysis, topic modeling, and sentiment analysis, a combination missing from previous studies.

A. DATASET DESCRIPTION
The dataset used for the experiments in this study is collected from Twitter using the Tweepy library and a developer account [22]. Several keywords and hashtags are used to search for relevant tweets, such as '#taliban', '#saveafganistan', and 'war in Afghanistan'. The collected tweets were posted between 10 August 2021 and 21 August 2021. A total of 21,000 tweets are extracted for sentiment analysis, and a few sample tweets are given in Table 1. The tweets are collected along with the geolocation from which each was posted. Figure 1 shows the location distribution of Twitter users in the extracted dataset.
The extracted tweets contain several different words which are common for most of the tweets, e.g., Taliban, Afghanistan, Biden. Figure 2 depicts the word cloud for the whole dataset showing the most commonly found terms used in the extracted tweets related to Afghanistan's current situation.
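Both the volume analysis over geotagged tweets and the term frequencies behind a word cloud reduce to counting. The minimal Python sketch below illustrates this; the input format (one location string per tweet, with `None` or an empty string for tweets without a geotag, and tweets already tokenized) is an assumption, not the study's data schema, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Sequence, Tuple


def tweets_per_location(locations: Iterable[Optional[str]]) -> List[Tuple[str, int]]:
    """Volume analysis: tweets per location, most frequent first.

    Tweets without a location (None or empty string) are skipped.
    """
    return Counter(loc for loc in locations if loc).most_common()


def top_terms(tokenized_tweets: Iterable[Sequence[str]], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent terms across all tweets (word cloud input)."""
    return Counter(tok for tweet in tokenized_tweets for tok in tweet).most_common(n)


if __name__ == "__main__":
    # Hypothetical toy records, not data from the study
    print(tweets_per_location(["India", "US", None, "India", ""]))
    print(top_terms([["taliban", "kabul"], ["taliban", "biden"]], n=1))
```

A frequency dictionary such as `dict(top_terms(...))` can be passed to the `wordcloud` package's `WordCloud.generate_from_frequencies` to render a figure similar to Figure 2.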

B. PROPOSED METHODOLOGY
This study performs sentiment analysis on the tweets related to Afghanistan's current situation under Taliban control after the withdrawal of US and allied forces.
Experiments are performed on an Intel Core i7 (7th generation) machine running Windows 10. Jupyter Notebook and the Python language are used to implement the machine learning and deep learning models. The scikit-learn, Keras, TensorFlow, Gensim, TextBlob, and NLTK libraries are used for the experiments. The architecture of the proposed methodology is shown in Figure 3.
The first step after data extraction is to remove unnecessary, superfluous information that does not contribute to the prediction of the target class. For this purpose, a preprocessing pipeline is adopted to clean the collected data [23], [24]. The following preprocessing steps are applied:
• Hashtag and link removal: Tweets contain tags and links that are not useful for training the models, so they are removed from the dataset to reduce complexity. Tags and links are removed using regular expressions.
• Punctuation and number removal: Punctuation is a necessary part of a sentence that makes it more intelligible and meaningful for human readers. For machine learning models, however, punctuation adds complication and increases the feature vector size. Since punctuation does not contribute to the training process, it is removed from the tweets.
• Conversion to lowercase: Machine learning models are case sensitive, so differences in case add complexity to model training, as 'war', 'War', and 'WAR' are treated as different words. This increases both the feature space size and the processing time. Each character is converted to lowercase using a Python built-in function.
• Stemming and lemmatization: Stemming reduces words to their root forms; for example, 'goes', 'going', and 'gone' are variations of 'go'. If this is not handled, machine learning models treat them as different words, so they are reduced to their base form 'go' [25]. Lemmatization is similar to stemming but is often more effective: stemming strips a few characters, which may produce erroneous terms, whereas lemmatization considers the context of the sentence and transforms words into their proper base form. This study performs stemming and lemmatization using Porter's stemmer and WordNetLemmatizer.
• Stopword removal: Stopwords improve the readability of a sentence, but they are short words with little meaning of their own, such as 'the', 'in', 'he', 'it', and 'an', that do not contribute to the training of machine learning models. Consequently, stopwords are removed to reduce the complexity of the feature space.
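The preprocessing steps above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the study's code: "tags" are interpreted as hashtags and @-mentions, the stopword set is a hypothetical abbreviated list, stopwords are removed before stemming (our ordering choice, so the list matches unstemmed words), and stemming or lemmatization is passed in as a callable such as NLTK's `PorterStemmer().stem` or `WordNetLemmatizer().lemmatize`.

```python
import re
from typing import Callable, List

# Hypothetical, abbreviated stopword list; the study does not list its stopwords
# (NLTK's English stopword corpus is a common choice).
STOPWORDS = {"the", "in", "he", "it", "an", "a", "and", "of", "to", "is", "on"}

LINK_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"[#@]\w+")               # hashtags and @-mentions
NON_LETTER_RE = re.compile(r"[^A-Za-z\s]")    # punctuation and digits


def preprocess(tweet: str, normalize: Callable[[str], str] = lambda w: w) -> List[str]:
    """Clean one tweet and return its tokens.

    `normalize` stands in for stemming or lemmatization; identity by default.
    """
    text = LINK_RE.sub(" ", tweet)          # remove links
    text = TAG_RE.sub(" ", text)            # remove hashtags and mentions
    text = NON_LETTER_RE.sub(" ", text)     # remove punctuation and numbers
    text = text.lower()                     # convert to lowercase
    # Drop stopwords before stemming so the stopword list matches unstemmed words
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return [normalize(tok) for tok in tokens]
```

For example, `preprocess("Kabul falls!! https://t.co/xyz #Taliban in 2021 the WAR")` yields `['kabul', 'falls', 'war']`.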
Results of preprocessing steps applied on sample tweets from the collected dataset are shown in Table 2.

1) TextBlob
After preprocessing, the dataset is annotated with negative, positive, and neutral sentiments using the TextBlob library [26]. TextBlob computes a polarity score for each tweet to be annotated. Table 3 shows the score values that are used to assign sentiments to tweets.
TextBlob is one of the most widely used lexicon-based techniques for sentiment scoring [26]. It computes a polarity score for the given text that ranges from -1 to 1. This study uses TextBlob to annotate the tweet dataset into negative, positive, and neutral tweets, and these sentiments serve as the target classes for training the machine learning models. TextBlob is used because it yields sentiments that correlate more closely with the text features. The count of tweets for each TextBlob score is given in Figure 4. After annotation, the dataset is split into training and testing sets with an 85:15 ratio: 85% of the data are used for training and 15% for testing. The number of tweets after the split is given in Table 4.
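The annotation and split described above can be sketched as follows. The thresholds in `polarity_to_label` (negative below 0, neutral at exactly 0, positive above 0) are an assumption, since Table 3 is not reproduced here, and both function names are hypothetical. The 85:15 split of 21,000 tweets yields 17,850 training and 3,150 test tweets, which matches the 3,150 total predictions reported in the results.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def polarity_to_label(score: float) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label.

    Assumed thresholds: < 0 negative, == 0 neutral, > 0 positive.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"polarity {score} is outside [-1, 1]")
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"


def split_85_15(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle reproducibly and split into (train, test) with a 15% test share."""
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)
    n_test = round(0.15 * len(items))
    test = [items[i] for i in order[:n_test]]
    train = [items[i] for i in order[n_test:]]
    return train, test
```

With TextBlob installed, a tweet would be labeled as `polarity_to_label(TextBlob(text).sentiment.polarity)`. In scikit-learn, `train_test_split(X, y, test_size=0.15, stratify=y)` performs the same split while also preserving class proportions.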

2) Features Engineering
Once the data are annotated, feature extraction approaches are applied. Feature extraction transforms the text data into a numeric form that can be used to train machine learning models. This study uses three well-known feature extraction techniques: TF-IDF, BoW, and Word2Vec. Additionally, Chi-square (Chi2) is used as a feature selection approach.
Bag of Words: BoW is one of the simplest techniques for extracting features from text data [24]. It is easy to implement and easy to interpret, and despite its simplicity it often produces results competitive with more complex feature extraction approaches. It counts the frequency of each unique term in the corpus and builds a numerical feature vector, thereby reducing the complexity of model training. The scikit-learn library is used to implement BoW.
Term Frequency-Inverse Document Frequency: Originally introduced by Salton, TF-IDF is one of the most widely used feature extraction techniques for text analysis [27]. Unlike the simple count features of BoW, it produces weighted features: it computes a weight for each term in the corpus, and these weighted features are more efficient for training models. TF-IDF comprises TF and IDF, where the former counts the occurrences of each unique term, while the latter weights each term according to the number of documents in which it appears. TF can be computed using (1) [28]
$$TF(t, D) = \frac{t_D}{D_t} \qquad (1)$$
where $t_D$ is the number of times the term t appears in a document D and $D_t$ is the number of terms in the document D. IDF can be computed as (2) [28]
$$IDF(t) = \log\frac{N}{N_{d,t}} \qquad (2)$$
where N is the number of documents and $N_{d,t}$ is the number of documents containing the term t. TF-IDF is then computed as
$$TF\text{-}IDF(t, D) = TF(t, D) \times IDF(t) \qquad (3)$$
Word2Vec: Word2Vec is a feature extraction approach developed by Google [29]. It differs from BoW and TF-IDF, which rely on how often a term is repeated in a corpus, in that Word2Vec exploits the similarity between words.
The similarity between the learned word vectors is measured with cosine similarity. Word2Vec has been found to work well on both large and small datasets. Word2Vec is implemented using the Gensim library.
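To make the BoW counts, the TF definition in (1), the IDF definition in (2), and their product concrete, the following pure-Python sketch computes them for tokenized tweets, together with the cosine similarity used to compare Word2Vec vectors. It is an illustration only; the natural logarithm for IDF is an assumption, since the text does not state the base, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def bag_of_words(docs: Sequence[Sequence[str]]) -> List[Dict[str, int]]:
    """BoW: raw count of each term in each document."""
    return [dict(Counter(doc)) for doc in docs]


def tf_idf(docs: Sequence[Sequence[str]]) -> List[Dict[str, float]]:
    """TF = t_D / D_t and IDF = log(N / N_{d,t}) (natural log assumed)."""
    n_docs = len(docs)                                        # N
    doc_freq = Counter(t for doc in docs for t in set(doc))   # N_{d,t}
    weights = []
    for doc in docs:
        counts = Counter(doc)                                 # t_D for each term
        length = len(doc)                                     # D_t
        weights.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, e.g. two Word2Vec word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

A term that occurs in every tweet (here, 'taliban' in `[["taliban", "kabul", "kabul"], ["taliban", "women"]]`) receives an IDF of log(1) = 0 and hence zero weight. Note that scikit-learn's `TfidfVectorizer` by default uses a smoothed IDF, ln((1 + n) / (1 + df)) + 1, and L2-normalizes each row, so its values differ from (1) and (2).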

3) Hybrid Feature Engineering
HFE combines the feature extraction techniques TF-IDF, BoW, and Word2Vec using the Chi2 feature selection technique. Once TF-IDF, BoW, and Word2Vec are applied to the preprocessed data, Chi2 is used to select the best features [23]. Chi2 selects features by testing the independence between each feature and the class variable, retaining the features that depend most strongly on the class [30]. The Chi-square statistic is adjusted by the degrees of freedom, which vary with the number of levels of the feature variable and the number of levels of the class variable. In HFE, features are first extracted from the text data with each feature extraction technique, and then 2,000 features are selected from each extracted feature set. These 2,000 features from each technique are concatenated to form a new hybrid feature set that has more diversity and less complexity, which boosts the performance of the machine learning models. Algorithm 1 shows the steps followed in HFE.
Algorithm 1: Hybrid feature engineering.
where n is the number of tweets and m is the number of features extracted using each technique, i.e., TF-IDF, BoW, and Word2Vec. $TF\text{-}IDF_F$, $BoW_F$, and $Word2Vec_F$ denote the TF-IDF, BoW, and Word2Vec features extracted from the tweet dataset, respectively.
Chi2 then performs feature selection on each of these feature sets. Here, p is the number of features selected by Chi2 and BF denotes the selected best features. These best features are combined into a new feature set, $HFE_{n \times q}$, the hybrid features used in the proposed approach to obtain higher accuracy. The HFE feature set is smaller than the TF-IDF, BoW, and Word2Vec feature sets and reduces the feature space complexity. The number of features for each technique is shown in Table 5, and HFE is illustrated in Figure 5.
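The HFE procedure described above (select the best Chi2 features from each feature set, then concatenate) can be sketched with scikit-learn's `SelectKBest` and `chi2`. This is a minimal sketch under assumptions: inputs are dense arrays (sparse matrices from `CountVectorizer` or `TfidfVectorizer` would first be converted with `.toarray()`), and because `chi2` rejects negative values, feature sets such as Word2Vec embeddings are rescaled to [0, 1] with `MinMaxScaler`, a step the study does not document. The function name `hybrid_features` is hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler


def hybrid_features(feature_sets, y, k=2000):
    """Select the k best Chi2 features from each dense (n, m_i) array and concatenate.

    feature_sets: e.g. [tfidf, bow, w2v]. Returns an (n, q) array with
    q = sum(min(k, m_i)).
    """
    selected = []
    for X in feature_sets:
        X = np.asarray(X, dtype=float)
        if X.min() < 0:
            # chi2 requires non-negative input; rescale (e.g. Word2Vec) features
            # to [0, 1]. This rescaling is an assumption, not from the study.
            X = MinMaxScaler().fit_transform(X)
        selector = SelectKBest(chi2, k=min(k, X.shape[1]))
        selected.append(selector.fit_transform(X, y))
    return np.hstack(selected)
```

With k = 2000 and three feature sets that each have at least 2,000 columns, q = 6,000. In practice, the scalers and selectors should be fitted on the training split only and then reused to transform the test split, so that no test information leaks into feature selection.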

4) Machine Learning Models
This study uses five machine learning models for sentiment analysis, with the aim of selecting the best-performing one. The selected models are optimized by fine-tuning several hyperparameters, evaluating each model over a range of values for each parameter. Text classification models and their best-performing hyperparameter ranges are selected from the literature, and these ranges are then evaluated on the collected dataset. For example, for the LR model, the solver parameter is tuned over 'saga' and 'sag'. Although the solver parameter also accepts 'liblinear', that solver is limited to one-versus-rest schemes and is therefore not appropriate for this multi-class problem. Similarly, value ranges are selected for the other models where possible to find the set of parameters that yields the highest classification accuracy. For the ETC model, we found that a large feature set requires deeper decision trees to obtain better results, so the 'max_depth' parameter is tuned over a range from 50 to 500. The selected hyperparameter settings and their values are given in Table 6, and the tuning range for each hyperparameter is shown in Table 7. A comprehensive description of the machine learning models is provided in Table 8.
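The range-based tuning described above maps naturally onto scikit-learn's `GridSearchCV`. The sketch below is illustrative: the grids only echo the values mentioned in the text (the LR solvers 'saga' and 'sag', and ETC `max_depth` between 50 and 500) and are not the grids of Table 7; `max_iter`, `n_estimators`, the intermediate depth values, three-fold cross-validation, and the demo data from `make_classification` are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Illustrative grids echoing the ranges mentioned in the text; not Table 7 itself.
SEARCH_SPACES = {
    "LR": (LogisticRegression(max_iter=2000), {"solver": ["saga", "sag"]}),
    "ETC": (
        ExtraTreesClassifier(n_estimators=100, random_state=0),
        {"max_depth": [50, 100, 200, 300, 500]},
    ),
}


def tune(X, y, cv=3):
    """Grid-search each model; return {name: (best_params, best_cv_accuracy)}."""
    results = {}
    for name, (estimator, grid) in SEARCH_SPACES.items():
        search = GridSearchCV(estimator, grid, scoring="accuracy", cv=cv)
        search.fit(X, y)
        results[name] = (search.best_params_, search.best_score_)
    return results


if __name__ == "__main__":
    # Synthetic three-class data standing in for the tweet features
    X, y = make_classification(n_samples=150, n_features=20, n_informative=5,
                               n_classes=3, random_state=0)
    for name, (params, score) in tune(X, y).items():
        print(name, params, round(score, 3))
```

`GridSearchCV` clones the estimator for every candidate, so the module-level estimators are never modified. The 'sag' and 'saga' solvers converge faster on features with similar scales, which is one reason to raise `max_iter` or scale the inputs.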

IV. RESULTS AND DISCUSSIONS
Performance evaluation of the trained models is realized using accuracy, precision, recall, F1 score, and confusion matrix. Results of models are discussed with respect to each feature extraction technique such as BoW, TF-IDF, Word2Vec, and HFE. The performance comparison between machine learning and deep learning models is conducted as well. Similarly, the performance of the proposed approach is compared with state-of-the-art approaches.

A. MODELS' PERFORMANCE USING BOW FEATURES
Initially, the experiments are performed using BoW features, and Table 9 shows the experimental results. The tree-based and linear models perform significantly better in terms of precision, recall, and F1 score. SVM outperforms all other models, achieving the highest accuracy, precision, recall, and F1 score of 0.95, 0.94, 0.93, and 0.94, respectively. The performance of LR and RF is marginally lower, with accuracy scores of 0.93 and 0.92, respectively. Because the dataset has a large feature set, the linear models perform considerably better than the other models. On the other hand, GNB and KNN do not perform well. GNB assumes that the data follow a Gaussian distribution, and its accuracy suffers when the data distribution under the different labels does not. The confusion matrices in Figure 6 show the number of correct and wrong predictions by each model using BoW features. SVM gives 2,981 correct and 169 wrong predictions out of 3,150 total predictions. LR gives 2,922 correct and 228 wrong predictions, while RF gives 2,902 correct and 248 wrong predictions. The worst performer, GNB, gives only 1,423 correct and 1,727 wrong predictions.
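The evaluation metrics follow directly from the confusion matrix counts. The sketch below computes accuracy from correct and wrong counts and precision, recall, and F1 from a square confusion matrix; macro averaging over the three classes is an assumption, since the text does not state which averaging Tables 9 to 12 use, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def accuracy_from_counts(correct: int, wrong: int) -> float:
    """Accuracy = correct predictions / all predictions."""
    return correct / (correct + wrong)


def macro_prf(cm: Sequence[Sequence[int]]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall, and F1 from a square confusion matrix.

    Rows are true classes and columns are predicted classes.
    """
    k = len(cm)
    precisions, recalls, f1s = [], [], []
    for i in range(k):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(k))   # column total
        actual = sum(cm[i])                            # row total
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


if __name__ == "__main__":
    # Correct/wrong counts reported for BoW features
    for name, ok, bad in [("SVM", 2981, 169), ("LR", 2922, 228),
                          ("RF", 2902, 248), ("GNB", 1423, 1727)]:
        print(name, round(accuracy_from_counts(ok, bad), 2))
```

Running the demo prints 0.95, 0.93, 0.92, and 0.45, which agrees with the accuracies reported for these models with BoW features.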

B. RESULTS OF MACHINE LEARNING MODELS USING TF-IDF FEATURES
Experimental results for the machine learning models using TF-IDF features are provided in Table 10. SVM again performs best with TF-IDF features, obtaining scores of 0.93, 0.93, 0.90, and 0.91 for accuracy, precision, recall, and F1 score, respectively, followed by ETC with an accuracy of 0.92. GNB performs poorly with TF-IDF features as well. Overall, the performance of the machine learning models degrades when they are used with TF-IDF features. These results conform with [35], which shows that preprocessed data using BoW yields better classification results.

Table 8: Machine learning models (Model: Description)
LR: LR is a linear model widely used for classification; it uses the logistic function to separate the data, hence the name logistic regression [31]. The logistic function, also known as the sigmoid function, is S-shaped and maps real values into the range between 0 and 1. It can be defined as
$$\sigma(a) = \frac{1}{1 + e^{-a}}$$
where e is Euler's number and a is the numerical value to be transformed. The LR equation can be defined as
$$y = \frac{e^{(y_0 + y_1 x)}}{1 + e^{(y_0 + y_1 x)}}$$
where y is the predicted output, $y_0$ is the bias or intercept term, and $y_1$ is the coefficient for the single input value x. LR can be a good choice for sentiment analysis and performs well with a large feature space.
ETC: ETC is a tree-based ensemble model used for text classification. Similar to the random forest, it predicts using majority voting: it builds a number of decision trees on the original dataset and takes a vote over the trees' predictions to make the final prediction [25]. Mathematically, ETC can be defined as
$$ETC = \operatorname{mode}\{t_1, t_2, \ldots, t_M\}$$
where $t_i$ is the prediction of the i-th tree in ETC and M is the number of decision trees. ETC can be a good choice for the current approach because it can perform well even when the dataset is not linearly separable.
SVM: SVM is a linear model widely used for data classification. It uses hyperplanes to classify data: in the feature space, SVM draws hyperplanes with a wide margin from each class to give significant results [32]. It performs well on both binary and multi-class classification. SVM can be used with different kernels; this study uses the linear kernel, which makes it more efficient for sentiment analysis. SVM tends to perform well on text classification tasks where the data have a large feature space.
GNB: GNB is a type of Naïve Bayes model that follows the Gaussian (normal) distribution and performs well on continuous data [33]. Naive Bayes models are based on Bayes' theorem and are used for data classification. GNB can be a good choice when working with continuous data, under the common assumption that the continuous values associated with each class follow a normal (Gaussian) distribution.
KNN: KNN, also known as a lazy learner, is used for classification and regression [34]. In KNN, K indicates the number of nearest neighbors considered when making a decision. It is very simple and easy to interpret compared with other machine learning models and often shows competitive results. As a lazy learner, it builds no explicit model during training and simply stores the training samples; during prediction, it compares each test sample with the training samples using a distance measure such as the Euclidean or Manhattan distance and predicts the target class from the K nearest neighbors.

Figure 7 shows the confusion matrices for the machine learning models using TF-IDF features. SVM gives 2,915 correct and 235 wrong predictions with TF-IDF features. ETC is the second-best performer, with 2,884 correct and 266 wrong predictions. These statistics show that the models perform better with BoW features than with TF-IDF features. BoW features are simpler than TF-IDF features and are more likely to occur across tweets. TF-IDF, on the other hand, assigns higher weights to rare terms, which are uncommon in tweets and make the feature set more complex. Consequently, the models face less complexity with BoW features and perform better.
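The LR, ETC, and KNN descriptions in Table 8 can be made concrete with a few small functions. This pure-Python sketch is illustrative only (the function names are hypothetical): `lr_predict_proba` evaluates the single-input LR equation, which equals the sigmoid of y0 + y1·x; `majority_vote` takes the mode of individual tree predictions as in the ETC formula; and `knn_predict` shows why KNN is a lazy learner, since all distance computations happen at prediction time.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def sigmoid(a: float) -> float:
    """Logistic function 1 / (1 + e^-a), written to avoid overflow for large |a|."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def lr_predict_proba(x: float, y0: float, y1: float) -> float:
    """Single-input LR: e^(y0 + y1*x) / (1 + e^(y0 + y1*x)) = sigmoid(y0 + y1*x)."""
    return sigmoid(y0 + y1 * x)


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Mode of the individual predictions (ties go to the label seen first)."""
    return Counter(predictions).most_common(1)[0][0]


def knn_predict(train: Sequence[Tuple[Sequence[float], Hashable]],
                query: Sequence[float], k: int = 3) -> Hashable:
    """Lazy KNN: nothing is fitted; Euclidean distances are computed at prediction time."""
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    return majority_vote([label for _, label in nearest])
```

One design note: scikit-learn's `ExtraTreesClassifier` aggregates trees by averaging their predicted class probabilities (a soft vote) rather than counting hard votes, so its output can occasionally differ from the plain mode shown here.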

C. MODELS' PERFORMANCE USING WORD2VEC FEATURES
The performance of the models using Word2Vec features is shown in Table 11. Results reveal that classification performance degrades substantially when the models are trained on Word2Vec features. For example, the accuracy of the best-performing model, SVM, is 0.72, compared with 0.95 with BoW and 0.93 with TF-IDF. Similarly, the accuracy of LR drops to 0.71 with Word2Vec from 0.90 with TF-IDF. On the other hand, the performance of GNB improves with Word2Vec features, reaching an accuracy score of 0.52 compared with 0.45 and 0.42 with BoW and TF-IDF, respectively. Word2Vec maps words into a dense embedding space that becomes complex as the number of interrelated terms increases, and this feature complexity tends to reduce the performance of the machine learning models. The confusion matrices for the machine learning models using Word2Vec features are shown in Figure 8. Because the models perform worse with Word2Vec features than with TF-IDF and BoW, the ratio of wrong predictions is higher: SVM gives 2,264 correct and 886 wrong predictions, while LR and ETC give 2,244 and 2,058 correct predictions, respectively.
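The section does not state how word-level Word2Vec vectors are turned into one feature vector per tweet. A common choice, assumed here, is to average the vectors of the tweet's in-vocabulary words. The sketch below uses a hypothetical three-dimensional embedding table (EMBEDDINGS) in place of a trained Word2Vec model.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings; a real pipeline would learn these
# with a Word2Vec model trained on the tweet corpus.
EMBEDDINGS = {
    "taliban": np.array([0.2, -0.1, 0.5]),
    "kabul": np.array([0.1, 0.3, -0.2]),
    "peace": np.array([0.6, 0.4, 0.1]),
}
DIM = 3


def tweet_vector(tokens, embeddings=EMBEDDINGS, dim=DIM):
    """Average the vectors of in-vocabulary tokens; zero vector if none match."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(dim)
    return np.mean(vectors, axis=0)


# "in" has no embedding and is skipped, so this is the mean of "peace" and "kabul".
print(tweet_vector(["peace", "in", "kabul"]))  # approximately [0.35, 0.35, -0.05]
```

Averaged embeddings can contain negative values, which matters if they are later passed to a Chi2-based feature selector.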

D. PERFORMANCE OF MODELS USING HYBRID FEATURES
The performance of the models using the proposed HFE features is shown in Table 12. The models perform substantially better with the proposed features than with BoW, TF-IDF, or Word2Vec. SVM outperforms all other models with the highest accuracy of 0.97, while its precision, recall, and F1 scores are 0.96 each. LR follows closely with an accuracy score of 0.96. The F1 scores and accuracy values are close to each other, indicating a good fit of these models. For the most part, the machine learning models perform better with the HFE features. Notably, GNB, a poor performer with BoW, TF-IDF, and Word2Vec, performs far better with HFE, reaching an accuracy score of 0.87 with good values for the other evaluation metrics. Selecting important features through Chi2 and joining them into hybrid features leads to a better fit of the models, which in turn produces better results. The confusion matrices in Figure 9 show the significance of the HFE features, as the models achieve their best results with them. According to the confusion matrices, SVM gives the highest number of correct predictions, 3,044, with only 105 wrong predictions. LR also performs well, with 3,028 correct and 122 wrong predictions. GNB achieves its best accuracy with HFE features, giving 2,731 correct and 419 wrong predictions. A comparison of the models' performance with each feature extraction technique is given in Figure 10.
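A minimal sketch of the hybrid feature idea with scikit-learn, assuming Chi2 selection is applied separately to each representation and the selected columns are then concatenated. The toy tweets, the value of K, and the restriction to BoW and TF-IDF are assumptions: scikit-learn's chi2 rejects negative inputs, so Word2Vec vectors would first need rescaling (for example, with MinMaxScaler) before joining the hybrid set. In a real experiment, the selectors should be fitted on the training split only.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import SVC

# Toy labelled tweets (illustrative only, not from the study's dataset).
tweets = [
    "taliban bring peace to kabul",
    "peace and security return to kabul",
    "people happy with the new government",
    "new government brings hope to people",
    "women education banned by taliban",
    "human rights violated in kabul",
    "fear and violence after the takeover",
    "people fear for women rights",
]
labels = ["positive"] * 4 + ["negative"] * 4

K = 5  # hypothetical number of features kept per representation

# Two sparse, non-negative representations of the same tweets.
bow = CountVectorizer().fit_transform(tweets)
tfidf = TfidfVectorizer().fit_transform(tweets)

# Keep the K columns with the highest chi2 score against the labels.
bow_top = SelectKBest(chi2, k=K).fit_transform(bow, labels)
tfidf_top = SelectKBest(chi2, k=K).fit_transform(tfidf, labels)

# Hybrid feature matrix: selected BoW columns followed by selected TF-IDF columns.
X_hybrid = hstack([bow_top, tfidf_top]).tocsr()

# Linear-kernel SVM, the best performer reported with the hybrid features.
clf = SVC(kernel="linear").fit(X_hybrid, labels)
print(X_hybrid.shape)  # (8, 10)
```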

E. RESULTS OF DEEP LEARNING APPROACHES
Together with the machine learning models, deep learning models are also employed for the task at hand. For this purpose, three state-of-the-art models, LSTM, CNN-LSTM [36], and GRU [37], are utilized in this study, as these models have shown good performance for text classification in previous studies. Table 13 lists the parameters and their values, as well as the layer architecture of each model. Feature engineering is not required for the deep learning models. Table 14 shows the performance of the deep learning models used in this study. Both LSTM and CNN-LSTM obtain an accuracy of 0.92 for sentiment classification, while GRU achieves 0.87. This indicates that the deep learning models underperform the machine learning models trained with HFE. Deep learning models generally perform better when trained on a large dataset; on a small dataset, they tend to overfit, which lowers their accuracy [38], [39]. Conversely, the proposed HFE provides a set of important features to the machine learning models and produces better results.

F. DISCUSSION ON MACHINE LEARNING AND DEEP LEARNING MODELS
This study deploys state-of-the-art machine learning models, including both linear and tree-based models, with the hybrid feature engineering approach. Linear models such as SVM and LR perform better than the other models because of the large feature set: combining TF-IDF, BoW, and Word2Vec generates a large feature set that suits linear models. Consequently, SVM obtains the highest accuracy score. Although ETC also performs well, it achieves lower accuracy than SVM or LR. KNN is the worst performer in this study because of its lazy learning characteristics, as it often works well only when the feature set is small. The deep learning models, LSTM, GRU, and CNN-LSTM, show comparatively lower performance on account of the smaller dataset: they require a large dataset, as well as a large feature vector, to obtain a good fit, which is not possible with the current dataset. Table 15 provides a critical analysis of the machine and deep learning models regarding their advantages and disadvantages in the context of this study. It includes the influence of the selected feature extraction approach, the parameter values, and the model structure on the overall performance.

| Model | Features | Parameters / architecture | Advantages | Disadvantages |
|---|---|---|---|---|
| … | … | … | More suitable for small size datasets. | Poor performance for linear data and sparse feature set, as compared to SVM. |
| GNB | HFE, TF-IDF, BoW, Word2Vec | Default setting | GNB is good with simple features such as BoW. | Large feature set affects performance of GNB. |
| KNN | HFE, TF-IDF, BoW, Word2Vec | n_neighbors = 5 | Simple to implement and interpret; low computational cost. | Shows poor performance on large feature set. |
| LSTM | Embedding vector | LSTM layer, dropout layer, dense layer | Good for text data due to its recurrent architecture. | Requires a large dataset for a good fit. |
| GRU | Embedding vector | GRU layer, dropout layer, RNN layer, dense layer | Shows good results with text data due to its advanced recurrent architecture; faster than LSTM. | Requires a large dataset for a good fit. |
| CNN-LSTM | Embedding vector | LSTM layer, flatten layer, max-pooling layer, dropout layer, dense layer | CNN combined with LSTM improves the extraction of important features and the performance of LSTM. | CNN requires a large feature set to find significant features for a good fit. |
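The machine learning line-up can be sketched with scikit-learn as below. Only the GNB default setting and n_neighbors = 5 for KNN come from Table 15; the other hyperparameters are scikit-learn defaults (plus max_iter for LR, an assumption), and synthetic three-class data from make_classification stands in for the positive, negative, and neutral tweet features. For sparse BoW or TF-IDF matrices, GaussianNB needs a dense array (for example, X.toarray()).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for a three-class tweet feature matrix (not the study's data).
X, y = make_classification(n_samples=300, n_features=30, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),      # max_iter is an assumption
    "ETC": ExtraTreesClassifier(random_state=0),
    "SVM": SVC(kernel="linear"),                  # linear kernel, as in the study
    "GNB": GaussianNB(),                          # default setting (Table 15)
    "KNN": KNeighborsClassifier(n_neighbors=5),   # n_neighbors = 5 (Table 15)
}

# Fit each model and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

The ranking printed here reflects the synthetic data only and says nothing about the study's results.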

G. VOLUME ANALYSIS
Volume analysis aims to find information related to the volume of text (here, tweets), such as the total number of tweets or the number of negative tweets. It is often combined with geolocation information to show the distribution of tweets with respect to the location where they were posted.
For the collected dataset of tweets about Afghanistan's current situation, tweets are annotated with respect to the sentiments they contain. Only the top five countries with the highest number of tweets are considered: India, the US, Pakistan, the UK, and Afghanistan. The prime objective is to find the reaction of people toward the current situation in Afghanistan under Taliban rule, which helps determine the overall sentiment level of people from a specific country. Figure 11 compares the sentiment of the words and of the sentences found in positive labeled tweets. The objective of this representation is to show the ratio of negative, positive, and neutral words in positive labeled tweets. The rationale for presenting this ratio is that even positive tweets may contain negative and neutral words, as the label is assigned based on the sentiment found in the majority of words in a tweet. Figure 11a shows that most words used in positive labeled tweets are neutral. However, positive words (29.87%) are more frequent than negative words (20.56%) in positive tweets. Similarly, Figure 11b shows the ratio of positive, negative, and neutral sentences in positive labeled tweets. In contrast to Figure 11a, where sentiment is determined for words, Figure 11b assigns sentiment to each sentence and then determines the ratio of sentences: neutral, positive, and negative sentences make up 51.74%, 28.62%, and 19.63%, respectively. Figures 12a and 12b show the distribution of negative and positive tweets, respectively, for the top five countries with the highest number of tweets. The objective is to analyze the ratio of positive and negative tweets from different countries.
It shows that 49% of all negative tweets originate from India, followed by the US and the UK with 20% and 17%, respectively. The fewest negative tweets originate from Pakistan and Afghanistan, with 5% and 9% of the total negative tweets, respectively. For the positive tweets, Pakistan has 26% and Afghanistan 23% of the total. India contributes the lowest share, 15% of the positive tweets, followed by the UK with 17%. Figures 11, 12, and 13 represent different aspects and characteristics of the tweets collected for this study. Figure 11 shows the ratio of positive, negative, and neutral sentiments for words and sentences in the dataset, showing that even positive annotated tweets contain several negative and neutral words, and vice versa. Figure 12 indicates the ratio of positive and negative tweets for the top five countries in the collected dataset. Pakistan has the highest ratio of positive tweets and the lowest ratio of negative tweets, which indicates public support from Pakistan for Afghanistan. On the other hand, the ratio of negative tweets is higher from India, which highlights trends of political interest. Figure 13 shows the ratio of tweets for the top five countries, indicating that a higher number of tweets are posted from India and Pakistan. Both countries have political interests in neighboring Afghanistan, and their public also expresses opinions regarding it.
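The word-level ratios in Figure 11a can be computed by assigning a polarity to every word of the positive labeled tweets and counting. The section does not name the lexicon or tool used, so the sketch below relies on a tiny hypothetical lexicon (LEXICON) in which unlisted words count as neutral; word_sentiment_ratio is likewise a hypothetical helper.

```python
from collections import Counter

# Hypothetical toy polarity lexicon; words not listed are treated as neutral.
LEXICON = {
    "peace": "positive", "hope": "positive", "happy": "positive",
    "fear": "negative", "violence": "negative", "banned": "negative",
}


def word_sentiment_ratio(tweets):
    """Percentage of positive, negative, and neutral words across the given tweets."""
    counts = Counter(LEXICON.get(word, "neutral")
                     for tweet in tweets for word in tweet.lower().split())
    total = sum(counts.values())
    return {label: 100 * counts[label] / total
            for label in ("positive", "negative", "neutral")}


positive_tweets = ["Peace and hope in Kabul", "Happy despite fear"]
print(word_sentiment_ratio(positive_tweets))
# {'positive': 37.5, 'negative': 12.5, 'neutral': 50.0}
```

Even in these two positive tweets, half of the words are neutral and one is negative, which is the effect Figure 11a illustrates.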
Presenting only each country's share of the total positive and negative tweets may not be enough to understand the trend of individual countries. The distribution of positive, negative, and neutral tweets within each country is more meaningful, as it portrays the overall sentiment of a specific country concerning the current situation in Afghanistan. Figure 13 shows the distribution of tweet sentiments for the top five countries with the highest number of tweets: India, the US, the UK, Pakistan, and Afghanistan. For the most part, the sentiments in the tweets are neutral, with the highest ratio of 64% found in the tweets originating from Afghanistan. The neutral attitude of the Afghan people suggests a diplomatic mindset given the uncertainty of the situation. The lowest ratio of neutral tweets comes from the UK, at 37% of the tweets originating from the UK. For Pakistan, India, and the US, approximately half of the posted tweets have neutral sentiments. The ratio of negative sentiments is highest in the UK, followed by the US and India. The lowest ratios of negative tweets relative to the total tweets from each country are found in Afghanistan and Pakistan. Considering the total tweets from the top five countries only, the highest number of tweets come from India, followed by Pakistan, Afghanistan, the US, and the UK. This is expected, as Afghanistan's neighbors Pakistan and India are the immediate stakeholders in the current situation.
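The two kinds of ratio discussed above differ only in the normalization axis. The pandas sketch below, on made-up rows (not study data), normalizes a country-by-sentiment crosstab by column to get each country's share of a sentiment class, as in Figure 12, and by row to get the sentiment distribution within each country, as in Figure 13.

```python
import pandas as pd

# Toy geotagged, sentiment-labelled tweets (illustrative counts, not study data).
rows = ([("India", "negative")] * 3 + [("India", "neutral")] * 2
        + [("India", "positive")] * 1 + [("Pakistan", "negative")] * 1
        + [("Pakistan", "neutral")] * 2 + [("Pakistan", "positive")] * 3)
df = pd.DataFrame(rows, columns=["country", "sentiment"])

# Share of each sentiment class contributed by each country (each column sums to 100),
# the kind of ratio behind "49% of all negative tweets originate from India".
share_of_class = pd.crosstab(df["country"], df["sentiment"], normalize="columns") * 100

# Sentiment distribution within each country (each row sums to 100),
# the kind of ratio behind "64% of the tweets from Afghanistan are neutral".
within_country = pd.crosstab(df["country"], df["sentiment"], normalize="index") * 100

print(share_of_class.round(1))
print(within_country.round(1))
```

With these toy rows, India contributes 75% of all negative tweets, yet only 50% of India's own tweets are negative, so the two views can rank countries differently.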

H. ANALYSIS OF AFGHANISTAN'S SITUATION AFTER 14 DAYS OF INITIAL DATASET EXTRACTION
To analyze whether the sentiments shift after the initial data collection, a new dataset is collected from 03 September 2021 to 08 September 2021. Sentiment analysis is performed on the tweets grouped by day, and the results are displayed in Figure 14. Results indicate that, although neutral sentiment is still found in the highest number of tweets, the ratio of positive sentiments shows an increasing trend while negative sentiments seem to decline. This may be because the situation in Afghanistan is moving toward normalization with the formation of a new government.
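The day-wise comparison can be sketched by computing the daily share of each sentiment and fitting a straight line through it; a positive slope for the positive share and a negative slope for the negative share correspond to the trend described above. The daily counts are invented for illustration, and the linear-trend summary (share_trend, a hypothetical helper) is an assumption, not the study's method.

```python
import numpy as np
import pandas as pd

# Toy per-day sentiment counts (positive, negative, neutral); illustrative only.
toy_counts = {"2021-09-03": (1, 2, 2), "2021-09-04": (2, 1, 2), "2021-09-05": (3, 0, 2)}
records = []
for day, (pos, neg, neu) in toy_counts.items():
    records += [(day, "positive")] * pos + [(day, "negative")] * neg + [(day, "neutral")] * neu
df = pd.DataFrame(records, columns=["date", "sentiment"])
df["date"] = pd.to_datetime(df["date"])

# Fraction of each sentiment per day (each row sums to 1).
daily = pd.crosstab(df["date"].dt.date, df["sentiment"], normalize="index")


def share_trend(daily_shares, label):
    """Slope of a least-squares line through the daily share of `label`."""
    days = np.arange(len(daily_shares))
    return np.polyfit(days, daily_shares[label].to_numpy(), deg=1)[0]


print(daily.round(2))
print(share_trend(daily, "positive"), share_trend(daily, "negative"))
```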

I. TOPIC EXTRACTION USING LATENT DIRICHLET ALLOCATION
Sentiment analysis of Afghanistan-related tweets becomes more meaningful when combined with topic extraction, so this study performs topic extraction using LDA. The underlying objective is to list the current issues as presented in the tweets. LDA is a statistical model that discovers topics in a given collection of documents [40]. LDA is fitted on the data once preprocessing and feature extraction have been carried out, using several hyperparameters. For example, n_components is set to 5 to extract the five most discussed topics. The other hyperparameters are listed in Table 16.
The top five most discussed topics are extracted for both positive and negative tweets, as shown in Table 17. Based on Topics 1 and 3 of the positive tweets, it can be inferred that the people of Afghanistan seem satisfied with the Taliban takeover. In contrast, the topics found in the negative tweets are mostly related to US concerns in Afghanistan, as shown in Topics 1, 3, and 4 of the negative tweets. Topic modeling results show that positive sentiments come from tweets praising the Taliban's bravery in fighting the US forces. Similarly, positive sentiments appear in tweets favoring the local government and people's autonomy. These tweets predominantly originate from the Middle East and Afghanistan's neighboring countries. Many tweets contain negative sentiments toward the Taliban due to several issues. First, many humanitarians and social workers express reservations about violations of human rights in general, such as freedom of speech and action. Second, voices are raised regarding the ban on women's education. Similarly, concerns have been shared regarding the forced implementation of various policies by the Taliban. Table 18 shows the words most commonly used in tweets containing negative and positive sentiments. In addition to the most common words from the overall tweet datasets, further experiments are carried out at the country level to find the positive and negative words. Table 19 lists the positive and negative words for each of the top five countries with the highest number of tweets.

V. CONCLUSIONS
This study performs sentiment analysis concerning the current situation in Afghanistan under Taliban control. The study accomplishes two objectives: sentiment analysis of tweets and volume analysis. For the former, the proposed hybrid feature engineering approach is used with several well-known machine learning models, as well as three deep learning models: LSTM, CNN-LSTM, and GRU. The latter involves analyzing the sentiments for the top five countries from which the highest number of tweets originate. Geotagged tweets are utilized for this purpose to present the positive, negative, and neutral attitudes of people toward the current situation in Afghanistan. Furthermore, topic modeling is performed using LDA to extract the most discussed topics from negative and positive tweets. Of the three feature engineering approaches, BoW, TF-IDF, and Word2Vec, the machine learning models mostly perform better with BoW. Among the machine learning models, SVM shows the highest performance, with an accuracy score of 0.97 when used with the proposed hybrid features. The hybrid feature engineering approach uses Chi2 to select the most appropriate features from BoW, TF-IDF, and Word2Vec and combines them to improve the performance of the models. The high performance of deep learning models generally depends on a large volume of data, and the comparatively small dataset used in this study appears insufficient for them to achieve significant results. Volume analysis indicates that, considering the top five countries, most of the sentiments found in the tweets are neutral. However, regarding negative sentiments, a large portion of the negative tweets originates from India, followed by the US and the UK. On the other hand, the largest shares of the positive tweets regarding the current situation in Afghanistan come from Pakistan and Afghanistan.
Topic modeling of the positive tweets reveals that many Afghan people seem satisfied with the Taliban's takeover. In contrast, topics from the negative tweets are mostly related to US concerns in Afghanistan. Gap analysis between the two datasets reveals that, on average, sentiments have shifted positively over the past two weeks.

VI. FUTURE WORK
This research provides insights into the sentiments of people from different countries regarding the current situation in Afghanistan under the Taliban regime. Keeping in view the topics found using LDA, governments can grasp the gravity of the situation and take action accordingly. Social and news media agencies can further explore the sentiments of people, especially those in Afghanistan, regarding the Taliban regime and highlight probable future challenges. In the future, the authors aim to conduct a study on managerial implications and technical improvements that provides guidelines for dealing with the current challenges.
[39] Andreas Kamilaris and Francesc X Prenafeta-Boldú.

PATRICK BERNARD WASHINGTON is a philosopher leading the world conversation at the intersection of finance, technology, and public policy. He is also a distinguished author and public speaker providing innovative solutions, investing in transformative ideas, and changing people's financial trajectory. He is an Associate Professor of Finance at Morehouse College. His research focuses on the intersections between investments, asset pricing, technology, real estate, and corporate governance. As a leading authority on data science in finance, he has integrated deep and machine learning into his research while assisting companies in their implementation of artificial intelligence.