Vaccine Hesitancy Hotspots in Africa: An Insight From Geotagged Twitter Posts

Many social media users express concerns about vaccines and their side effects on Twitter. These concerns erode public confidence, which in turn leads to vaccine hesitancy. In Africa, vaccine hesitancy is a major challenge faced by health policymakers in the fight against COVID-19. Because tweets can carry geotags, clustering them according to their sentiments could help identify locations that are likely to experience vaccine hesitancy, for use in health policy and planning. In this study, we collected 70 000 geotagged vaccine-related tweets in nine African countries from December 2020 to February 2022. The tweets were classified into three sentiment classes: positive, negative, and neutral. The quality of the classification outputs was validated using Naïve Bayes (NB), logistic regression (LR), support vector machines (SVMs), decision tree (DT), and K-nearest neighbor (KNN) machine learning classifiers. The LR achieved the highest accuracy of 71% with an average area under the curve of 85%. The point-based location technique was used to calculate the hotspots based on the locations of the classified tweets. Locations with green, red, and gray backgrounds on the map signify hotspots for positive, negative, and neutral sentiments, respectively. The outcome of this research shows that discussions on social media can be analyzed to identify hotspots during a disease outbreak, which could inform health policy in the planning and management of vaccine hesitancy in Africa.

Fig. 1. Countries reporting logistic data (adapted from [6]). This figure combines the countries with expired doses and the countries with doses at risk of expiring. It was used for comparison and validation of the results of our social media data in this article.
restrictions of movement and lockdowns of businesses in the last two years [1], [2], [3]. The impact of the business lockdowns on the world's economy is evident in the recent rate of inflation [4]. As of August 2022, over 12.13 billion doses of vaccines had been administered globally [5]. Africa contributed about 5.83% of the total vaccine doses administered in the world, with over 708 219 474 doses [5], [6]. Thirty-two countries in Africa were identified as either having expired doses or having doses at risk of expiring [6]; see Fig. 1.
As Africa and the rest of the world recover from the shock caused by the COVID-19 outbreak, vaccination against the virus remains necessary for building immunity and for managing and controlling the pandemic. The announcement of the vaccine mandate by health policymakers was met with many opposing views in Africa [7], [8], [9].
For instance, some conspiracy theorists spread the rumor that the COVID-19 vaccination mandate was intended to depopulate Africa, while some religious leaders and influencers advised their followers against taking the COVID-19 vaccines [3], [10]. Others took legal action against the compulsory vaccination of citizens, arguing that it violates citizens' fundamental human rights [11]. To promote their views and pass the antivaccination message to a wider audience, most of the conveners took to social media, such as Twitter, to create hashtags (topics) to drive their points [12], [13].
Information shared on Twitter spreads very fast, even when it is a rumor from an unverified source. Rumors are especially dangerous in places where users are not well informed about the subject under discussion [14]. Antivaccination messages spread as users' posts, retweets, or shares without any form of editorial oversight. This weakens public confidence well before people are vaccinated [4], [15].
According to Al-Uqdah [16], people who use social media without referencing trusted sources may be particularly vulnerable to disinformation. Similarly, vaccine-hesitant persons are more likely to rely on nontrusted social media sites as their only information source. Neha et al. [17] examined social media and vaccine hesitancy as a new update for the COVID-19 era and other globalized infectious diseases. They discussed the current role of social media platforms in propagating vaccine hesitancy, as well as the steps by which social media may be used to improve health literacy and public trust in vaccination.
Meanwhile, Ennab et al. [18] suggested that misinformation about COVID-19 on social media could pose a significant threat to public health, as it has the potential to worsen public health challenges by encouraging disease spread among pregnant women. The detrimental influence of vaccine misinformation on Twitter on public health cannot be overemphasized. Social media platforms such as Twitter should be used more widely to support public health in the continuing struggle against vaccine hesitancy, not just in the era of COVID-19 but also in future outbreaks [19], [20].
Insights from Twitter posts (tweets) can help health policymakers understand the extent of vaccination awareness among users. One way to achieve this is to perform sentiment analysis on these tweets, which reveals users' opinions about a subject matter such as COVID-19 vaccination. Sentiment analysis on tweets requires first annotating the tweets into positive, negative, or neutral sentiment classes. Annotation can be done manually by human beings or automatically. Manual annotation is said to be the most reliable option, but it is subject to human bias and can be time consuming for a large dataset [21]. Automatic annotation uses pretrained models that recognize the polarity of words in a given text. It can be implemented using Textblob, AFINN, or VADER. Automatic annotation is best suited to large datasets because it is fast and less exposed to human bias. In the remainder of this section, we briefly discuss these tools.

A. Sentiment Analysis With Textblob
TextBlob is a well-known lexicon-based sentiment analysis model available as a Python library. It provides simplified text processing for natural language processing (NLP). Textblob assigns a polarity score between −1 and 1 based on the words in the text, together with a subjectivity score. In [21], Textblob was used to annotate a US airline dataset containing 14 640 tweet reviews. The annotated dataset was used to train six traditional supervised machine learning models. Models developed with deep learning algorithms performed better than models developed with traditional machine learning algorithms when trained on Textblob-annotated text [22], [23]. However, with proper parameter fine-tuning, the performance of the traditional machine learning algorithms could be improved [24], [25].

B. Sentiment Analysis With AFINN
AFINN is a lexicon-based technique that uses a dictionary of words, each with a polarity score. It maps the corresponding polarity to every word in the text. AFINN has been used for sentiment analysis in different areas, including financial forecasting [26], customer rating [27], and product reviews [28]. However, models trained on a COVID-19 tweets dataset annotated with AFINN performed poorly, for both traditional machine learning and deep learning models. This suggests that an AFINN-annotated tweets dataset could produce a lower accuracy score than tweets annotated with Textblob or VADER [21].

C. Sentiment Analysis With VADER
VADER stands for Valence Aware Dictionary and sEntiment Reasoner [21]. It uses a dictionary to map lexical features to sentiment scores, or intensities. VADER has been used to annotate different text-based datasets, especially tweets. Existing research shows that tweet datasets annotated by VADER produce the best results when used to train traditional supervised machine learning models [29], [30], [31]. The best performance on VADER-annotated datasets is achieved with multiclass classification [32].
We observed that Textblob, AFINN, and VADER annotate texts differently. The sentiment class assigned to a text sometimes differs depending on the annotation tool. This variation in annotation also influences the performance of different machine learning models. However, given that VADER is the most recent tool developed specifically for social media text, we chose VADER to annotate our dataset [33].
To the best of our knowledge, no prior work has used VADER to annotate geotagged COVID-19 vaccine tweets to identify hotspots. By geotagged COVID-19 vaccine tweets, we mean tweets that carry geographical identification metadata. The tweet classification was done in nine African countries. The nine African countries are, in no particular order, Nigeria, South Africa, Zimbabwe, Botswana, Namibia, Rwanda, Mozambique, Cameroon, and Eswatini. These countries were selected because our consortium, the African-Canada AI & Data Innovation Consortium (ACADIC), has partner members in these countries. The tweets were collected using trending keywords about the COVID-19 vaccines in the selected countries.
The main contribution of this research is the application of a point-based location technique in the identification and visualization of vaccine hesitancy hotspots from labeled tweets. First, we generated a dataset containing geotagged COVID-19 vaccine tweets from the nine African countries.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
The tweets were annotated according to their sentiments using VADER. The result was validated with traditional machine learning classifiers, namely Naïve Bayes (NB), logistic regression (LR), support vector machines (SVMs), decision tree (DT), and K-nearest neighbors (KNN). Then, we calculated the hotspots based on the locations of the labeled tweets and visualized them on a map. The research shows that, by analyzing social media discussions, hotspots of sentiment toward vaccination during a disease outbreak can be identified. We believe that this could inform health policy in the planning and management of vaccine hesitancy in Africa.
The remainder of this article is organized as follows. Section II discusses the approaches used for keyword selection and for the collection, preparation, and analysis of the data. Section III presents the results of the data analysis and the visualization of the hotspots by country. Section IV discusses the results in relation to other relevant existing works. Finally, Section V presents the conclusion of the manuscript and further recommendations.

II. METHODS
This section is divided into list of abbreviations, keyword selection metric, data collection, data preprocessing, tweet labeling, tweet sentiment classification, selection of model parameters, and sentiment visualization.

A. List of Abbreviations
In this section, we present the list of frequently used abbreviations in Table I.

B. Keyword Selection Metric
In selecting the keywords for the data collection, we first identified the approved and administered vaccines in the nine African countries. Table II shows the approved vaccines by country.
We identified the approved vaccines in Table II because we used the trending topics around these vaccines to collect the data. By trending topics, we refer to keywords, phrases, or topics that are commonly used or mentioned about a vaccine within a period of time in a social network or microblog such as Twitter. The trending topics, hashtags, or keywords about a vaccine were selected using the Twitter app search box, followed by the Trending tab and the COVID-19 tab. This approach was used to select the popular combined keywords and hashtags for the nine African countries. Fig. 2 shows the list of the 43 popular keywords we used for this research. These keywords were generated from the approved vaccines in the selected countries.

C. Data Collection
We used the academic researcher account of the Twitter API to access the Twitter database, which allows collection of up to 10 million historical tweets per month. First, we created an application to generate an access token from Twitter. The access token was used in a Python (version 3.6) script. The Python script performed a historical (archive) search for tweets that contain the COVID-19 vaccine keywords in Fig. 2. The preferred language of the tweets was English.
A total of 70 000 tweets were collected from December 2020 to February 2022 using the archive search process. Each tweet contains most of the features described in Table III.
All tweets collected in the search were anonymized to protect users' privacy. The distribution of the collected tweets by country is shown in Fig. 3.

D. Data Preprocessing
User tweets are normally unstructured and contain a lot of information that may not be useful, so cleaning them is essential. We collected the tweets, date created, time created, and provinces from the dataset into a dataframe using Pandas version 1.2.4 [35]. The tweets were prepared for NLP by first removing the URLs, duplicate tweets, tweets with incomplete information, punctuation, special and nonalphabetical characters, non-English words, and stopwords using the tweets-preprocessor toolkit version 0.6.0 [36], the Natural Language Toolkit (NLTK version 3.6.2) [37], and the spaCy toolkit (version 3.2) [38], [39]. We also used the spaCy toolkit to tokenize the tweets. Tokenization in this context is the process of breaking tweets into words. This process reduced the dataset to 46 189 tweets.
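As a rough illustration of the cleaning and tokenization steps described above, the following sketch uses only the Python standard library. It is not the pipeline used in this study, which relied on the toolkits cited above; the function names `clean_and_tokenize` and `preprocess` and the tiny `STOPWORDS` set are assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a
# full list such as the one shipped with NLTK.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "it", "i"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_ALPHA_PATTERN = re.compile(r"[^a-z\s]")


def clean_and_tokenize(tweet):
    """Lowercase a tweet, strip URLs and nonalphabetical characters,
    split it into word tokens, and drop stopwords."""
    text = URL_PATTERN.sub(" ", tweet.lower())
    text = NON_ALPHA_PATTERN.sub(" ", text)
    return [token for token in text.split() if token not in STOPWORDS]


def preprocess(tweets):
    """Drop exact duplicate tweets and tweets left empty after cleaning."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        if tweet in seen:
            continue  # duplicate tweet
        seen.add(tweet)
        tokens = clean_and_tokenize(tweet)
        if tokens:  # skip tweets with no remaining content
            cleaned.append(tokens)
    return cleaned
```

For example, `preprocess(["Vaccines work", "Vaccines work", "https://t.co/x"])` keeps a single tokenized tweet, because the second entry is a duplicate and the third contains only a URL.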

E. Tweet Labeling
Given the size of the dataset, we labeled the tweets using VADER [33], [39], a lexicon- and rule-based pretrained NLP tool in the NLTK package. VADER performs sentiment analysis on social media text such as tweets. It can handle the sentiment expressed in words, abbreviations, emojis, and intensity of emotions [40]. By pretrained, we mean that the VADER model has previously been trained on a large dataset to recognize sentiments expressed in texts, especially social media text. When our dataset was passed into the VADER model, each tweet was processed in turn and a vector of sentiment scores for the positive, negative, neutral, and compound polarities was produced. The sentiment score of a tweet was obtained by summing the intensity of each word in the tweet. The scores for each sentiment class (positive, negative, and neutral) were normalized to lie between 0 and 1.
The compound polarity score is the aggregate measure of all the sentiments, normalized to the range [−1.0, +1.0], where −1.0 represents extreme negative and +1.0 represents extreme positive [40], [41]. Furthermore, the compound polarity was used to assign a sentiment label (positive, negative, or neutral) to each tweet. A tweet with a compound polarity ≥ 0.5 is assigned the label positive, a tweet with a compound polarity < 0 is assigned the label negative, and a tweet with a compound polarity x satisfying 0 ≤ x < 0.5 is assigned the label neutral [42].
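The labeling rule above can be sketched as a small function. The thresholds are the ones stated in the text; the function name `label_from_compound` is an assumption, and the compound scores are treated as given inputs rather than computed with VADER.

```python
def label_from_compound(compound):
    """Map a compound polarity score in [-1.0, +1.0] to a sentiment label,
    using the thresholds stated in the text."""
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound polarity must lie in [-1.0, +1.0]")
    if compound >= 0.5:
        return "positive"
    if compound < 0:
        return "negative"
    return "neutral"  # 0 <= compound < 0.5


# Illustrative scores only; these are not taken from Fig. 4.
example_scores = [0.74, -0.31, 0.12, 0.0]
example_labels = [label_from_compound(score) for score in example_scores]
```

Note that, under these thresholds, a score of exactly 0 is labeled neutral and a score of exactly 0.5 is labeled positive.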
Sample tweets and the corresponding sentiment scores are shown in Fig. 4. Table IV shows the labels that VADER assigned to these tweets based on the compound polarity scores in Fig. 4.
The approach described in this section was used to label all the tweets according to their sentiment class. The distribution of tweet sentiments by country, as labeled by the VADER pretrained model, is shown in Table V.
However, after labeling the tweets, the distribution of the sentiment classes in the dataset was imbalanced. Most of the tweets in the dataset were labeled neutral. The SMOTE [43] sampling technique was used to balance the dataset. We achieved a balanced distribution of 33.333% for each of the target classes (positive, negative, and neutral) in the dataset.
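To make the balancing step concrete, the sketch below implements the core idea of SMOTE in plain Python: a synthetic minority sample is created by interpolating between a minority sample and one of its nearest minority-class neighbors. It is a simplified illustration rather than the implementation used in this study, and it assumes that tweets have already been converted to numeric feature vectors. The function name `smote_oversample`, its default parameters, and the toy points are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the core idea of SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for i, p in enumerate(minority) if i != base_index]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy two-dimensional feature vectors for a minority class (assumed data).
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
toy_synthetic = smote_oversample(toy_minority, n_new=6, k=2)
```

Because every synthetic sample lies on a line segment between two existing minority samples, the new points stay inside the region already occupied by that class.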

F. Tweet Sentiment Classification
The performance evaluation metrics were used to measure how well the classifiers classified the tweets according to their sentiments against the VADER labeling. They include accuracy, F-measure, ROC curve, and AUC (see Table I). Accuracy is the proportion of predictions that the classifier got right, and it is the most popular performance evaluation metric. It was used because it focuses on the number of correctly classified tweets with respect to the total number of tweets in the dataset [51]. The accuracy formula adapted from [52] is

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

where TP, TN, FP, and FN represent the true positive, true negative, false positive, and false negative, respectively; see Table I.
The F-measure is the harmonic mean of precision and recall, and its score lies in the range [0, 1]. It was used because it reflects both how precise the classifier is (how many of the tweets it assigns to a class are classified correctly) and how robust it is (how likely it is to make a correct classification). The F-measure seeks a balance between precision and recall: the greater the F-measure, the better the performance of the classifier. The F-measure, also referred to as the F1-score in this article, was calculated using (2) as adapted from [52]

$$\text{F-measure} = 2 \times \frac{1}{\frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}} \tag{2}$$

where precision is the ratio of the tweets correctly assigned to a sentiment class (relative to the VADER labels) to all the tweets that the classifier assigned to that class. Recall is the ratio of the tweets correctly assigned to a sentiment class to all the relevant tweets (all the tweets that should have been assigned to that class by the classifier). Precision and recall were calculated using (3) and (4) as adapted from [52]

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{4}$$

Meanwhile, the ROC curve was used because it plots sensitivity (the true positive rate (TPR), or recall) against 1 − specificity (the false positive rate (FPR)). The higher the FPR, the more likely the classifier is to classify the tweet sentiment wrongly. Mathematically, (5) and (6) are used to calculate the TPR and FPR as adapted from [52]

$$\text{TPR} = \frac{TP}{TP + FN} \tag{5}$$

$$\text{FPR} = \frac{FP}{FP + TN} \tag{6}$$

The closer the ROC curve lies to the top left corner of the plot, the better the classifier does at classifying the tweets into the various sentiment classes. This helps to ascertain how well the classifier classified each tweet sentiment class against the tweets labeled by the VADER pretrained model. AUC is the area under the curve of the plot of TPR against FPR at different points in the range [0, 1]. A higher value suggests that the classifier performed well. Therefore, the AUC performance metric was used to ascertain how much of the plot is located under the curve.
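The metrics in (1)–(6) can be checked on a small example. The sketch below is a minimal illustration, assuming a one-vs-rest treatment of each sentiment class, in which the class of interest counts as positive and the other two as negative. The function names and the toy labels are hypothetical and do not come from the study's data.

```python
def one_vs_rest_counts(y_true, y_pred, target):
    """Count TP, TN, FP, and FN for one sentiment class against the rest."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == target and truth == target:
            tp += 1
        elif pred == target:
            fp += 1
        elif truth == target:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluation_metrics(tp, tn, fp, fn):
    """Compute the quantities in (1)-(6) from the four counts."""
    precision = safe_div(tp, tp + fp)                      # (3)
    recall = safe_div(tp, tp + fn)                         # (4), same as TPR (5)
    if precision and recall:
        f_measure = 2 * (1 / (1 / precision + 1 / recall))  # (2)
    else:
        f_measure = 0.0
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),  # (1)
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "tpr": recall,                                     # (5)
        "fpr": safe_div(fp, fp + tn),                      # (6)
    }


# Toy labels: y_true plays the role of the VADER labels.
y_true = ["pos", "pos", "neg", "neu", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "neu", "pos", "pos"]
counts = one_vs_rest_counts(y_true, y_pred, target="pos")
scores = evaluation_metrics(*counts)
```

For the positive class in this toy example, the counts are TP = 2, TN = 2, FP = 1, and FN = 1, so accuracy, precision, recall, and the F-measure all equal 2/3, while the FPR is 1/3.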

G. Model Parameter Selection
Model performance can be improved by tuning the user-defined parameters. We present the user-defined parameters used for each classifier. Parameters not stated here were left at their default values in the sklearn package. Each classifier was defined from the sklearn package in Python, and its parameters were fine-tuned as shown in Table VI.

H. Sentiment Visualization
As Twitter no longer gives access to the latitude and longitude of user tweets, we collected the GeoCoordinate(bbox) or  When we searched the latitude and longitude values on Google Maps, it showed the location as Yandev, North Nigeria. See Fig. 5.
The above approach was used to calculate the latitude and longitude of all the tweets so that their sentiments could be visualized on the map. The intersection of a latitude and a longitude is called a point in this article. A point is made up of one or more tweets, and a location can contain one or more points. We counted the sentiments of the tweets at each location to identify hotspots. The hotspots in each province or state of the nine countries were visualized on the map with the help of ArcGIS Online [53], a web-based mapping software used to build interactive web maps. See Section III for more details.
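The exact conversion from a bounding box to a point is not fully recoverable from the text above. One common approach, shown here purely as an assumption, is to take the center of the bounding box. The sketch assumes that the box is given as [west longitude, south latitude, east longitude, north latitude]; the function name `bbox_center` and the sample coordinates are hypothetical.

```python
def bbox_center(bbox):
    """Return (latitude, longitude) of the center of a bounding box given as
    [west_longitude, south_latitude, east_longitude, north_latitude].

    Assumption: the box does not cross the antimeridian, which holds for
    locations in Africa."""
    west, south, east, north = bbox
    if west > east or south > north:
        raise ValueError("expected [west, south, east, north] ordering")
    latitude = (south + north) / 2.0
    longitude = (west + east) / 2.0
    return latitude, longitude


# Hypothetical bounding box, not taken from the dataset.
sample_point = bbox_center([8.0, 7.0, 9.0, 7.5])
```

Tweets whose bounding boxes share the same center then fall on the same point, which matches the description of a point as containing one or more tweets.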

III. RESULT
In this section, we present the results in two parts. The first part presents the classification of tweets according to their sentiments using machine learning, which serves to validate the VADER-labeled tweets. The second part presents the analysis, identification, and visualization of the sentiment hotspots on a dashboard using ArcGIS Online.

A. Result of Tweet Sentiment Classification
In this section, we trained the classifiers (also called models) on tweets with sentiment classes labeled by the VADER pretrained model. A summary of the performance of the classifiers is presented in Table VII.
While there are clear differences in the accuracy scores of the classifiers, the LR model performed better than the other machine learning classifiers, with an accuracy score of 78%, an average F1-score of 75%, and an average AUC score of 90%. The average AUC score of 89% for the SVMs is slightly lower than that of the LR classifier. The accuracy score of 72% for the SVMs classifier is slightly higher than the accuracy score of 68% for the NB classifier. Similarly, the average F1-score of 67% for the SVMs classifier is slightly higher than the 65% for the NB classifier. Although the DT and KNN classifiers have the lowest accuracy scores, average F1-scores, and average AUC scores, the DT classifier scored higher than KNN on all the performance metrics.
The above analysis indicates that these models can classify tweets according to their sentiment classes. However, the LR classifier proved to be the best fit for this type of classification problem on all indicators. One contributing factor is that the 46 189 tweets generated a large feature set, which suited the LR classifier.
To further validate the performance of the models, we visualized the ROC metric, together with the AUC, to evaluate the quality of the multiclass classification output; see Fig. 6. The numbers 0, 1, and 2 represent the negative, neutral, and positive sentiment classes, respectively.
From Fig. 6, we can see how well the machine learning classifiers classified the tweet sentiments. The ROC curve plots the true positive rate against the false positive rate. As stated in Section II-F, the closer the curve lies to the upper left corner of the plot, the better the machine learning classifier does at classifying the tweets into the various sentiment classes. As shown in Fig. 6(b), the LR classifier does well in classifying the tweets into the various sentiment classes, followed by the SVM model in Fig. 6(c). However, unlike the NB classifier, which showed average performance in classifying the tweet sentiments [Fig. 6(a)], the DT and KNN models performed poorly in classifying the tweets into the different sentiment classes; see Fig. 6(d) and (e), respectively. As before, the AUC was used to ascertain how much of the plot is located under the curve. The LR classifier performed better with a large feature set and multiclass classification.
Since the LR classifier performed better than the other models, it is necessary to understand the features that influenced the sentiment classification of the tweets. We used ELI5 [54], a library for interpreting machine learning models, to visualize the top 20 features, in order of importance, that are responsible for the LR model performance. Table VIII shows the weights and features of the top 20 words that influenced the classification of the tweets into sentiment classes by the LR model.
The idea behind feature importance is to understand how the performance metrics (accuracy, precision, recall, and F-measure) behave with respect to the presence of a feature. In Table VIII, we can see that best has the highest weight, +3.622, for the positive sentiment class, followed by hell with +3.091 for the negative sentiment class, and bias with +2.091 for the neutral sentiment class. This means that the features best, hell, and bias contributed weights of +3.622, +3.091, and +2.091 to the classification of tweets into the positive, negative, and neutral sentiment classes, respectively. Section III-C explains the tweet sentiment hotspot analysis.

B. Model Performance Comparison
In this section, we used an ANOVA test to compare the performance of the different models on the VADER-labeled, Textblob-labeled, and AFINN-labeled datasets. We performed ten iterations for each model and obtained a different accuracy score in each. For each iteration, the dataset was divided into train and test sets such that the train and test sets differed between runs. Table IX shows the accuracy scores of the models.
The corresponding boxplot showing the distribution of machine learning models (ML model) performance with respect to the annotation tools (base model) is shown in Fig. 7. From the boxplot, we can easily identify the differences between ML models and the annotation tools.
Fig. 7. Distribution of ML models with respect to accuracy.
Similarly, Table X shows the outcome of the ANOVA test used to assess whether the performance of the machine learning models differs significantly with respect to the annotation tools used. The ANOVA statistical test takes two hypotheses as follows.
1) Null Hypothesis H_0: The performance of the models is equal.
2) Alternative Hypothesis H_A: The performance of the models is not equal.
From Table X, the p values obtained from the ANOVA analysis for the ML model, the base model, and their interaction are statistically significant (p < 0.05). This means that there is sufficient evidence to reject H_0, the hypothesis that the performance of the models is equal. We conclude that the type of annotation tool and the machine learning algorithm can significantly affect the performance of the model.
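For readers who want to see how such an analysis decomposes the variation in accuracy, the sketch below computes the sums of squares and F statistics of a balanced two-way ANOVA with interaction in plain Python. It is an illustration under stated assumptions, not the analysis code behind Table X: the function name `two_way_anova` is hypothetical, the accuracy values are synthetic, and no p values are computed because that requires the F distribution from a statistics package.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    cells[a][b] is the list of replicate scores for level a of factor A
    (e.g. ML model) and level b of factor B (e.g. annotation tool)."""
    a_levels = list(cells)
    b_levels = list(cells[a_levels[0]])
    r = len(cells[a_levels[0]][b_levels[0]])
    if any(len(cells[i][j]) != r for i in a_levels for j in b_levels):
        raise ValueError("design must be balanced")
    a, b = len(a_levels), len(b_levels)

    grand = mean([x for i in a_levels for j in b_levels for x in cells[i][j]])
    mean_a = {i: mean([x for j in b_levels for x in cells[i][j]]) for i in a_levels}
    mean_b = {j: mean([x for i in a_levels for x in cells[i][j]]) for j in b_levels}
    mean_cell = {(i, j): mean(cells[i][j]) for i in a_levels for j in b_levels}

    ss = {
        "A": b * r * sum((mean_a[i] - grand) ** 2 for i in a_levels),
        "B": a * r * sum((mean_b[j] - grand) ** 2 for j in b_levels),
        "AxB": r * sum(
            (mean_cell[i, j] - mean_a[i] - mean_b[j] + grand) ** 2
            for i in a_levels for j in b_levels
        ),
        "error": sum(
            (x - mean_cell[i, j]) ** 2
            for i in a_levels for j in b_levels for x in cells[i][j]
        ),
    }
    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1), "error": a * b * (r - 1)}
    ms_error = ss["error"] / df["error"]
    f_stat = {key: (ss[key] / df[key]) / ms_error for key in ("A", "B", "AxB")}
    return ss, df, f_stat


# Synthetic accuracy scores for illustration only (not the values in Table IX).
toy_scores = {
    "LR": {"VADER": [0.78, 0.77, 0.79], "Textblob": [0.81, 0.80, 0.82]},
    "KNN": {"VADER": [0.55, 0.57, 0.56], "Textblob": [0.58, 0.57, 0.59]},
}
toy_ss, toy_df, toy_f = two_way_anova(toy_scores)
```

A large F statistic for a factor, relative to the F distribution with the corresponding degrees of freedom, is what yields a small p value and leads to rejecting the null hypothesis for that factor.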
Next, we visualize the interaction between the annotation tools and the machine learning models from the ANOVA analysis. This is to ascertain the interactive effect of the means of the annotation tools and the machine learning models. It also helps us to identify the machine learning model that performs best with each annotation tool. Fig. 8 shows that the LR model performed best with the Textblob-annotated dataset, followed by the AFINN-annotated and then the VADER-annotated datasets. Conversely, KNN performed worst with the AFINN-annotated dataset. Nevertheless, VADER was developed specifically for social media text classification [33], which was our main motivation for using VADER in this study. Moreover, VADER performed well with the LR model in the multiclass classification of a large feature set; see Fig. 6.

C. Tweet Sentiment Hotspot Analysis
Hotspot areas are concentrations of incidents within a limited geographical area that appear over time. Measuring a hotspot can be complicated. However, many statistical techniques have been designed to identify hotspots, including the hierarchical technique, partitioning technique (K-means), density technique, clumping technique, risk-based technique, miscellaneous techniques, and point-based location technique [55]. Combinations of two or more techniques are called hybrid techniques. For instance, the risk-adjusted nearest neighbor hierarchical clustering routine combines a risk-based technique with a clumping technique. Similarly, the combination of the partitioning and hierarchical techniques is a hybrid technique called STAC. These statistical techniques mainly aim to group incidents into relatively coherent clusters of high or low concentration [56], [57].
Given that we worked with the latitude and longitude of all the tweets, which allowed us to visualize the source of each tweet as a point on the map, the point-based location technique is the best fit for our analysis [58]. The point-based location technique is one of the most intuitive types of clustering; it is based on the number of tweets originating from different locations. The locations with the largest numbers of tweets are referred to as hotspots. The count of each tweet sentiment class in a hotspot is calculated, and the sentiment class with the highest count becomes the dominant sentiment in that location. The location is then colored green if the dominant sentiment class is positive, red if it is negative, and gray if it is neutral. To calculate the count for each sentiment class in a location ($S_c$), we use the formula in (7) as adapted from [58]

$$S_c = \sum_{j=1}^{n} \sum_{i=1}^{m} X_{i,j} \tag{7}$$

where $i$ indexes the occurrences of the sentiment class in a point and $m$ represents the total number of such occurrences in a point. Similarly, $j$ indexes the points in a location and $n$ represents the total number of points in a location. $X_{i,j}$ represents the $i$th occurrence of the tweet sentiment class at point $j$ of the location. The inner sum $\sum_{i=1}^{m} X_{i,j}$ is the count of the sentiment class in a point, while $\sum_{j=1}^{n} \sum_{i=1}^{m} X_{i,j}$ is the count of the sentiment class in a location. The outcome of the above process was visualized on a dashboard, and the screenshots are shown in Fig. 9. As shown in Fig. 9, the identified sentiment hotspots are colored green, red, and gray to signify positive, negative, and neutral sentiment dominance in that location for each of the nine countries.
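The counting and coloring rule described above can be sketched as follows. This is a minimal illustration rather than the dashboard implementation: a location is assumed to be represented as a list of points, each point being a list of sentiment labels, and the function names and toy data are hypothetical. How ties between classes are resolved is not stated in the text; here the class seen first wins.

```python
from collections import Counter

# Color scheme stated in the text.
COLOURS = {"positive": "green", "negative": "red", "neutral": "gray"}


def sentiment_counts(points):
    """Sum the per-point counts of each sentiment class over all points in
    a location, that is, S_c for every class c."""
    counts = Counter()
    for point in points:
        counts.update(point)
    return counts


def dominant_sentiment(points):
    """Return the dominant sentiment class of a location and its color."""
    counts = sentiment_counts(points)
    if not counts:
        raise ValueError("location contains no tweets")
    label, _ = counts.most_common(1)[0]
    return label, COLOURS[label]


# Toy location with three points (assumed data).
toy_location = [
    ["negative", "negative", "neutral"],
    ["negative", "positive"],
    ["neutral"],
]
toy_counts = sentiment_counts(toy_location)
toy_dominant = dominant_sentiment(toy_location)
```

In this toy location, the negative class has a count of 3, the neutral class 2, and the positive class 1, so the location would be colored red.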

D. Limitation
The Twitter data used for this research reflect only the opinions of Twitter users located in the selected nine African countries within the period under discussion. As of November 2022, about 8.46% of online adults in Africa used Twitter [59]. Therefore, this study does not fully represent the opinions of people in the identified African countries, especially regarding COVID-19 vaccines. This study only provides an insightful analysis of the Twitter data to support policy making, management, and planning.
Additionally, VADER, as used in this study, is a pretrained model for sentiment classification. It does not have the capacity to properly label figurative or nonstandard language, such as sarcasm, pidgin English, and vernacular. Pretrained models, such as VADER, Textblob, and AFINN, cannot replace human annotation, because the most precise annotation is the one done by humans, despite human bias, subjectivity, and error-proneness. Hence, VADER-annotated labels can be used to assist human annotators, especially in the case of a large dataset.

IV. DISCUSSION
In this section, we discuss the hotspots identified in Fig. 9 for each country. In Botswana, we identified nine locations as hotspot areas for the positive, negative, and neutral sentiment classes. These locations include Chobe, Central district, and Kgatleng district as positive sentiment hotspots; Kweneng district as a negative sentiment hotspot; and Kgalagadi district, Ghanzi district, Southern district, and South-east district as neutral sentiment hotspots [see Fig. 9(a)]. Moreover, as of July 2022, 64.4% of the Botswana population was said to be fully vaccinated [6]. We observed that Kweneng district, the negative sentiment hotspot, is predominantly Christian and is the second most populated district. This could suggest that religion may have played a major role in influencing citizens' sentiments against vaccination, given that many religious leaders were against the vaccine mandate at that time [10]. Furthermore, the locations identified as neutral sentiment hotspots, together with Kweneng district, the negative sentiment hotspot, account for about 35% of the country's total population. This could help explain why Botswana was listed as one of the countries with expired vaccines and vaccines at risk of expiring, because a large number of citizens may have been hesitant toward the vaccine at the time [6].
Similarly, in Cameroon, we identified the South, East, and West regions as neutral sentiment hotspots. The Southwest and Northwest regions were identified as negative sentiment hotspots. The Littoral, Central, Adamawa, and North regions were identified as positive sentiment hotspots, while the Extreme North region was not identified as a hotspot from our data, see Fig. 9(b). The suspension of all medical and humanitarian activities in the Southwest region of Cameroon by Médecins Sans Frontières (MSF) in December 2021 may have contributed to the negative sentiment in the region. The suspension, which followed the arrest of two of MSF's members and collaborators by the Cameroon armed police force, may have played a role in influencing negative sentiments toward the vaccine at that time [60]. Moreover, the population of the Southwest and Northwest regions is mostly Christian and Muslim; hence, religion may also have played a role in influencing sentiments toward vaccination. For instance, 70.7% and 24.4% of the country's population practice Christianity and Islam, respectively [61]. Additionally, workers in Cameroon refused to take the vaccine because they claimed not to be well informed about its aftereffects; thus, it is not surprising that Cameroon was listed as a country with expired vaccines and vaccines at risk of expiring [62].
In Eswatini, we identified the Manzini region as a negative sentiment hotspot from our dataset. The Hhohho region was identified as a positive sentiment hotspot. The Lubombo and Shiselweni regions were not identified as hotspots for any sentiment [see Fig. 9(c)]. The Manzini region is the country's largest urban center and is known as the hub of the country.
Although 29.8% of the country's population is said to have been vaccinated [63], the country was listed as having expired vaccines and vaccines at risk of expiring. This could be associated with the influence of the Manzini region, the country's largest urban center, which was identified as a negative sentiment hotspot.
Of the eleven provinces in Mozambique, including Maputo, the administrative region, we identified three with sentiment hotspots: Tete, Sofala, and Zambezia. Tete and Sofala were identified as neutral sentiment hotspots, while Zambezia was identified as a negative sentiment hotspot, see Fig. 9(d). Zambezia is the second most populous region of Mozambique, and Islam is the predominant religion practiced there. Together with the Nampula region, also in the north-central area, it accounts for 45% of Mozambique's population. These factors may have influenced the sentiment of people living in Zambezia toward the vaccines. As such, Mozambique was listed as a country with expired vaccines and vaccines at risk of expiring (see Fig. 1). More generally, the study in [64] suggested fear and lack of confidence as factors affecting vaccine acceptance in Mozambique.
We identified three regions with sentiment hotspots in Namibia: Omusati as a neutral sentiment hotspot, Omaheke as a positive sentiment hotspot, and Erongo as a negative sentiment hotspot, see Fig. 9(e). The Erongo region is one of the smallest of the 14 regions of Namibia and is predominantly Christian. This may be connected to the high level of negative sentiment in the region, as religion has been identified as one of the contributing factors to vaccine hesitancy [65]. One result is the low turnout of children and adolescents for vaccination against COVID-19. To date, about 3% of the Erongo region's population is said to have received the vaccine [65], [66]. Consequently, Namibia is among the countries listed as having expired vaccines and vaccines at risk of expiring, as shown in Fig. 1.
In Nigeria, positive sentiment hotspots were identified in 13 states, counting the national capital: Kaduna, Plateau, Adamawa, Benue, Cross River, Anambra, Imo, Akwa Ibom, Bayelsa, Edo, Ekiti, Ogun, and Abuja. Neutral sentiment hotspots were identified in 12 states: Zamfara, Katsina, Kano, Jigawa, Niger, Kwara, Oyo, Kogi, Rivers, Taraba, Gombe, and Maiduguri. Meanwhile, negative sentiment hotspots were identified in 12 states, namely Osun, Ondo, Delta, Enugu, Abia, Ebonyi, Nasarawa, Bauchi, Yobe, Sokoto, Kebbi, and Lagos, see Fig. 9(f). Only 13.7% of Nigeria's population is fully vaccinated [67]. This could have led to the large quantity of expired vaccines and vaccines at risk of expiring in the country, as shown in Fig. 1. The combination of high neutral and negative sentiment toward vaccination may also have contributed to the expired vaccines [68]. Intuitively, one may attribute these to citizens' lack of confidence in the effectiveness of the vaccines at that time [69]. In addition, given the strategic nature of Lagos as the largest city in Nigeria and in Africa at large, with a population of about 15.4 million in 2022 [70], the negative sentiment hotspot identified in the state
The following provinces were identified as sentiment hotspots in Rwanda: Kigali (Umujyi wa Kigali), Eastern (Iburasirazuba), and Northern (Amajyaruguru) as positive sentiment hotspots; Southern (Amajyepfo) as a neutral sentiment hotspot; and Western (Iburengerazuba) as a negative sentiment hotspot, see Fig. 9(g). Although about 81.0% of Rwanda's population is fully vaccinated [44], [67], the Western province showed a strong dominance of negative sentiment toward vaccination. The province's population of over 2.4 million, 95% of whom practice Christianity, may have contributed to this dominance [72]. This is because religion has been said to be one of the factors contributing to opposition to vaccination in Africa. Citizens tend to express fear of aftereffects, which leads to a lack of confidence in taking the vaccine [73].
Furthermore, in South Africa, we identified three provinces with partial negative sentiment dominance: Northern Cape, North West, and KwaZulu-Natal. In Northern Cape province, the Siyanda and Pixley ka Seme districts were identified as negative sentiment hotspots. Similarly, the Mafikeng district of North West province was identified as a negative sentiment hotspot. Finally, the Zululand and Umkhanyakude districts of KwaZulu-Natal province were also identified as districts with negative sentiment dominance. The remaining six provinces were dominated by neutral and positive sentiments, see Fig. 9(h). Although only 32.5% of South Africa's population is fully vaccinated [67], [74], the country was not listed among those with expired vaccines and vaccines at risk of expiring, see Fig. 1. This may suggest that the negative sentiment dominance, combined with the neutral sentiment dominance, was not influential enough to negatively affect vaccination uptake in the country at that time [75].
Finally, in Zimbabwe, we identified Matabeleland South and Bulawayo provinces as positive sentiment hotspots. The Midlands, Mashonaland West, Mashonaland Central, Masvingo, Manicaland, and Harare provinces were identified as neutral sentiment hotspots. The Matabeleland North and Matabeleland East provinces were identified as negative sentiment hotspots. That 31.6% of Zimbabwe's population is fully vaccinated [67] does not change the fact that the country is listed as having expired vaccines and vaccines at risk of expiring, see Fig. 1. This could be due to the lack of trust in the government and the uncertainty about vaccine effectiveness and safety expressed by the majority of Zimbabweans in [76].

V. CONCLUSION
Social media is a place where users share their opinions about a subject matter. During the lockdowns that resulted from the pandemic, social media became the most effective medium of communication for users to express their concerns. As the implementation of vaccination mandates began, social media platforms such as Twitter became one of the tools users employed to express their opinions about the vaccines and their side effects. These opinions generated a lot of mixed feelings and concerns. These concerns could compromise confidence in the vaccine, which brings about vaccine hesitancy. In Africa, vaccine hesitancy is a major challenge faced by health policymakers in the fight against COVID-19. Given that most tweets are geotagged, clustering them according to their sentiments could help identify locations likely to experience vaccine hesitancy, for health policy and planning.
The point-based location technique was used to calculate hotspots by clustering the sentiments of these tweets. Green, red, and gray were used to visualize the dominance of positive, negative, and neutral sentiments on the map. That is, a location with a green background on the map is a hotspot for positive sentiment, while locations with red and gray backgrounds are hotspots for negative and neutral sentiments, respectively. The sentiments were visualized as hotspots on the map using ArcGIS Online.
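As a rough illustration of the dominance idea described above, the following Python sketch assigns each location the color of its most frequent sentiment class. It is a plain majority count and not the ArcGIS Online procedure used in this study. The function name `dominant_sentiment_by_region`, the placeholder region labels, and the rule that a tie yields no hotspot are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Color scheme taken from the text: green, red, and gray backgrounds.
COLORS = {"positive": "green", "negative": "red", "neutral": "gray"}


def dominant_sentiment_by_region(tweets):
    """Return {region: (sentiment, color)} for regions with a clear majority.

    `tweets` is an iterable of (region, sentiment) pairs, where sentiment is
    one of "positive", "negative", or "neutral". Regions whose top two
    classes are tied are left out (an assumption of this sketch).
    """
    counts = defaultdict(Counter)
    for region, sentiment in tweets:
        counts[region][sentiment] += 1

    hotspots = {}
    for region, counter in counts.items():
        ranked = counter.most_common()  # sorted by count, highest first
        top_sentiment, top_count = ranked[0]
        if len(ranked) > 1 and ranked[1][1] == top_count:
            continue  # no single dominant class in this region
        hotspots[region] = (top_sentiment, COLORS[top_sentiment])
    return hotspots


# Placeholder input, not data from the study.
sample = [
    ("Region A", "negative"), ("Region A", "negative"), ("Region A", "positive"),
    ("Region B", "neutral"),
    ("Region C", "positive"), ("Region C", "negative"),
]
print(dominant_sentiment_by_region(sample))
```

In this placeholder input, Region A is dominated by negative sentiment and Region B by neutral sentiment, while Region C is tied and therefore receives no color.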
Therefore, the main contribution of this research is the application of a point-based location technique to identify and visualize vaccine hesitancy hotspots from a labeled tweet sentiment dataset. This suggests that discussions on social media can be analyzed to identify hotspots, based on users' sentiments toward vaccination during a disease outbreak. We believe that this could inform health policy in the planning and management of vaccine hesitancy in Africa.

Fig. 2. List of popular keywords used. This list is presented for open access and reusability.

Fig. 3. Distribution of collected tweets by country. This shows the number of tweets collected from each country.

Fig. 5. Sample visualization using calculated geocoordinates. This was used to show how tweet sentiments can be visualized on the map.

Fig. 8. Interactive response of the two factors. This visualizes the interaction between the annotation tools and the machine learning models from the ANOVA analysis.

TABLE I. LIST OF ABBREVIATIONS. THIS TABLE WAS USED FOR EASY READABILITY

TABLE II. APPROVED VACCINES BY COUNTRY [34]. THIS TABLE WAS USED TO IDENTIFY THE POPULAR TOPICS AROUND THE VACCINES FOR DATA COLLECTION

authenticate and establish a connection to the Twitter database. We got historical COVID-19 vaccine-related tweets, geotagged according to the nine African countries using Python

TABLE III. DATASET FEATURES. THIS TABLE DESCRIBES THE CONTENT OF THE DATASET USED

TABLE VI. DESCRIPTION OF MACHINE LEARNING HYPERPARAMETERS USED. THESE ARE USER-DEFINED PARAMETERS USED IN THE MODELS. THIS IS USEFUL FOR REUSABILITY

TABLE VII. MODEL PERFORMANCE OF TWEET SENTIMENT CLASSIFICATION

TABLE VIII. LR MODEL FEATURE INTERPRETATION USING ELI5. SUMMARIZES THE FEATURES RESPONSIBLE FOR THE MODEL PERFORMANCE

TABLE IX. COMPARISON OF MACHINE LEARNING MODEL PERFORMANCE ON AFINN, VADER, AND TEXTBLOB LABELED DATASET

TABLE X
OF ANOVA TEST RESULT