Sentiment Analysis Using Improved Atom Search Optimizer With a Simulated Annealing and ReLU Based Gated Recurrent Unit

Social media has become an indispensable part of daily life in recent times. On social media, users commonly express their thoughts and opinions by sharing a substantial number of reviews and feedback. Twitter is one of the fastest-growing social media platforms, and it also serves as a news and business tool. However, high-dimensional feature sets slow down sentiment analysis and reduce its sensitivity, and it remains difficult to select and classify features in the best possible way. Since feature selection plays a critical role in sentiment analysis, this study proposes a ReLU-based Gated Recurrent Unit (ReLU-GRU) for Twitter sentiment analysis to classify emotions. The Covid-19, Sentiment-140, and Twitter emoji datasets are exploited to perform the research. Initially, pre-processing is done through tokenization, stemming, part-of-speech tagging, and punctuation removal. After that, Bag of Words (BoW), Latent Dirichlet Analysis (LDA), and Term Frequency-Inverse Document Frequency (TF-IDF) are applied to extract features. Feature selection is then carried out using the proposed Improved Atom Search Optimizer (ASO) with a Simulated Annealing (SA) method. Finally, in the classification stage, the ReLU-GRU is proposed for classifying the chosen features into various classes. The outcomes show that the proposed ReLU-GRU outperforms existing methods, obtaining 97.87% and 96.52% accuracy on the Covid-19 and Sentiment-140 datasets, respectively.


I. INTRODUCTION
Recently, people have expressed their thoughts on communication platforms like Skype, Twitter, Instagram, Facebook, and LinkedIn [1]. These platforms offer suggestions for products as well as instant feedback [2]. On Twitter, an online platform that is experiencing rapid expansion, people can send, receive, refresh, and examine brief text messages referred to as tweets [3]. The Twitter structure can subtly influence traditional media's agenda, especially during major events, as reporters republish useful user messages and gather data from posts [4]. Mining text data assets to obtain subjective information is the task of sentiment analysis [5]. Decision support systems [6] as well as individual decision-makers may both benefit from the extraction of organized and informative knowledge from textual data sources through the application of text analytics [7].
Sentiment analysis is the practice of automatically identifying the opinions of writers and categorizing those opinions into three categories: positive, negative, and neutral [8]. The purpose of emotion recognition is to classify the emotions present within the text [9]. One of the most researched areas in social media analytics is sentiment analysis, which examines human thoughts, feelings, and emotions [10]. This division may be based on the polarity of the communication, such as identifying whether the message is good or bad [11]. Both lexicon-based and learning-based methods have been used to handle these tasks in the literature [12], and recent studies have focused on deep learning methods [13]. However, building large labeled data sets is usually difficult and costly for these algorithms [14]. Pre-trained methods that only need one fine-tuning step along with a limited set of features have therefore been widely used [15].
Motivation: In practice, not every feature is required. The majority of features are irrelevant to the output class; conversely, the feature most relevant to the output class is a good feature to use for classification. Unfortunately, high-dimensional feature sets slow down sentiment analysis and reduce its sensitivity, and it remains difficult to select and classify features in the best possible way. Since feature selection plays a critical role in sentiment analysis, optimization-based feature selection is suggested in this paper. Furthermore, learning-algorithm-based classification schemes developed with specific features are suggested. The major problem is to identify the finest classifier to categorize tweets precisely; Multiclass Support Vector Machines, Long Short-Term Memory, Random Forest, K-Nearest Neighbour, and Decision Trees are regularly exploited. Feature selection is important in sentiment analysis because opinionated text can have high dimensionality and negatively impact the sentiment analysis classifier's performance. Therefore, the best features are chosen from the extracted features by means of an Improved Atom Search Optimizer (ASO) with Simulated Annealing (SA).
Research Gap: Numerous researchers have created a variety of feature selection and classification techniques for sentiment analysis in previous research. Current feature selection methods take into account the distribution of feature-containing samples among the classes. They do not, however, account for how frequently the features occur within the classes. A feature that characterizes a class should occur more frequently within the class than outside of it. The frequently distributed features associated with each class are chosen by the suggested method (IASO-SA). In order to determine the pertinent feature associated with each class, this feature selection method is based on class-specific data. Using the ReLU-based GRU classifier, the suggested feature selection method's classification accuracy is assessed.
Contribution: The main contributions are specified as below:
• In this research, Twitter sentiment analysis on the Covid-19, Sentiment-140, and Twitter emoji datasets is performed with feature selection and classification methods.
• Following data collection, pre-processing is done using tokenization, stemming, punctuation removal, and POS tagging. After that, the features are extracted using BoW, LDA, and TF-IDF.
• The extracted features then proceed to feature selection, where an Improved Atom Search Optimizer (ASO) with a Simulated Annealing (SA) algorithm is proposed to select the proper features and thereby improve classification accuracy.
• Finally, to categorize the selected features and enhance accuracy, a ReLU-based Gated Recurrent Unit (ReLU-GRU) is proposed, where classification splits the data into several classes such as sad, joy, fear, and anger.
Paper Organization: The organization of this research is specified as below: the literature survey of existing papers is given in Section II; the process of the proposed algorithm is explained in Section III; the results of the proposed method and its assessments are described in Section IV; finally, the conclusion is given in Section V.

II. LITERATURE REVIEW
The field of study known as sentiment analysis focuses on recognizing and categorizing concepts expressed in text into three polarities: positive, negative, and neutral. Researchers have employed a variety of learning algorithms to categorize sentiment. This section reviews some of the significant research on sentiment analysis.
Gourisaria et al. [16] performed a semantic study and examined predictor variables of web-scraped COVID-19 tweet corpora through data-gathering approaches. Latent Dirichlet Allocation (LDA) was used in that work for topic modeling. Additionally, a Bidirectional Long Short-Term Memory (BiLSTM) model and numerous classification methods, including a majority voting classifier, were adapted to analyze sentiment polarity. The frequency of positive and negative tweets and phrase patterns was determined using a dual-dataset method to improve the outcomes. However, this classifier required more computation and was therefore more expensive to implement and train.
A sentiment investigation of COVID-19 tweets employing learning models was introduced by Chintalapudi et al. [17]. A newly developed method for textual analysis called Bidirectional Encoder Representations from Transformers (BERT) was used to examine the data. The conclusions compel local administrations to enforce fact-checking on the internet to counter fake news. However, these works mostly do not go into great detail about the veracity and validity of Twitter data.
Dang et al. [18] demonstrated the reliability of several hybrid deep learning techniques on various datasets from various aspects. The goal was to examine whether combined approaches can outperform single models across a variety of fields and dataset types. However, only a limited number of samples, or datasets from only one sector, were used to assess these models, so their reliability was not extensively proven.
Han et al. [19] demonstrated a Fisher Kernel (FK) method that makes use of Probabilistic Latent Semantic Analysis (PLSA) to improve the SVM kernel function. The probabilistic and latent relationships between text, terms, and topics, which served as the basis for the development of the FK, were fully taken into account by the PLSA model. By admitting latent semantic data, this method helps address the problem of latent semantic features being ignored in the sentiment evaluation of texts. However, the method has drawbacks because it needs to perform an excessive number of calculations in order to correctly resolve the categorization problem.
Ito et al. [20] created a new model known as the contextual sentiment neural network (CSNN), which explains the sentiment analysis prediction process. With the help of its many interpretable layers, the CSNN can clarify its document-level sentiment analysis results in a human-like way. An innovative learning method termed initialization propagation (IP) training was suggested to achieve this accuracy. However, the proposed CSNNs were generally less reliable compared to the hierarchical attention network.
Nandi et al. [21] proposed a Fast Fourier Transform on Temporal Intuitionistic Fuzzy Set (FFT-TIFS) for classification in order to improve recall. The FFT-TIFS was utilized to process a large number of records with good precision and speed. The suggested method processes data 17 times faster than the sequential fuzzy C-means technique and at least seven times faster than the distributed FCM technique currently used in the literature, demonstrating a substantial improvement in time complexity. However, compared with deep learning methods, this FFT-TIFS performed less well.
Bairavel and Krishnamurthy [22] proposed a novel multimodal sentiment analysis technique involving video, audio, and text. The proposed technique examines the emotions extracted from web records in audio, visual, and linguistic formats. The extracted features were effectively chosen using the novel Oppositional Grass Bee Optimization (OGBEE) approach in order to obtain the best feature set. Additionally, a Multilayer Perceptron-based Neural Network (MLP-NN) was used in this method to classify emotions.
Alarifi et al. [23] presented a big data method for sentiment analysis utilizing Cat Swarm Optimization-based LSTM Neural Networks and greedy feature selection (CSO-LSTMNN). A greedy method chooses the best features from the cleansed sentiment material, which are then processed by the CSO-LSTMNN classifier. However, the system's precision needed to be increased by reducing text noise using a variety of heuristic techniques.
In summary, although numerous feature selection and classification techniques have been developed for sentiment analysis in the literature, existing methods consider how features are distributed among the classes but not how frequently they occur within each class. The proposed IASO-SA method addresses this gap by selecting the frequently distributed, class-specific features, and the resulting classification accuracy is assessed using the ReLU-based GRU classifier.

III. PROPOSED METHOD
First, the dataset is gathered from the Covid-19, Sentiment-140, and Twitter emoji datasets. The collected data are then used as input in the pre-processing step, where noise is eliminated using tokenization, stemming, POS tagging, and punctuation removal. Then, using BoW, LDA, and TF-IDF, the pre-processed data are sent for feature extraction. Next, feature selection is done through the Improved ASO with SA method. The final step is to use a ReLU-based Gated Recurrent Unit (ReLU-GRU) to categorize the selected features into several classes. Figure 1 illustrates the flow diagram of the suggested model.

A. DATASET 1) COVID-19 DATASET
This research used tweets made by Indian users of the Twitter website while the nation was under COVID-19 lockdown. The data collection included 3090 tweets, taken from GitHub (https://github.com/gabrielpreda/CoViD-19-tweets, accessed on 12 January 2021), and comprises edited comments on subjects like lockdown, coronavirus, COVID-19, etc. For this research, tweets extracted from the Indian Twitter platform were taken into consideration; the texts discussed COVID-19, the coronavirus, lockdown, etc.

2) SENTIMENT-140
Sentiment140 is among the most comprehensive datasets, consisting of 1.6 million tweets that have been categorized as either positive or negative. It allows you to see what individuals on Twitter are saying about a specific brand, product, or subject. The data is in a CSV file that is devoid of emoticons. Six fields make up the data file format:
1. tweet's polarity (0 = negative, 2 = neutral, and 4 = positive);
2. tweet's identifier;
3. tweet's creation date;
4. query (this value is NO QUERY if there is no query);
5. the person who posted;
6. tweet's content.

3) TWITTER EMOJI PREDICTION
This data set contains instances of text-emoji and image-emoji relationships found in tweets. In the twenty-first century, expressing emotion through text and emojis has become essential to communication. A Twemoji dataset, i.e., a single-label emoji dataset, is created by utilizing the Kaggle twitter-emoji-prediction tweets dataset. This dataset is made up of a sizable collection of 70k English tweets that have the label ''related emoji'' attached to them. Additionally, as shown in Table 1, this dataset contains the top 20 emojis, each of which is associated with a mapping set that links it to its corresponding id. Fifty thousand tweets with matching emoji labels make up the CodaLab Twitter Emoji dataset. The emojis are among the top 20 most widely used emoticons, and the labels range from 0 to 19. The original dataset is written to a raw text file and then a CSV file using Python scripts.

B. PREPROCESSING
After the dataset collection, those data are given as input to the pre-processing stage, where different techniques are applied to filter the data. Pre-processing supports the removal of noise that is frequently present in social media data. The quality of the data determines classification accuracy. In order to obtain meaningful information, data must be pre-processed before being further processed. Sentiment and emotion analysis by computers is challenging due to the high degree of unstructured data collected from social media platforms, such as posts, audits, comments, remarks, and criticisms. Since numerous steps that follow pre-processing are significantly affected by the accuracy of the initial data, pre-processing is an essential step and necessary for a dataset's organization [24], [25]. The initial stage involves sending the raw text data to a specialized process called pre-processing.

1) TOKENIZATION
Tokenization involves dividing text into discrete words, or tokens. This step is crucial for several reasons. First of all, it facilitates the transformation of unstructured text into a more structured format, which eases further analysis. Second, it enables the model to concentrate on specific words, capturing their semantics and connections with one another. It is the process of breaking up large texts into smaller, easier-to-handle sections. Phrases and words in unprocessed text are turned into tokens throughout this process. As a consequence, the assessments are tokenized into words.
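As a concrete illustration, tokenization can be sketched in a few lines of Python (the paper's experiments run in MATLAB; the regular expression and lowercasing below are illustrative assumptions, not the exact implementation):

```python
import re

def tokenize(text: str) -> list[str]:
    """Split raw tweet text into lowercase word tokens.

    The \\w+ pattern keeps alphanumeric runs and drops punctuation;
    a production pipeline might also handle hashtags, mentions and URLs.
    """
    return re.findall(r"\w+", text.lower())

print(tokenize("Lockdown extended!! Stay safe, India #COVID19"))
# → ['lockdown', 'extended', 'stay', 'safe', 'india', 'covid19']
```

Note that the hashtag symbol is discarded while its word survives; whether to keep hashtags as distinct tokens is a design choice.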

2) STEMMING
Reducing words to their root or base form is known as stemming. By bringing word variations down to a common base, this normalizes them. For instance, ''running'' and ''ran'' both map to ''run.'' This aids in combining related terms and preserving the core meaning without being influenced by alternative word forms.
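A crude suffix-stripping stemmer can illustrate the idea (this toy sketch is an assumption; real systems use the Porter or Snowball stemmers, and irregular forms such as ''ran'' → ''run'' require lemmatization rather than suffix stripping):

```python
def stem(word: str) -> str:
    """Illustrative suffix-stripping stemmer (not Porter).

    Strips a few common English suffixes, keeping at least a
    three-letter stem. Unlike a true stemmer, it leaves
    "running" as "runn" rather than "run".
    """
    for suffix in ("ing", "edly", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([stem(w) for w in ["running", "runs", "jumped", "run"]])
# → ['runn', 'run', 'jump', 'run']
```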

3) PART OF SPEECH (POS)
Adding PoS details can improve the reader's comprehension of the text's context. Understanding a word's grammatical category (noun, verb, adjective, etc.) can help one better understand the organization and meaning of sentences. Given that the choice of words and how they fit into a sentence can have a significant impact on the sentiment expressed, this information can be especially helpful for sentiment analysis.

4) PUNCTUATION REMOVAL
In sentiment analysis tasks, punctuation usually has no bearing on sentiment or meaning. Eliminating punctuation contributes to noise reduction and text data simplification. It makes sure the model concentrates on the words themselves instead of needless symbols that don't convey sentiment. Since punctuation characters are employed to divide text into phrases, parts, and gestures, they must be removed during pre-processing. This is because such symbols affect the results of any text-processing technique, especially when it relies on the number of instances of terms and expressions.
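In Python, punctuation removal is a one-liner with a translation table (sketch under the assumption that only ASCII punctuation needs removing; tweets may also contain Unicode punctuation):

```python
import string

def remove_punctuation(text: str) -> str:
    """Delete all ASCII punctuation characters via a translation table."""
    return text.translate(str.maketrans("", "", string.punctuation))

print(remove_punctuation("Stay safe, everyone!!!"))
# → Stay safe everyone
```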
As can be seen from the above techniques, pre-processing is helpful in organizing the data so that highly effective models can be created in accordance with specifications. It is evident from the techniques listed above why they can be so crucial to the natural language processing process. The above-mentioned techniques must be used correctly and in the right order for this process to be useful. When these pre-processing methods are paired with the appropriate feature extraction technique, better results are obtained and are then used as input for the extraction stage.

C. FEATURE EXTRACTION
The feature extraction procedure is carried out using the pre-processed data from the preceding step. Here, features are extracted from the pre-processed data using a range of feature extraction techniques, such as BoW, TF-IDF, and LDA [26], [27]. The techniques indicated above are briefly described in the sections that follow.

1) BAG OF WORDS
''Bag of Words'' (BoW) is the most basic feature extraction technique. It specifies a fixed-length count vector, with each entry corresponding to a word in a predetermined lexicon of phrases. Since it is simple to understand and use, this model has shown impressive performance in resolving problems like language modelling and document classification. A word receives a count of 0 if it does not appear in the defined lexicon; otherwise, its count equals the number of times it appears in the text. This is the reason why the vector's length and the dictionary's word count are always the same.
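The BoW construction above can be sketched as follows (a minimal illustration; the vocabulary ordering and whitespace tokenization are assumptions):

```python
def bag_of_words(docs):
    """Build a fixed vocabulary, then map each document to a count vector.

    Every vector has exactly len(vocab) entries: 0 for absent words,
    otherwise the number of occurrences in that document.
    """
    vocab = sorted({w for doc in docs for w in doc.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for w in doc.split():
            vec[index[w]] += 1
        vectors.append(vec)
    return vocab, vectors

vocab, vectors = bag_of_words(["good good movie", "bad movie"])
print(vocab)    # → ['bad', 'good', 'movie']
print(vectors)  # → [[0, 2, 1], [1, 0, 1]]
```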

2) TERM FREQUENCY AND INVERSE DOCUMENT FREQUENCY
The TF-IDF approach is a straightforward and efficient way to match words with texts that are related to queries: the TF-IDF method returns a list of documents that are closely connected to a query. When comparing a text to a set of documents, the TF-IDF metric can be utilized to evaluate the relevance or importance of string symbols, and it is widely employed in information retrieval (IR) domains. Term frequency computes how regularly a word is connected to the document, while inverse document frequency down-weights terms that appear frequently across the corpus; Eqs. (1) and (2) define these two quantities. After pre-processing the data and building a term-vector model on top of the TF-IDF representation, this research divides the dataset into an 80% training split and a 20% test split.
Because the dataset employed in this study has no class skewness, accuracy is utilized to assess the sentiment classification findings.
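The standard TF-IDF weighting can be computed as below (a sketch assuming the common unsmoothed form tf(t, d) · log(N / df(t)); the paper's Eqs. (1) and (2) may use a smoothed variant):

```python
import math

def tf_idf(docs):
    """Per-document TF-IDF weights: tf(t, d) * log(N / df(t)).

    tf is the relative frequency of the term in the document;
    df counts the documents containing the term at least once.
    """
    N = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        w = {}
        for term in doc:
            tf = doc.count(term) / len(doc)
            w[term] = tf * math.log(N / df[term])
        weights.append(w)
    return weights

docs = [["virus", "lockdown"], ["virus", "vaccine"]]
w = tf_idf(docs)
# "virus" occurs in every document, so its IDF (and hence weight) is 0
print(round(w[0]["virus"], 3), round(w[0]["lockdown"], 3))
# → 0.0 0.347
```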

3) LATENT DIRICHLET ANALYSIS (LDA)
After feature extraction, the retrieved tweets are initialized into three categories, positive, neutral, and negative, using the LDA algorithm [28]. Finding topics in a collection of text documents is known as topic modeling, and LDA is commonly used for this task. To determine the word-level variables, each document word is analyzed; the joint distribution is used to characterize the generative process of LDA. Equation (3) explains how to select the random variable for the Dirichlet probability density function. Furthermore, Equations (4) and (5) explain the computation of the joint distribution for the topic mixture and the corpus probability.
where the Dirichlet parameters are stated as π and µ; p(π, µ) is referred to as the probability of the corpus; D refers to corpus generation; M refers to the document; N refers to the number of words; the document-level variables are stated as ℵ; x refers to the topic assignment of every word; and y refers to the observed word.
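The generative story behind Equations (3)-(5) can be sketched in a few lines (a toy illustration only; the topic count, vocabulary, and hyperparameter below are assumptions, and real LDA additionally infers these distributions from data):

```python
import random

rng = random.Random(0)

def dirichlet(alpha: float, k: int) -> list[float]:
    """Sample from a symmetric k-dimensional Dirichlet(alpha)
    by normalizing independent Gamma draws."""
    xs = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    s = sum(xs)
    return [x / s for x in xs]

def generate_document(topic_words, alpha=0.5, n_words=6):
    """LDA's generative process: draw a per-document topic mixture,
    then for each word draw a topic assignment from the mixture
    and an observed word from that topic."""
    k = len(topic_words)
    theta = dirichlet(alpha, k)                       # topic mixture
    doc = []
    for _ in range(n_words):
        z = rng.choices(range(k), weights=theta)[0]   # topic assignment
        doc.append(rng.choice(topic_words[z]))        # observed word
    return doc

# illustrative topics: 0 = positive sentiment, 1 = negative sentiment
topics = [["happy", "joy", "great"], ["sad", "fear", "anger"]]
print(generate_document(topics))
```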
In the fields of deep learning and pattern recognition, feature extraction is receiving more and more attention, since finding informative features is essential to finding patterns and extracting information. Several text feature extraction techniques are analyzed here, from non-contextual techniques (BoW) to context-preserving techniques (LDA/TF-IDF). These extraction techniques yield results that are used as input for the feature selection process.

D. FEATURE SELECTION
The input for selection is taken from the extracted features. Feature selection is important in sentiment analysis because opinionated text can have high dimensionality and negatively impact the sentiment analysis classifier's performance. This research observes the suitability of feature selection techniques and assesses how well they perform in terms of recall, accuracy, and precision for classification. Overall, the feature vector length is 101, from which the 75 best features are selected using the IASO and SA methods. Therefore, in this research, the best features are selected from the extracted features using an Improved Atom Search Optimizer (ASO) with Simulated Annealing (SA), described as follows.

1) OPTIMIZATION FOR THE ATOMIC SEARCH ALGORITHM
To the best of our knowledge, only a few studies have been done on using the ASO for feature selection, which is what motivated this work toward an improved version of ASO to address the issue of feature selection in classification tasks [29]. Equation (6) describes the force created between the ith and jth atoms at the tth iteration.
Here, σ is referred to as the length of the collision diameter; ε is referred to as the depth of the potential; and r_ij(t) is the distance between the atom positions at t. Equation (7) signifies the total random interaction force on an atom from the remaining atoms in the search space, where F_ij^d(t) is the force created between the ith and jth atoms at the tth iteration. The constraint on atomic motion in molecular dynamics is referred to as G_i^d(t), which is expressed in Equation (8), where β refers to a coefficient and x_best refers to the optimal atomic position at t. To find the atomic mass, Equations (9) and (10) are written as follows, where Fit_i(t) is the fitness value of the ith atom at t, while Fit_best(t) and Fit_worst(t) refer to the best and worst fitness values. The acceleration of the ith atom at t is quantified in Equation (11), and Equations (12) and (13) are, correspondingly, the update equations for the atom's speed and exploration position at iteration t + 1.
Equations (12) and (13) indicate that the standard algorithm's position-update coefficient is 1, which makes it easy for the position-update technique to enter an inflexible state and yields a lower convergence rate. To prevent premature convergence caused by all the atoms gathering together, the algorithm should search the space globally at the beginning of the iterations with large steps.

2) IMPROVED ATOMIC SEARCH ALGORITHM
On the other hand, if the rand value is lower, the atom can also move. This suggests that the atom's position is updated in a completely random manner, which could result in poor results. Therefore, the projected method functions in two ways that enhance the convergence rate and overcome the drawback that ASO readily succumbs to local optima. To balance the algorithm's capacity to perform both local and global searches, a Dynamic Sinusoidal Wave Adaptive Weight (DSWAW) is first introduced into the update equation of the atomic position. Adaptive weights refer to parameters in optimization algorithms that change dynamically during the optimization process. DSWAW is introduced to enhance the convergence rate and prevent local optimization. It uses a sinusoidal function to adaptively adjust the atomic position update at different stages of the optimization. This adaptation is designed to facilitate rapid updates at the beginning of the optimization and gradual convergence at later stages.

a: DYNAMIC SINUSOIDAL WAVE ADAPTIVE WEIGHT
In the context of DSWAW, the optimization algorithm's weights are modified in accordance with a dynamic mechanism. The word ''dynamic'' implies that weight adjustments change as the optimization process progresses rather than being static. This flexibility may be useful in avoiding convergence to local optima. The shape of a sinusoidal function served as the inspiration for DSWAW, which is denoted by λ. The requirement that the search location be updated rapidly at the initialization step and that optimal solutions be found at later stages is met as the iteration count increases, as defined in Equation (14), where t is the current iteration and T_max is the total number of iterations. The improved atomic position update is stated in Equation (15).
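A weight with the behaviour described above (large steps early, fine steps late) can be sketched as a quarter sine wave; note this particular form is an assumption for illustration, and the paper's exact Equation (14) may differ:

```python
import math

def dswaw(t: int, t_max: int) -> float:
    """Hypothetical Dynamic Sinusoidal Wave Adaptive Weight.

    Decays smoothly from 1 toward 0 along a quarter cosine wave,
    so the atomic position update λ is large at the start of the
    search and small near the end.
    """
    return math.cos(math.pi * t / (2 * t_max))

print(round(dswaw(0, 100), 2), round(dswaw(50, 100), 2), round(dswaw(100, 100), 2))
# → 1.0 0.71 0.0
```

With this λ multiplying the position-update term, early iterations explore globally while late iterations refine locally.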
3) SIMULATED ANNEALING ALGORITHM
SA is a well-known optimization algorithm inspired by the metallurgical annealing process. It involves random solution generation, fitness evaluation, and the exploration of neighboring solutions. The cooling schedule, controlled by parameters like temperature and cooling rate, allows SA to explore the solution space effectively. Thus, it is combined with the IASO algorithm, making it suitable for feature selection.
The SA algorithm essentially represents the metallurgical annealing process mathematically [30]. A random solution, stated as X_i, is the initial point of SA; this solution is used to define the neighborhood solution X'_i. The fitness values of X_i and X'_i are calculated and compared, and if the neighborhood solution is better, it replaces the current one. Even if it does not meet this requirement, SA may still accept the neighborhood solution, with the probability p outlined in Equation (16) determining the outcome.
Here, F is stated as the control parameter for fitness, T represents the temperature, and k is defined as a purely time-dependent function. SA does not replace X_i by X'_i when p is less than a random number in (0, 1); in the opposite circumstance, the replacement takes place. SA uses Equation (17) to lower the temperature, where the cooling coefficient µ takes an arbitrary value in the range [0, 1]. Therefore, the finest features are selected by the suggested IASO-SA.
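The acceptance rule and the cooling schedule of Equation (17) can be sketched as follows (a minimal sketch of the standard Metropolis form, assuming minimization; the exact shape of the paper's Equation (16) with its control parameters F and k may differ):

```python
import math
import random

def accept(current_fit: float, neighbor_fit: float, T: float, rng=random) -> bool:
    """Metropolis acceptance rule for a minimization problem.

    A better (or equal) neighbor is always accepted; a worse one is
    accepted with probability p = exp(-(F' - F) / T), so uphill moves
    become rarer as the temperature T drops.
    """
    if neighbor_fit <= current_fit:
        return True
    p = math.exp(-(neighbor_fit - current_fit) / T)
    return rng.random() < p

# Cooling schedule in the spirit of Equation (17): T_{k+1} = mu * T_k
T, mu = 100.0, 0.9
for _ in range(3):
    T *= mu
print(round(T, 1))
# → 72.9
```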

4) FITNESS EVALUATION
The fitness evaluation in IASO-SA is accomplished and designed using Equation (18), where the objective value is signified as f(X_i).

Pseudocode:
Inputs: data with feature values, fitness function, multiplier weight, population size, minimum temperature, boundary settings, dimension of the optimization problem, maximum iteration count (t_max), depth weight, cooling rate, and initial temperature.
Outputs: optimal feature subset, best solution X_best and its objective function F(X_best).
Randomly initialize the solutions X_i and their velocities V_i, and set Fit_best = ∞.
while t < t_max do
Evaluate the fitness of each atom and update Fit_best, Fit_worst, and X_best;
Compute the mass, interaction force, constraint force, and acceleration of each atom;
Update each atom's velocity V_i and position X_i;
Refine the solutions with the SA neighborhood search and acceptance rule;
Apply cooling schedule: T_{k+1} = µT_k;
Update the best solution X_best;
t = t + 1
end while
Return the best solution X_best
As a result, a new set is created and the chosen attributes are assigned numbers; this new set is then used as the objective function. Ultimately, the collection of solutions produces the precise solution, which is then applied to effectively classify the sentiment. In order to predict the most effective features from a group of features, each feature is examined at least once during the feature selection process. To identify sentiments from the gathered dataset, the chosen features are input into the classifiers.
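A common wrapper-style fitness for feature selection trades off the classifier's error against the size of the selected subset; the form below is an illustrative assumption (the paper's Equation (18) is not reproduced here and may differ):

```python
def fitness(error_rate: float, n_selected: int, n_total: int, alpha: float = 0.99) -> float:
    """Hypothetical wrapper fitness for feature selection (minimized).

    f = alpha * error_rate + (1 - alpha) * (|selected| / |total|),
    so a subset is rewarded both for low classification error and
    for using few features. alpha near 1 prioritizes accuracy.
    """
    return alpha * error_rate + (1 - alpha) * (n_selected / n_total)

# e.g. selecting 75 of 101 features with a 3% validation error
print(round(fitness(0.03, 75, 101), 4))
# → 0.0371
```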

E. CLASSIFICATION
Understanding the context in text data is vital for accurate sentiment analysis. Therefore, a ReLU-based Gated Recurrent Unit (ReLU-GRU) is exploited here [31]. This method is a valid choice for sequential data like text.

1) GATED RECURRENT UNIT
GRUs are frequently used to solve sequence-related problems and are regarded as variants of recurrent neural networks (RNNs). Unlike the LSTM, the GRU substitutes an update gate and a reset gate for the input, forget, and output gates of the LSTM. Although the prediction accuracy of the GRU is not worse than that of the LSTM, a faster convergence speed is made possible by the reduced number of training parameters. Figure 2 shows the architecture of the GRU. The reset gate r_t automatically manages the combination of previous memory with new input, while the update gate z_t determines the volume of previous memory retained at the present step. A higher value of z_t allows more data from the previous stage to be preserved in the current memory; conversely, a smaller value of r_t results in more state data from the previous stage being forgotten. In the initial step, z_t and r_t are computed based on the input data x_t at the present stage and the hidden-layer data h_{t-1} retained from the earlier time stage. The next step is to determine how much data is saved in the node by using the reset gate. Lastly, the update gate is used to determine the hidden-layer output at the current time step. Equations (19)-(22) represent the mathematical procedure used in the GRU.
Here, σ denotes the sigmoid function; W_z, W_r, W_h, U_z, U_r, and U_h are weight matrices; b_z, b_r, and b_h are the biases; ĥ_t is the candidate state computed from the input x_t and the previous hidden layer h_{t-1}; h_t is the hidden-layer output; and ⊗ denotes the Hadamard product. ReLU can be employed as an activation function inside the hidden states or gates of the GRU, although how well it works depends on the particular problem. The input data consist of sequences of text, which enables the GRU model to identify temporal patterns and dependencies for precise sentiment prediction over time. By introducing non-linearity, ReLU enables the model to discover more complex relationships in the data. This is important because sentiments are frequently expressed through complex and non-linear word combinations in tasks like sentiment analysis. As a result, the GRU introduces the ReLU process, which is explained in detail in the sections that follow.

2) PROCESS OF RELU BASED GRU
Understanding a sentence's context and word dependencies is often necessary for sentiment analysis. Due to their recurrent nature, GRUs are useful for capturing long-range dependencies and sequential patterns. This is essential for understanding the subtleties of sentiment in natural language. ReLU and GRU improve temporal analysis and feature extraction in classification models. While the GRU successfully captures sequential dependencies from the specified features, ReLU enhances the network topology's depth to allow for improved feature representation. This combination strategy has the potential for improved classification accuracy. The cell is made up of several gates that control how information moves through it. The essential gates in a GRU cell are the update gate (z) and the reset gate (r), which dictate the retention of the previous hidden state and the integration of new information, respectively:
• Update Gate (z): The update gate is calculated using a sigmoid activation function, similar to the standard GRU cell.
• Candidate Hidden State (ĥ): In this adapted version, the candidate hidden state is obtained by applying a ReLU activation to the sum of the input and the reset-gated previous hidden state. This ReLU activation introduces non-linearity, enhancing the model's ability to capture complex patterns within the candidate hidden state.
• Reset Gate (r): In the conventional GRU cell, the reset gate is determined by applying a sigmoid activation function to the sum of the input and the previous hidden state.
• Hidden State (h): In the standard GRU cell, the updated hidden state is computed by blending the previous hidden state and the candidate hidden state, using the update gate as a weighting factor. These parameters are handled within the GRU architecture through the update and reset gates of the gating mechanism, which takes the inputs x_t and h_{t−1} and generates ĥ_t. The ReLU activation function is applied element-wise to the linear transformations in the equations. The GRU with the ReLU activation function is given in (23)–(26).
Update Gate (z_t):
z_t = σ(W_z x_t + U_z h_{t−1} + b_z) (23)
Reset Gate (r_t):
r_t = σ(W_r x_t + U_r h_{t−1} + b_r) (24)
Candidate Hidden State (ĥ_t) with ReLU:
ĥ_t = relu(W_h x_t + U_h (r_t ⊗ h_{t−1}) + b_h) (25)
Updated Hidden State (h_t):
h_t = (1 − z_t) ⊗ h_{t−1} + z_t ⊗ ĥ_t (26)
ReLU is a common choice in deep learning models because of its simplicity and its efficient handling of negative inputs, which helps the model learn more complex patterns and perform better overall. As a result, the ReLU based GRU overcomes the restrictions identified in the classification and prediction techniques.
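The four update rules above can be sketched as a single NumPy step. This is a minimal illustration, assuming the gates keep the sigmoid and only the candidate state uses ReLU; the weight names and toy dimensions are illustrative, not the paper's trained parameters.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu_gru_step(x_t, h_prev, p):
    """One step of a GRU cell with ReLU in the candidate state."""
    z_t = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])      # update gate
    r_t = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])      # reset gate
    h_cand = relu(p["Wh"] @ x_t + p["Uh"] @ (r_t * h_prev) + p["bh"])  # candidate
    return (1.0 - z_t) * h_prev + z_t * h_cand                     # new hidden state

# Toy dimensions: input size 3, hidden size 2, randomly initialized weights.
rng = np.random.default_rng(0)
p = {k: rng.standard_normal((2, 3)) for k in ("Wz", "Wr", "Wh")}
p.update({k: rng.standard_normal((2, 2)) for k in ("Uz", "Ur", "Uh")})
p.update({k: np.zeros(2) for k in ("bz", "br", "bh")})

h = np.zeros(2)
for x in np.eye(3):  # feed a short toy input sequence
    h = relu_gru_step(x, h, p)
print(h.shape)  # (2,)
```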

IV. RESULTS AND DISCUSSION
Here, the recommended model is executed in MATLAB 2021 on 64-bit Windows 11 with an Intel Core i9-4702MQ 2.20 GHz processor and the Text Analytics library. Table 2 shows the specification parameters used by the IASO, SA and ReLU-GRU methods, which are employed in the feature selection and classification stages. The performance of the suggested technique is measured by computing the following metrics:
• Accuracy, expressed in Equation (27): Accuracy = (TP + TN) / (TP + TN + FP + FN)
• Precision, expressed in Equation (28): Precision = TP / (TP + FP)
• Sensitivity, expressed in Equation (29): Sensitivity = TP / (TP + FN)
• Specificity, expressed in Equation (30): Specificity = TN / (TN + FP)
where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively.
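The four metrics reduce to simple ratios over confusion-matrix counts. The sketch below computes them for hypothetical counts on a binary positive/negative sentiment split; the counts are invented for illustration only.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics, as in Equations (27)-(30)."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # recall / true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return accuracy, precision, sensitivity, specificity

# Hypothetical counts for a binary sentiment classifier.
acc, prec, sens, spec = classification_metrics(tp=90, tn=85, fp=10, fn=15)
print(round(acc, 4))   # 0.875
print(round(prec, 4))  # 0.9
```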
Table 10 displays the comparison of several optimization approaches on the Sentiment-140 dataset. A subset of important features is chosen for effective classification using the proposed IASO-SA method.

E. DISCUSSION
In this study, the Covid-19, Sentiment-140 and twitter emoji datasets are analyzed using IASO-SA and ReLU-GRU for Twitter sentiment analysis. The suggested ReLU-GRU model is compared with existing models on the Covid-19 and Sentiment-140 datasets, including Bi-LSTM [16], Fine-tuned BERT [17], Hybrid deep learning [18], and FK-SVM [19]. Existing feature selection methods take into account the distribution of feature-containing samples among the classes; they do not, however, account for how frequently the features occur within the classes. A feature that characterizes a class should occur more frequently in samples within the class than outside it. The suggested method (IASO-SA) chooses the frequently distributed features associated with each class; this feature selection relies on class-specific data to determine the pertinent features of each class. The ReLU based Gated Recurrent Unit is then used for classification, classifying the sentiments with high accuracy. The results clearly show that the proposed ReLU-GRU model outperforms the other models, achieving 97.87% accuracy on the Covid-19 dataset and 96.52% accuracy on the Sentiment-140 dataset, whereas the Bi-LSTM [16], Fine-tuned BERT [17], Hybrid deep learning [18], and FK-SVM [19] models obtained 86.67%, 89%, 84.1%, and 89%, respectively. On the twitter emoji dataset, the proposed IASO-SA achieved a higher accuracy (85.07%), sensitivity (87.15%), specificity (86.85%), precision (85.83%) and MCC (87.77%). Consequently, the suggested ReLU-GRU surpasses the current models and attains higher classification accuracy.

V. CONCLUSION
Here, the evaluation is conducted using the Covid-19, Sentiment-140 and twitter emoji datasets. The gathered datasets are first pre-processed and then used for feature extraction by means of BoW, LDA and TF-IDF. The subsequent stage applies the Improved ASO and SA techniques to choose optimal features for accurate classification. Lastly, a ReLU based Gated Recurrent Unit (ReLU-GRU) is employed to classify the chosen features into several groups. According to the results, the ReLU-GRU outperforms the remaining models, obtaining an accuracy of 97.87%, sensitivity of 96.93%, specificity of 94.48%, precision of 95.81% and MCC of 93.86% on the Covid-19 dataset, and an accuracy of 96.52%, sensitivity of 97.95%, specificity of 96.29%, precision of 95.78% and MCC of 97.78% on the Sentiment-140 dataset. However, the proposed approach has drawbacks: more data could not be obtained, and the available computing power was insufficient. This work will be extended in the future by exploring larger datasets to improve accuracy. The approach can also be applied to other social networking sites such as LinkedIn and Facebook to gauge public sentiment on any topic.
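Of the three extraction schemes recapped above, TF-IDF is the simplest to illustrate. Below is a toy, dependency-free sketch of TF-IDF weighting on two short example reviews; the tokenization and the example documents are invented for illustration and do not reflect the paper's actual pipeline.

```python
import math
from collections import Counter

def tfidf(docs):
    """Toy TF-IDF: term frequency scaled by inverse document frequency."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                      # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (tf[t] / len(toks)) * math.log(n / df[t])
                        for t in tf})
    return vectors

docs = ["great product love it", "terrible product hate it"]
vecs = tfidf(docs)
# "product" appears in every document, so its IDF (and weight) is zero,
# while discriminative terms like "great" keep a positive weight.
print(vecs[0]["product"])  # 0.0
```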
Data Availability: The datasets generated during and/or analysed during the current study are available in the [Covid 19 datasets] and [Sentiment 140 datasets] repositories.

FIGURE 1 .
FIGURE 1. Flow diagram for suggested model.

TABLE 1 .
Top 20 emojis in the dataset.
W_z, W_r and W_h are the weight matrices for the update gate, reset gate and candidate hidden state; b_z, b_r and b_h are the corresponding bias terms. W_hh and b_hh are the weight matrix and bias term for the hidden state passed through the reset gate. The default tanh activation in a GRU provides a bounded output with symmetric handling of both positive and negative values; ReLU instead helps prevent the exponential growth in computation that would otherwise burden the network.
In (26), relu is the ReLU activation function applied element-wise, x_t is the input at time step t, and h_{t−1} is the previous hidden state at time step t − 1.

TABLE 2 .
Specification table for proposed method.

TABLE 3 .
Performance analysis with actual features on Covid-19.

TABLE 4 .
Performance analysis with optimized features on Covid-19.

Table 6
shows the tabulated outputs obtained for the various sentiment factors and demonstrates that factors such as joy, fear, sadness and anger achieve valuable performance metrics. In terms of accuracy, joy attains 97.62%, fear 96.88%, sadness 97.99% and anger 98.99%.

TABLE 6 .
Performance analysis of sentiment factors.

Table 7
shows the k-fold analysis of the Covid-19 dataset. From Table 7, it is clear that 5-fold validation achieves better results than the other k-fold values.
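For readers unfamiliar with k-fold validation, the sketch below shows how 5-fold index splitting works in plain Python (the paper does not specify its fold implementation, so this is a generic illustration): each fold holds out one fifth of the samples for testing and trains on the rest.

```python
def k_fold_indices(n_samples, k):
    """Split sample indices into k contiguous (train, test) folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        folds.append((train_idx, test_idx))
        start += size
    return folds

# 10 samples, 5 folds: each fold holds out 2 samples for testing.
folds = k_fold_indices(10, 5)
print(len(folds))    # 5
print(folds[0][1])   # [0, 1]
```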

TABLE 8 .
Performance analysis with actual features on Sentiment-140.

TABLE 9 .
Evaluation of optimized features.

TABLE 10 .
Evaluation of optimization approaches on Sentiment-140.

Table 11
shows the k-fold analysis of the Sentiment-140 dataset. From Table 11, it is clear that 5-fold validation achieves better results than the other k-fold values.

TABLE 12 .
Performance analysis with actual features on twitter emoji dataset.

TABLE 13 .
Performance analysis with optimized features on twitter emoji dataset.

TABLE 14 .
Evaluation of optimization approaches on twitter emoji dataset.

Table 15
shows the k-fold analysis of the twitter emoji dataset. From Table 15, it is clear that 5-fold validation achieves better results than the other k-fold values.

TABLE 16 .
Comparison of proposed method with the existing method on Covid-19.

TABLE 17 .
Comparative analysis of proposed method with existing method on Sentiment-140.

TABLE
Comparison of proposed method with existing method on twitter API.