DEA-RNN: A Hybrid Deep Learning Approach for Cyberbullying Detection in Twitter Social Media Platform

Cyberbullying (CB) has become increasingly prevalent on social media platforms. With the popularity and widespread use of social media by individuals of all ages, it is vital to make social media platforms safer from cyberbullying. This paper presents a hybrid deep learning model, called DEA-RNN, to detect CB on the Twitter social media network. The proposed DEA-RNN model combines Elman type Recurrent Neural Networks (RNN) with an optimized Dolphin Echolocation Algorithm (DEA) for fine-tuning the Elman RNN’s parameters and reducing training time. We evaluated DEA-RNN thoroughly using a dataset of 10000 tweets and compared its performance to that of state-of-the-art algorithms such as Bi-directional Long Short-Term Memory (Bi-LSTM), RNN, SVM, Multinomial Naive Bayes (MNB), and Random Forests (RF). The experimental results show that DEA-RNN is superior in all scenarios and outperforms the considered existing approaches in detecting CB on the Twitter platform. DEA-RNN performed best in scenario 3, where it achieved an average of 90.45% accuracy, 89.52% precision, 88.98% recall, 89.25% F1-score, and 90.94% specificity.


I. INTRODUCTION
Social media networks such as Facebook, Twitter, Flickr, and Instagram have become the preferred online platforms for interaction and socialization among people of all ages. While these platforms enable people to communicate and interact in previously unthinkable ways, they have also led to malevolent activities such as cyberbullying. Cyberbullying is a type of psychological abuse with a significant impact on society. Cyberbullying events have been increasing, mostly among young people who spend most of their time navigating between different social media platforms. In particular, social media networks such as Twitter and Facebook are prone to CB because of their popularity and the anonymity that the Internet provides to abusers. In India, for example, 14 percent of all harassment occurs on Facebook and Twitter, with 37 percent of these incidents involving youngsters [1]. Moreover, cyberbullying might lead to serious mental issues and adverse mental health effects. Most suicides are due to the anxiety, depression, stress, and social and emotional difficulties arising from cyberbullying events [2]-[4]. This motivates the need for an approach to identify cyberbullying in social media messages (e.g., posts, tweets, and comments).
The associate editor coordinating the review of this manuscript and approving it for publication was Kathiravan Srinivasan.
In this article, we mainly focus on the problem of cyberbullying detection on the Twitter platform. As cyberbullying is becoming a prevalent problem on Twitter, detecting cyberbullying events in tweets and provisioning preventive measures are the primary tasks in battling cyberbullying threats [5]. Therefore, there is a greater need to increase research on social network-based CB in order to gain greater insights and aid in the development of effective tools and approaches to combat the cyberbullying problem [6].
VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Manually monitoring and controlling cyberbullying on the Twitter platform is virtually impossible [7]. Furthermore, mining social media messages for cyberbullying detection is quite difficult. For example, Twitter messages are often brief, full of slang, and may include emojis and GIFs, which makes it impossible to deduce individuals' intentions and meanings purely from social media messages. Moreover, bullying can be difficult to detect if the bully uses strategies like sarcasm or passive-aggressiveness to conceal it. Despite the challenges that social media messages bring, cyberbullying detection on social media is an open and active research topic. Cyberbullying detection within the Twitter platform has largely been pursued through tweet classification and, to a certain extent, with topic modeling approaches. Text classification based on supervised machine learning (ML) models is commonly used for classifying tweets into bullying and non-bullying tweets [8]-[17]. Deep learning (DL) based classifiers have also been used for classifying tweets into bullying and non-bullying tweets [7], [18]-[22]. Supervised classifiers perform poorly when the class labels are fixed and are not relevant to new events [23]. Such classifiers may be suitable only for a pre-determined collection of events and cannot successfully handle tweets that change on the fly. Topic modeling approaches have long been utilized to extract the vital topics from a set of data in order to form the patterns or classes in the complete dataset. Although the concept is similar, general unsupervised topic models are not efficient for short texts, and hence specialized unsupervised short text topic models were employed [24]. These models effectively identify the trending topics from tweets and extract them for further processing. These models help in leveraging bidirectional processing to extract meaningful topics.
However, these unsupervised models require extensive training to obtain sufficient prior knowledge, which is not adequate in all cases [25]. Considering these limitations, an efficient tweet classification approach must be developed to bridge the gap between the classifier and the topic model so that the adaptability is significantly proficient.
In this article, we propose a hybrid deep learning-based approach, called DEA-RNN, which automatically detects bullying from tweets. The DEA-RNN approach combines Elman type Recurrent Neural Networks (RNN) with an improved Dolphin Echolocation Algorithm (DEA) for fine-tuning the Elman RNN's parameters. DEA-RNN can handle the dynamic nature of short texts and can cope with the topic models for the effective extraction of trending topics. DEA-RNN outperformed the considered existing approaches in detecting cyberbullying on the Twitter platform in all scenarios and with various evaluation metrics. The contributions of this article can be summarized as follows:
• Develop an improved optimization model of DEA to automatically tune the RNN parameters and enhance the performance;
• Propose DEA-RNN by combining the Elman type RNN and the improved DEA for optimal classification of tweets;
• Collect a new Twitter dataset based on cyberbullying keywords for evaluating the performance of DEA-RNN and the existing methods; and
• Assess the efficiency of DEA-RNN in recognizing and classifying cyberbullying tweets using Twitter datasets. The thorough experimental results reveal that DEA-RNN outperforms other competing models in terms of recall, precision, accuracy, F1 score, and specificity.
The rest of this article is structured as follows: Recent related works are reviewed and analyzed in Section II. The proposed DEA-RNN model is described in Section III. Section IV discusses the experimental analysis, performance metrics, and results analysis. The discussion is introduced in Section V. Finally, Section VI offers the conclusion and possible future directions.

II. RELATED WORKS
This section mainly focuses on reviewing the state of the art of CB detection and classification on Twitter datasets. Machine learning (ML) based approaches with different feature selection methods are widely used in cyberbullying tweet classification. Purnamasari et al. [26] utilized SVM and an Information Gain (IG) based feature selection method for detecting cyberbullying events in tweets. Muneer and Fati [11] used various classifiers, namely AdaBoost (ADB), Light Gradient Boosting Machine (LGBM), SVM, RF, Stochastic Gradient Descent (SGD), Logistic Regression (LR), and MNB, for identifying cyberbullying events in tweets. This study extracted features using the Word2Vec and TF-IDF methods. Dalvi et al. [12], [27] used SVM and Random Forests (RF) models with TF-IDF feature extraction for detecting cyberbullying in tweets. Although SVM achieved high performance in these models, the model complexity increases as the number of class labels grows. Al-garadi et al. [28] investigated cyberbullying identification using different ML classifiers such as RF, Naïve Bayes (NB), and SVM based on various features extracted from Twitter (tweet content, activity, network, and user). Huang et al. [29] suggested an approach for identifying CB from social media that integrates social media features and textual content features. The features are ranked using the IG method, and well-known classifiers such as NB, J48, Bagging, and Dagging are utilized. The findings implied that social characteristics could aid in increasing the accuracy of cyberbullying detection. Squicciarini et al. [30] utilized a decision tree (C4.5) classifier with social network, personal, and textual features to identify and predict cyberbullying on social networks such as spring.me and MySpace. Balakrishnan et al. [31] utilized different ML algorithms such as RF, NB, and J48 to detect cyberbullying events from tweets and to classify tweets into different cyberbullying classes such as aggressor, spammer, bully, and normal. The study concluded that the emotional feature does not impact the detection rate. Despite its efficiency, this model is limited to a small dataset with few class labels. Alam et al. [32] proposed an ensemble-based classification approach using single and double ensemble-based voting models. These ensemble-based voting models utilized decision tree, LR, and Bagging ensemble classifiers for the classification, with mutual information bigrams and unigram TF-IDF as feature extraction models. In an analysis over the Twitter dataset, the Bagging ensemble model provided the best precision but considered other parameters. Although these ensemble models reduced the training and execution time for classification, their major limitation arises with sarcastic tweets and acronyms that have multiple meanings. Chia et al. [8] also utilized different ML and feature engineering-based approaches to classify irony and sarcasm in cyberbullying tweets. In this approach, many classifiers and feature selection methods were tested; while this approach detects sarcasm and irony terms among cyberbullying tweets, the detection rate is still very low [33].
Similarly, Rafiq et al. [17] utilized decision tree, AdaBoost, NB, and Random Forest classifiers to identify instances of cyberbullying in a Vine dataset. The authors collected the Vine media dataset and labeled it using the Crowd-Sourced and CrowdFlower websites. They utilized the comments, unigrams, media information, and profile as the features. Nahar et al. [34] suggested a semi-supervised learning method for detecting CB in social media in which training data samples are augmented and a fuzzy SVM method is applied. The augmented training approach automatically expands and extracts the training set from the unclassified streaming text. The learning is performed using a small training set given as an initial input. The suggested method copes with the dynamic and complex character of streaming data. Xu et al. [35] provided many off-the-shelf methods, including LDA and LSA-based modeling and Bag-of-Words models, for predicting bullying traces on Twitter. A personalized cyberbullying detection framework, namely PI-Bully, was introduced by Cheng et al. [36] to detect cyberbullying in a Twitter dataset. PI-Bully comprises three elements: a global element that determines the characteristics that all users have in common, a personalized element that captures the distinctive features of each user, and a peer influence element capable of quantifying the various influences of other users.
Deep learning (DL) based approaches for cyberbullying detection in tweets have also been proposed in the literature. N. Yuvaraj et al. [9] used an Artificial Neural Network (ANN) and Deep Reinforcement Learning (DRL) to classify cyberbullying tweets. However, this approach has higher computational complexity. Chen et al. [37] used a text classification model based on CNN and 2-D TF-IDF features to enhance performance on the sentiment analysis task. The experimental results showed that the CNN model obtained optimal results compared to the baseline LR and SVM models. Agrawal [16] utilized LSTM with transfer learning for cyberbullying detection on several social media networks. A new representation learning approach named smSDA (Semantic-Enhanced Marginalized Denoising Autoencoder) was suggested by Zhao et al. [38] for detecting cyberbullying. smSDA produced discriminative and robust representations, and the learned numerical representations can then be fed into an SVM. Zhang [39] suggested a new model that integrates Gated Recurrent Unit (GRU) layers and CNN layers to detect hate speech. Al-Hassan and Al-Dossari [19] utilized SVM as the baseline classifier and compared it against four DL models, namely CNN+LSTM, LSTM, CNN+GRU, and GRU, to detect cyberbullying hate speech in Arabic tweets. However, the complexity of CNN+LSTM and CNN+GRU is higher, and they might not be effective in handling larger datasets. Natarajan Yuvaraj et al. [18] proposed a new classification model for CB detection from Twitter data. It used deep decision-tree classification with multi-feature based AI for tweet classification. The deep decision tree classifier was designed by integrating the hidden layers of deep neural networks with the decision tree classifier. This approach also utilized three feature selection approaches: Chi-Square, Pearson Correlation, and IG. However, it cannot handle high-dimensional data with such accuracy. Fang et al. [20] designed a classification model that combines a self-attention mechanism and a bi-directional Gated Recurrent Unit (Bi-GRU) to detect cyberbullying in tweets. This model exploited the ability of the Bi-GRU to learn the underlying relationships between words and used it together with a self-attention mechanism to improve the classification of cyberbullying tweets. However, the context-independent behavior of the attention network creates limitations in learning all relationships between the tweets.
Pericherla and Ilavarasan [33] suggested a transformer network-based word embedding model to classify CB tweets. This model utilizes RoBERTa to create word embeddings and Light Gradient Boosting Machine to classify the tweets. This approach overcomes the context-independent limitations of traditional word embedding methods. Yet, this model has a higher training time compared to the CNN models. Paul and Saha [40] proposed a model for identifying cyberbullying, namely CyberBERT, based on BERT. Iwendi et al. [21] introduced a model to detect cyberbullying based on Bi-LSTM and RNN. This model showed that the RNN could achieve high performance, but the Bi-LSTM still has significantly higher efficiency. In some cases, CNN also performs better. Akhter et al. [41] evaluated many DL models such as LSTM, CLSTM, CNN, and BLSTM, as well as other ML models, to discover abusive language in Urdu social media text. Some other studies utilized CNNs to enhance cyberbullying detection [42]-[46]. Tripathy et al. [47] proposed a fine-tuning approach for detecting CB based on ALBERT. Agarwal et al. [7] utilized an RNN with under-sampling and class weighting. These modifications helped the RNN model to perform better than the LSTM model. This indicates that tuning the parameters can enhance the RNN performance. Pitsilis et al. [48] proposed hate-speech detection utilizing RNN and word frequency vectors. Edo-Osagie et al. [49] developed an attention-based RNN for short text classification and achieved high accuracy. However, the location filtering in this method is limited. Khodabakhsh et al. [50] presented predictions of future personal life events from tweets using the RNN model. However, this model does not classify highly class-imbalanced data effectively. Kumar and Sachdeva [51] proposed a hybrid approach, namely Bi-GAC, to detect CB in social media. This approach integrates the capsule network (CapsNet) and a Bi-GRU encoder. Cheng et al. [52] suggested an approach, namely HANCD (Hierarchical Attention Network for Cyberbullying Detection). The proposed approach utilized the context to detect the relative significance of specific comments and words by applying several levels of attention. In addition, it forecasts the time interval that elapses between two neighboring comments. Eronen et al. [53] suggested an approach for detecting cyberbullying based on linguistically backed pre-processing and the Feature Density (FD) approach. The authors investigated the effectiveness of FD with linguistically backed pre-processing approaches such as stop word filtering, Parts of Speech (POS), and Named Entity Recognition (NER) for assessing classification performance and the complexity of the dataset. On the other hand, some recent studies presented multi-modal models to detect CB across three modalities of social network data, namely visual, infographic, and textual, such as [51], [54], [55]. Kumari et al. [56] presented a DL-based model to classify various levels of cyber aggression in bilingual social media comments.
From the above detailed review of the literature related to CB detection and classification, some important issues have been observed. Firstly, deep learning classifiers have better classification efficiency than machine learning models because of their superior accuracy when trained on a large dataset. Secondly, RNN has the advantage of fast processing together with an abstract feature learning process, which makes it one of the most efficient classification models. However, the limitations of the RNN model are also highlighted: low accuracy due to premature convergence, and limited tuning of the RNN parameters, both of which have a significant impact on the overall classification performance. This indicates that tuning the parameters can enhance the RNN performance. Therefore, in this paper, the DEA-RNN model is presented to enhance the performance of RNN by considering the aforesaid issues and limitations of existing ML and DL methods.

III. METHODOLOGY
The overall DEA-RNN model is shown in Fig. 1. The model includes the following phases: (i) data collection, (ii) data annotation, (iii) pre-processing and data cleansing, (iv) feature extraction and feature selection, and (v) classification. Each of these components is described in the following subsections.

A. DATA COLLECTION
The input dataset is made up of tweets collected through the Twitter streaming API with the help of around 32 cyberbullying keywords. Idiot, ni**er, LGBTQ (le***an, g*y, bisexual, transgender, and queer), whore, pussy, faggot, shit, sucker, slut, donkey, live, afraid, moron, poser, rape, fuck, fucking, ugly, bitch, ass, whale, etc. are some of the keywords recommended in the psychology literature [30], [36], [57]. Other keywords such as ban, kill, die, evil, hate, attack, terrorist, threat, racism, black, Muslim, Islam, and Islamic were suggested in [39]. The initial dataset includes 435764 tweets, with the keywords based on racism, insult, swear, and sexism words contributing about 130000 tweets. Tweets in this dataset include many outliers. Only English-language tweets are needed; hence, tweets containing terms in other languages are removed, and retweets are filtered out, as shown in Fig. 1. After removing these irrelevant tweets, about 10000 tweets are randomly selected from the remaining tweets to form the finalized dataset. All these steps are performed automatically as part of the pre-processing stage. The other primary pre-processing operations are then performed as described in Section III-C.

B. DATA ANNOTATION
This section concentrates on annotating and labeling the selected tweets from the original Twitter dataset. After selecting 10000 tweets randomly from the collected tweets, the selected tweets were manually assigned one of two labels, either ''0'' (non-cyberbullying) or ''1'' (cyberbullying), by a set of three human annotators over a period of one and a half months. In the labeling procedure, the human annotators labeled each instance based on whether it was considered to involve cyberbullying and on the guidelines described in detail in [57]. The decision on cyberbullying instances depends on the following guidelines: character attacks, insults, competence attacks, malediction, verbal abuse, teasing, name-calling, mockery, threats, and physical appearance. Initially, each tweet was classified by two annotators, and the agreement rate between the two annotators at this phase was approximately 91%. Then, a third annotator was tasked with resolving the discrepancies discovered during the initial annotation process. Finally, after resolving the discrepancies and cleaning up the data, we obtained the final dataset, which contained 10000 labeled tweets, of which 6508 (about 65%) are non-cyberbullying and 3492 (about 35%) are cyberbullying tweets. These counts show that the labeled Twitter dataset is imbalanced, since the number of tweets differs greatly between the classes. As a result, a balancing approach such as oversampling or under-sampling is needed to resolve the issue. Here, the Synthetic Minority Oversampling Technique (SMOTE) has been utilized to oversample the minority class (cyberbullying tweets) because of the class imbalance between cyberbullying and non-cyberbullying tweets. The oversampling process is performed by replicating cyberbullying samples many times to balance the dataset, as used in [15], [16]. Hence, the total number of tweets after oversampling was 13016 samples. Table 1 shows the original dataset and the dataset with oversampling.
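The balancing step can be illustrated with a minimal sketch. It follows the replication description given above (minority samples are repeated until both classes have the same size) rather than SMOTE proper, which synthesizes new points by interpolation. The function name `oversample_minority`, the fixed seed, and the use of integer placeholders for tweets are assumptions made for illustration only.

```python
import random

def oversample_minority(samples, labels, minority_label=1, seed=0):
    """Replicate randomly chosen minority samples until the classes are balanced.

    Assumption: plain random replication, as described in the text,
    not SMOTE-style synthetic interpolation.
    """
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    majority_count = sum(1 for y in labels if y != minority_label)
    missing = max(0, majority_count - len(minority))
    extra = [rng.choice(minority) for _ in range(missing)]
    return list(samples) + extra, list(labels) + [minority_label] * missing

# Class counts reported for the labeled dataset: 6508 vs. 3492 tweets.
tweets = list(range(10000))          # placeholders standing in for tweet texts
labels = [0] * 6508 + [1] * 3492
balanced_tweets, balanced_labels = oversample_minority(tweets, labels)
print(len(balanced_tweets), balanced_labels.count(1))
```

With the reported class counts, the balanced set has 2 × 6508 = 13016 samples, which agrees with the total stated above.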

C. PRE-PROCESSING AND DATA CLEANSING
The data cleansing and pre-processing phase contains three sub-phases [58]. This process is performed on the raw tweet dataset to form the finalized data described in the previous sections. In the first sub-phase, noise removal is performed, including URL removal, hashtag/mention removal, punctuation/symbol removal, and emoticon transformation. In the second sub-phase, out-of-vocabulary cleansing is performed, including spell checking, acronym expansion, slang modification, and removal of elongated (repeated) characters. In the final sub-phase, tweet transformations such as lower-case conversion, stemming, word segmentation (tokenization), and stop word filtering are conducted. These sub-phases are performed to enhance the tweets and improve feature extraction and classification accuracy. Figure 2 shows the pre-processing and data cleansing steps.
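A few of these steps can be sketched with the standard library alone. This is a simplified illustration and not the pipeline of [58]: spell checking, acronym expansion, slang modification, emoticon transformation, and stemming are omitted, and the function name `clean_tweet` and the tiny `STOP_WORDS` set are hypothetical.

```python
import re

# Tiny illustrative stop word list (an assumption, not the list used in the paper).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you"}

def clean_tweet(text):
    """Apply a subset of the three sub-phases to one tweet and return its tokens."""
    # Sub-phase 1: noise removal
    text = re.sub(r"https?://\S+", " ", text)    # URLs
    text = re.sub(r"[@#]\w+", " ", text)         # mentions and hashtags
    text = re.sub(r"[^A-Za-z\s]", " ", text)     # punctuation, symbols, digits
    # Sub-phase 2: squeeze elongated characters ("sooooo" becomes "soo")
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Sub-phase 3: lower-casing, tokenization, stop word filtering
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]

print(clean_tweet("@user You are sooooo annoying!!! http://t.co/abc #fail"))
```

Squeezing repeated characters down to two, rather than one, keeps legitimate double letters intact and leaves the final correction to a later spell-checking step.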

D. FEATURE EXTRACTION AND SELECTION
The features from the Twitter dataset are extracted using NLP tools such as Word2Vec and TF-IDF, with nouns, pronouns, and adjectives considered as the primary feature contents, whereas adverbs and verbs provide additional information. Furthermore, the extraction of Part-of-Speech (POS) tags, function words, and content word features can improve the classification performance [59]. Many feature selection methods are available, as mentioned in [60]. For identifying cyberbullying events, prominent features are selected using the Information Gain (IG) method, and these feature subsets are then fed into the DEA-RNN classifier.
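As a reminder of what IG-based selection ranks, the sketch below computes the information gain of a single binary feature (for example, whether a term occurs in a tweet) with respect to the class labels, using the standard entropy-based definition. The function names are hypothetical, and the paper's exact feature representation is not specified here, so the binary-presence encoding is an assumption.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_present, labels):
    """IG of a binary feature: H(labels) minus the weighted entropy of each split."""
    total = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for f, y in zip(feature_present, labels) if f == value]
        if subset:
            gain -= len(subset) / total * entropy(subset)
    return gain

labels = [1, 1, 0, 0]
print(information_gain([True, True, False, False], labels))   # feature separates the classes
print(information_gain([True, False, True, False], labels))   # feature carries no information
```

Features would then be sorted by this score and the top-ranked subset passed to the classifier.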

E. DEA-RNN CLASSIFIER MODEL
1) IMPROVED DEA
DEA mimics the behavior and capability of dolphins to generate a kind of echo (click sounds) during the hunting process [61]. Initially, the dolphin population is initialized, and the search space alternatives for each feature are ordered in ascending or descending order. For variable $j$, a feature vector $A_j$ of length $LA_j$ is constructed, which includes all potential alternatives for the $j$-th variable. These vectors are then placed adjacent to each other to create the matrix of alternatives. The convergence factor (CF) is then computed as in Eq. (1), where the predefined probability is denoted by $PP$, the CF of the first loop is denoted by $PP_1$, the current loop number is denoted by $Loop_i$, and $LoopsNumber$ indicates the number of loops within which the algorithm should converge. The curve degree is denoted by $Power$.
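The displayed form of Eq. (1) is not reproduced in the text above. For reference, the convergence-factor curve of the standard DEA formulation, written with the symbols just defined, is usually given as follows; that the paper uses exactly this form is an assumption.

```latex
PP(Loop_i) = PP_1 + (1 - PP_1)\,\frac{Loop_i^{\,Power} - 1}{(LoopsNumber)^{Power} - 1}
```

Under this form, $PP$ equals $PP_1$ in the first loop and reaches 1 when $Loop_i = LoopsNumber$, so the search shifts gradually from exploration to exploitation.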
The fitness of each location is calculated using the error rate equation with a threshold value of 0.57. The accumulative fitness $AF_{(A+k)j}$ is then calculated according to the dolphin rules for the $j$-th variable and the $i$-th location, with $k = -R_e$ to $R_e$.
Here, $AF_{(A+k)j}$ denotes the accumulative fitness of the $(A+k)$-th alternative to be selected for the $j$-th variable, $Fitness(i)$ denotes the fitness of location $i$, and $R_e$ denotes the effective radius within which the fitness of alternative $A$ affects the accumulative fitness of its neighbors; the radius should be no more than a quarter of the search space. Eqs. (2) and (3) are modified to improve adaptability to the RNN. The coefficient $Coeff(k)$ is changed from a bi-linear function into a non-linear function as in Eq. (4), enabling the dolphins to move in any direction within the feature search space. The non-linear coefficient function allows features to be matched in fewer iterations and also enhances the exploration process.
Using the modified $Coeff(k)$ of Eq. (4), the accumulative fitness $AF_{(A+k)j}$ presented in Eq. (2) of DEA is altered and defined as in Eq. (5).
A small value $\varepsilon$ should be added to the matrices in order to distribute the possibilities more evenly in the search space, as $AF = AF + \varepsilon$. This value has to be selected according to how the fitness function is defined. Then, the optimal location of the current loop is identified, and its $AF$ is set to 0. For each variable $j$ ($j = 1$ to $NV$), the probability $P_{ij}$ of selecting alternative $i$ ($i = 1$ to $LA_j$) is computed as shown in Eq. (6).
Here, $LA_j$ is the number of alternatives. Finally, the alternatives selected for all the variables of the best location are assigned a probability equal to $PP$, that is, $P_{ij} = PP$, whereas the remaining probability is distributed among the other alternatives as given in Eq. (7).
These probabilities help identify the locations for the next step, and finally the optimal global location is chosen. According to the algorithm's mapping, this location is the highest-ranked configuration of the RNN. By using the DEA, the training time of the RNN can be reduced. Because RNN is a widely used classification tool, its slow convergence speed is regarded as its primary limitation, one that can be addressed through parameter optimization.
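The accumulation and probability steps described above can be sketched as follows. The sketch uses the triangular coefficient $(R_e - |k|)/R_e$ of the original DEA as a stand-in, because the paper's non-linear $Coeff(k)$ of Eq. (4) is not reproduced in the text. It also assumes that a larger fitness is better and simply skips neighbors that fall outside the alternatives range. All names (`dea_probabilities`, `locations`, `pp`) are hypothetical.

```python
def dea_probabilities(locations, fitness, n_alternatives, radius, pp, eps=1e-6):
    """One probability update of a DEA-style search.

    locations[i][j] is the index of the alternative that location i picked
    for variable j; fitness[i] is the fitness of location i (larger is better
    in this sketch). Returns probs[alternative][variable].
    """
    n_vars = len(locations[0])
    af = [[0.0] * n_vars for _ in range(n_alternatives)]
    # Spread each location's fitness over its neighbors within the radius.
    for loc, fit in zip(locations, fitness):
        for j, a in enumerate(loc):
            for k in range(-radius, radius + 1):
                idx = a + k
                if 0 <= idx < n_alternatives:
                    af[idx][j] += (radius - abs(k)) / radius * fit
    # AF = AF + eps keeps every alternative selectable.
    af = [[v + eps for v in row] for row in af]
    # Zero the AF of the alternatives used by the best location.
    best = locations[max(range(len(fitness)), key=fitness.__getitem__)]
    for j, a in enumerate(best):
        af[a][j] = 0.0
    # Best alternatives receive PP; the rest share (1 - PP) in proportion to AF.
    probs = [[0.0] * n_vars for _ in range(n_alternatives)]
    for j in range(n_vars):
        total = sum(af[i][j] for i in range(n_alternatives))
        for i in range(n_alternatives):
            probs[i][j] = (1.0 - pp) * af[i][j] / total
        probs[best[j]][j] = pp
    return probs

probs = dea_probabilities([[0, 2], [1, 1], [3, 0]], [0.2, 0.9, 0.5],
                          n_alternatives=4, radius=2, pp=0.3)
print([round(sum(row[j] for row in probs), 6) for j in range(2)])
```

Each column of the returned matrix sums to 1, so it can be used directly to sample the next location of every dolphin.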

2) DEA-RNN WITH PARAMETER OPTIMIZATION
In the proposed DEA-RNN, the weights and biases, along with the population size, are the parameters to be optimized. The weights and the corresponding biases of the Elman RNN are computed using the weight matrices [62], as expressed in Eqs. (8) and (9), respectively.
Here, $W_n$ denotes the $n$-th weight value of the weight matrix ($n = 1, 2, \ldots, N$) and $B_n$ denotes the bias value for the network. $\alpha$ and $\beta$ are two constant parameters with $\alpha, \beta < 1$, while $rand$ is a random number in (0, 1). The RNN error is a sum of square errors arranged for each weight matrix, where $W_C$ is the total weights list matrix for the network. Therefore, the average sum of square errors is used as the fitness function. For the proposed DEA-RNN, the Elman RNN structure is formed with three layers: the input layer, the hidden layer, and the output layer. Every layer has its own index variable, i.e., $i$ for input nodes, $j$ and $l$ for hidden nodes, and $k$ for output nodes. As the Elman RNN has a feed-forward network structure, the input vector $x$ is transmitted through the weight layer. The input layer vector function of the RNN is given in Eq. (10). Here, the number of inputs is denoted by $n$, the $j$-th bias value of the weight matrix is represented by $B_{n(j)}$, and the input layer vector function is denoted by $net_j(t)$.
Similarly, in the RNN, the input vector is propagated through the weight layer with the addition of the previous hidden activation $y_l(t-1)$ through another recurrent weight layer $U_n$, as formulated in Eq. (11). The output function of the hidden layer $y_j(t)$ is expressed in Eq. (12). Here, the number of hidden nodes is denoted by $m$, $f(\cdot)$ denotes the activation function of the hidden layer, and $y_j(t) = f(net_j(t))$ denotes the output of the hidden layer, calculated by applying the hidden activation function to the input vector function. The output of the whole network is obtained at the output layer and is determined by the hidden layer and the group of output weights $W$.
Here, the output function of the output layer is denoted by $net_k(t)$, $g(\cdot)$ denotes the activation function of the output layer, $Y_k(t) = g(net_k(t))$ is the predicted output function, and $W_{n(kj)}$ denotes the weights between the $k$-th output node and the $j$-th hidden layer node. The error associated with the output layer is used to determine the sum of square errors. Hence, the error at the output layer is computed as given in Eq. (15).
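The layer computations described above correspond to the usual Elman forward pass, which can be sketched in plain Python. The choice of `tanh` for $f$ and a logistic sigmoid for $g$ is an assumption, since the text does not name the activation functions, and the names `elman_step` and `sum_square_error` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def elman_step(x, y_prev, W_in, U, b_h, W_out, b_out):
    """One time step of an Elman RNN.

    x: n inputs; y_prev: m previous hidden activations y_l(t-1);
    W_in: m x n input weights; U: m x m recurrent weights; b_h: m hidden biases;
    W_out: K x m output weights; b_out: K output biases.
    """
    m = len(b_h)
    # net_j(t): weighted inputs plus recurrent contribution plus bias
    net_h = [sum(W_in[j][i] * x[i] for i in range(len(x)))
             + sum(U[j][l] * y_prev[l] for l in range(m))
             + b_h[j]
             for j in range(m)]
    y = [math.tanh(v) for v in net_h]                 # y_j(t) = f(net_j(t))
    net_o = [sum(W_out[k][j] * y[j] for j in range(m)) + b_out[k]
             for k in range(len(b_out))]
    outputs = [sigmoid(v) for v in net_o]             # Y_k(t) = g(net_k(t))
    return y, outputs

def sum_square_error(targets, outputs):
    """Sum over output nodes of (T_k - Y_k)^2."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))

hidden, out = elman_step([1.0, 2.0], [0.0, 0.0],
                         [[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]],
                         [0.0, 0.0], [[0.0, 0.0]], [0.0])
print(hidden, out, sum_square_error([1.0], out))
```

In a DEA-RNN style setup, each candidate location would be decoded into `W_in`, `U`, `W_out`, and the biases, and the resulting error would serve as that candidate's fitness.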
Here, $T_k$ is the actual output, and $Y_k$ is the predicted output. The performance index of the RNN is calculated as in Eq. (16).
The average sum of squares is computed from the performance index as in Eq. (17). Here, $P_i$ indicates the number of dolphin populations in the $i$-th iteration, the performance index is denoted by $V_F(x)$, and the average performance is denoted by $V_\mu(x)$. At the end of each iteration in DEA, the average sum of square errors (SSE) of the $i$-th iteration is computed as given in Eq. (18).
DEA takes the dolphin with the minimum sum of square errors (MSE) as the best dolphin, and its mapped configuration (weights, biases, and population size) is chosen as the best RNN structure. The MSE is calculated as in Eq. (19).
Here, $NL$ denotes the number of locations, and $Y_i$ and $\hat{Y}_i$ are the observed and predicted values of the $i$-th location dolphin. Based on the chosen dolphin, the optimal weights and biases are retrieved, and the weights and biases of all the layers are updated with a small variation. The updated weights and biases are computed for each layer, where $h$ denotes the current layer of the DEA-RNN. Using this process, the RNN can be tuned effectively and applied to the classification of cyberbullying tweets. Algorithm 1 presents the pseudo-code for DEA-RNN.
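Selecting the best dolphin by its error can be sketched as below. The error is taken as the mean of the squared differences between observed and predicted values over the $NL$ locations, which is the usual reading of the symbols defined above; the function names and the toy predictions are hypothetical.

```python
def mse(observed, predicted):
    """Mean of squared differences between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed)

def best_candidate(observed, candidate_predictions):
    """Return (index, error) of the candidate configuration with the smallest error."""
    errors = [mse(observed, preds) for preds in candidate_predictions]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors[best]

observed = [1, 0, 1, 1]
candidates = [
    [0.9, 0.2, 0.8, 0.7],   # predictions from a well-tuned configuration
    [0.5, 0.5, 0.5, 0.5],   # predictions from an uninformative configuration
]
print(best_candidate(observed, candidates))
```

The weights and biases mapped to the winning index would then be carried into the next iteration.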

IV. EXPERIMENTAL ANALYSIS
In this section, DEA-RNN is evaluated on datasets crawled from Twitter using the following metrics: recall, precision, F-measure, accuracy, and specificity. The input dataset and the data annotation are described in Sections III-A and III-B. Two baseline cyberbullying models based on deep learning, namely Bi-LSTM [21] and RNN [21], and three baseline cyberbullying models based on machine learning, namely SVM [26], Multinomial Naive Bayes (MNB) [11], and RF [11], are used for comparison with the proposed DEA-RNN model. These models have been selected from the state of the art in cyberbullying detection on social media. The parameter configurations of the baseline models are the same as in the original papers. Python 3.7.4 and the PyCharm IDE 2020.2.3 were used for the experiments. The implementation and experiments relied on libraries such as Keras, TensorFlow, NumPy, NLTK, Scikit-learn, and Tweepy. The experimental evaluations are carried out on a personal system with an Intel Core i5 CPU, Windows 10, and 8 GB of RAM. The pre-processing steps are performed as proposed in [58] using the NLTK Python package. The input dataset is divided into training and testing datasets. For the evaluation, three different split scenarios are considered: 60:40% (Scenario 1), 70:30% (Scenario 2), and 90:10% (Scenario 3). The evaluation metrics are chosen to display the best performance of the tweet classification of each method. Each implemented method is run N = 20 times to obtain an average value of each evaluation metric, and 5-fold cross-validation is also adopted.
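The three split scenarios can be reproduced with a short sketch. It assumes that the first number of each ratio is the training share and uses a plain shuffled split. Whether the split is applied before or after oversampling is not stated, so the 10000-item dataset below is an assumption, as are the names `SCENARIOS` and `split_dataset`.

```python
import random

# Training fractions for the three scenarios (assumed to be train:test ratios).
SCENARIOS = {"Scenario 1": 0.60, "Scenario 2": 0.70, "Scenario 3": 0.90}

def split_dataset(items, train_fraction, seed=0):
    """Shuffle the items and cut them into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

dataset = list(range(10000))   # placeholders standing in for labeled tweets
for name, fraction in SCENARIOS.items():
    train, test = split_dataset(dataset, fraction)
    print(name, len(train), len(test))
```

Repeating the split with different seeds is one simple way to obtain the N = 20 runs over which each metric is averaged.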

A. EVALUATION METRICS
This sub-section briefly describes the metrics used to evaluate the efficiency of DEA-RNN: accuracy, recall, precision, F-measure, specificity, and training time. Each method is run N = 20 times in every experiment, and the results are averaged for each evaluation metric. These performance metrics are described in Table 2.
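As an illustration of how these metrics relate to one another, the sketch below derives them from binary confusion-matrix counts. It assumes that Table 2 uses the standard definitions with cyberbullying as the positive class; the function name `classification_metrics` and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: positive (cyberbullying) tweets classified correctly/incorrectly.
    tn/fp: negative tweets classified correctly/incorrectly.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "specificity": specificity,
    }


if __name__ == "__main__":
    # Illustrative counts only.
    for name, value in classification_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.4f}")
```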

B. EXPERIMENTAL RESULTS
This sub-section discusses the experimental results of the DEA-RNN classifier in comparison with the baseline deep learning models, Bi-LSTM and RNN, and the baseline machine learning models, MNB, RF, and SVM. The cyberbullying prediction results are validated on three input dataset scenarios: 60:40% (Scenario 1), 70:30% (Scenario 2), and 90:10% (Scenario 3). The performance evaluation is carried out in terms of the aforementioned metrics. The experiments were executed N = 20 times for each classifier in each scenario, and the average of each performance metric was then computed using the equations in Table 2. The overall performance comparison of the classifiers across the dataset input scenarios is presented in Table 3.

1) AVERAGE ACCURACY
The proposed DEA-RNN model is compared with the existing models in terms of the average accuracy in all scenarios. As shown in Fig. 3, DEA-RNN obtained the highest average accuracy of 90.45% in Scenario 3, while Bi-LSTM, RNN, SVM, MNB, and RF obtained 88.74%, 87.15%, 85.21%, 82.26%, and 83.45%, respectively. The deep learning models (Bi-LSTM and RNN) perform better than the machine learning models (SVM, RF, and MNB), and MNB shows the worst performance among all the models. Similarly, in Scenario 2 DEA-RNN achieved 87.14%, the best accuracy compared with the 83.45%, 80.26%, 77.10%, 64.45%, and 75.14% obtained by Bi-LSTM, RNN, SVM, MNB, and RF, respectively. In Scenario 1, the proposed model again achieved the best result, 82.25%, outperforming the existing models considered in the evaluation. Bi-LSTM has the second-best score among the models, whereas MNB has the worst. As illustrated in Fig. 3, the proposed model and the other methods all achieve their highest accuracy in Scenario 3.

Algorithm 1 Pseudo-code for DEA-RNN
Load the training data
Pass the DEA locations as weights to the network
Feed-forward network runs using the weights initialized with DEA
Minimize the error by adjusting the network parameters using DEA
Eliminate a fraction of the worst solutions
Find new solutions to replace the old ones
Assess the fitness function to select the best configuration of the RNN
Replace the old location (i) with the new location (i + 1)
End for
DEA estimates the weights and biases at each iteration until the network has converged
Update the weights and biases using the update equation

2) AVERAGE PRECISION
In Scenarios 2 and 3, Bi-LSTM has the second-best precision score among the models, whereas MNB has the worst. Fig. 4 clearly shows that precision is highest in Scenario 3 compared with the other scenarios.

3) AVERAGE RECALL
The average recall of the proposed model and the compared methods is plotted in Fig. 5. In Scenario 3, DEA-RNN scored 88.98%, its highest recall across all scenarios. It is also the highest recall in Scenario 3, where the existing models Bi-LSTM, RNN, SVM, MNB, and RF obtained 87.52%, 85.9%, 82.72%, 78.89%, and 82.49%, respectively. Likewise, in Scenario 2 DEA-RNN achieved 87.11%, the best recall compared with the 82.78%, 79.77%, 77.14%, 69.87%, and 77.08% obtained by Bi-LSTM, RNN, SVM, MNB, and RF, respectively. In Scenario 1, the proposed model obtained the best result of 76.33%, outperforming the other models considered in the evaluation. In contrast, the MNB classifier obtained 63.01% in Scenario 1, which is the lowest result. Finally, the deep learning models (Bi-LSTM and RNN) again perform better than the machine learning models (SVM, RF, and MNB).

4) AVERAGE F-MEASURE AND SPECIFICITY
Fig. 6 shows the performance of the algorithms in terms of the average F-measure (left) and the average specificity (right). DEA-RNN obtained an average F-measure of 89.25% in Scenario 3 (90:10%), which is the highest result among all dataset input scenarios. In Scenario 2 (70:30%), DEA-RNN obtained 87.08%, which is the best performance compared with the Bi-LSTM, RNN, SVM, MNB, and RF models. In contrast, the MNB classifier obtained 65.49% in Scenario 1, which is the lowest result. In terms of specificity, the proposed model achieved 90.94% in Scenario 3, which is the best result compared with the Bi-LSTM, RNN, SVM, MNB, and RF models. The specificity of DEA-RNN in Scenario 3 is also the highest value obtained for any metric in any scenario, as shown in Fig. 6 (right).

5) PERFORMANCE EVALUATION IN TERMS OF TRAINING TIME
The training time of the proposed model was compared with that of the baseline models, using Scenario 2 for the measurement. The proposed DEA-RNN model requires less training time than the deep learning baselines Bi-LSTM and RNN. Bi-LSTM takes longer to train than the proposed model, RNN, and the machine learning models; its classification performance is better than that of the other baselines but lower than that of the proposed model. DEA-RNN consumed 248.52 seconds of training time, whereas the deep learning baselines Bi-LSTM and RNN consumed 349.1 and 274.31 seconds, respectively. SVM required more training time than MNB and RF, but it detects cyberbullying more effectively than either of them. The machine learning baselines MNB and RF require less training time than the deep learning models, including the proposed model. Figure 7 shows the Performance Improvement Rate (PIR) of the proposed DEA-RNN model compared with the considered deep learning and machine learning models. The details of the performance improvement are provided in Section V.
In summary, all performance metrics (i.e., specificity, F-measure, precision, recall, and accuracy) are highest in Scenario 3. DEA-RNN also achieved the best results for cyberbullying tweet classification in terms of all evaluation metrics in all three scenarios. Averaged over all scenarios, DEA-RNN attained 86.61% accuracy, 85.94% precision, 84.14% recall, 85.54% F1-score, and 86.96% specificity, which are higher than the values of the considered state-of-the-art models. Therefore, this approach can be suggested as an effective approach for detecting CB on Twitter. These results can be attributed to the use of DEA for weight and bias optimization and to the reduction in training time, which indicates the impact of DEA on the performance of the RNN. They also suggest that the proposed DEA-RNN can be adapted to modern short-text topic models.
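The scenario averages quoted above can be cross-checked against the per-scenario values reported earlier in this section. The short sketch below does so for accuracy and recall, the two metrics for which DEA-RNN values are given in the text for all three scenarios; the dictionary name `dea_rnn` is hypothetical.

```python
from statistics import mean

# DEA-RNN results (%) for Scenarios 1, 2, and 3, as reported in the text.
dea_rnn = {
    "accuracy": [82.25, 87.14, 90.45],
    "recall": [76.33, 87.11, 88.98],
}

# Average each metric over the three scenarios.
averages = {metric: round(mean(values), 2) for metric, values in dea_rnn.items()}

if __name__ == "__main__":
    for metric, value in averages.items():
        print(f"average {metric}: {value}%")
```

Both averages agree with the 86.61% accuracy and 84.14% recall stated in the summary.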

V. DISCUSSION
The Performance Improvement Rate (PIR) quantifies the improvement of the proposed model in terms of the following metrics: specificity, F-measure, precision, recall, and accuracy. The total PIR is determined by comparing the overall performance of the proposed model with that of the existing models (two deep learning and three machine learning models) considered in the evaluation. In terms of accuracy, the improvement rates of the proposed model in Scenario 2 are 3.69%, 6.91%, 10.04%, 12%, and 22.69% compared with the baseline models Bi-LSTM [21], RNN [21], SVM [26], RF [11], and MNB [11], respectively. Similarly, the improvement rates of accuracy in Scenario 3 are 1.71%, 3.3%, 5.24%, 7%, and 8.19% compared with Bi-LSTM, RNN, SVM, RF, and MNB. In terms of precision, the improvement rates of the proposed model in Scenario 2 are 4.14%, 6.93%, 10.42%, 8.06%, and 11.24% compared with Bi-LSTM, RNN, SVM, RF, and MNB, respectively. Likewise, the improvement rates of precision in Scenario 3 are 1.62%, 2.9%, 5.27%, 5.65%, and 9.51% compared with Bi-LSTM, RNN, SVM, RF, and MNB. In terms of recall, the improvement rates of the proposed model in Scenario 2 are 4.33%, 7.34%, 10.03%, and 17.24% compared with Bi-LSTM, RNN, RF, and MNB, respectively; these equal the differences between the Scenario 2 recall values reported in Section IV-B, which give 87.11% − 77.14% = 9.97% for SVM. Similarly, the improvement rates of recall in Scenario 3 are 1.46%, 3.08%, 6.26%, 6.48%, and 10.09% compared with Bi-LSTM, RNN, SVM, RF, and MNB. In terms of F-measure, the improvement rates of the proposed model in Scenario 2 are 5.74%, 6.72%, 10.21%, 9.06%, and 14.32% compared with Bi-LSTM, RNN, SVM, RF, and MNB, respectively. Likewise, the improvement rates of F-measure in Scenario 3 are 1.54%, 2.99%, 5.77%, 6.08%, and 9.8% compared with Bi-LSTM, RNN, SVM, RF, and MNB. Figure 7 shows the performance improvement rate of the proposed model compared with the existing models.
In brief, the overall average PIR gained by the developed model reached 2.42% and 3.822% compared with the deep learning models Bi-LSTM and RNN, respectively. Likewise, the overall average PIR reached 6.65%, 7.55%, and 12.12% compared with the machine learning models SVM, RF, and MNB, respectively. These overall improvement rates indicate that the proposed hybrid DEA-RNN model can be suggested as an effective approach for detecting cyberbullying in the Twitter dataset.
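The PIR formula is not spelled out in this section, but the accuracy figures above are consistent with a simple difference, in percentage points, between the score of DEA-RNN and the score of each baseline. The sketch below works under that assumption and uses the accuracy values reported in Section IV-B; the names `accuracy` and `improvement` are hypothetical.

```python
# Average accuracy (%) per model, as reported in Section IV-B.
accuracy = {
    "Scenario 2": {"DEA-RNN": 87.14, "Bi-LSTM": 83.45, "RNN": 80.26,
                   "SVM": 77.10, "RF": 75.14, "MNB": 64.45},
    "Scenario 3": {"DEA-RNN": 90.45, "Bi-LSTM": 88.74, "RNN": 87.15,
                   "SVM": 85.21, "RF": 83.45, "MNB": 82.26},
}


def improvement(scores, proposed="DEA-RNN"):
    """Difference in percentage points between the proposed model and each baseline.

    Assumption: PIR is a plain difference of scores, not a relative change.
    """
    return {model: round(scores[proposed] - score, 2)
            for model, score in scores.items() if model != proposed}


if __name__ == "__main__":
    for scenario, scores in accuracy.items():
        print(scenario, improvement(scores))
```

Under this assumption the Scenario 3 differences match the reported rates exactly. In Scenario 2 the RNN difference comes out as 6.88 rather than the reported 6.91, while the other four values match.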

VI. CONCLUSION AND FUTURE WORK
This paper developed an efficient tweet classification model to enhance the effectiveness of topic models for the detection of cyberbullying events. DEA-RNN was developed by combining DEA optimization with an Elman-type RNN for efficient parameter tuning. It was compared with the existing Bi-LSTM, RNN, SVM, RF, and MNB methods on a newly created Twitter dataset, which was extracted using CB keywords. The experimental analysis showed that DEA-RNN achieved the best results among the compared methods in all scenarios in terms of accuracy, recall, F-measure, precision, and specificity. This indicates the impact of DEA on the performance of the RNN. Although the proposed hybrid model obtained higher performance rates than the other models considered, the feature compatibility of DEA-RNN decreases when the input data grows beyond the initial input. The current study was limited to a Twitter dataset; other Social Media Platforms (SMP) such as Instagram, Flickr, YouTube, and Facebook should be investigated in order to detect the trend of cyberbullying. The possibility of utilizing data from multiple sources for cyberbullying detection will therefore be investigated in the future. Furthermore, the analysis covered only the content of tweets and not the behavior of users, which is left for future work. The proposed model detects cyberbullying from the textual content of tweets, whereas detection in other types of media such as images, video, and audio remains an open research area and a direction for future research. In addition, we aim to classify and detect CB tweets in a real-time stream.