Metaheuristic Ant Lion and Moth Flame Optimization-Based Novel Approach for Automatic Detection of Hate Speech in Online Social Networks

Knowledge, opinions, and ideas spread very quickly through online social networks, blogs, microblogs, social bookmarking services, sharing sites, and web forums. This rapid spread brings serious problems. One of them is the hate speech detection (HSD) problem, which covers content such as insults, swearing, humiliation, discrimination, exclusion, detestation, abhorrence, cursing, and intolerance. Such content may be directed at a person, a group, an organization, an order, or an event. Although a few machine learning methods have been applied to this important problem in the literature, the performance of HSD models still needs to be improved on many metrics. In this study, an automatic HSD system based on metaheuristic methodology is proposed to obtain better results on this new and important problem. In the proposed optimization approach, the Ant Lion Optimization (ALO) and Moth Flame Optimization (MFO) algorithms were adapted to the HSD problem. This is the first attempt to use optimization algorithms as solution search strategies for automatic HSD. An efficient representation scheme and a flexible fitness function were designed for this purpose; many metrics can easily be embedded into the fitness function and optimized simultaneously. First, the basic natural language processing (NLP) steps were carried out. Feature extraction was performed using Bag of Words (BoW), Term Frequency (TF), and document vectors (Word2Vec). Then, the performance of the proposed approaches was analyzed in detail on three real-world datasets. The results were also compared with eight popular supervised machine learning algorithms, the Social Spider Optimization (SSO) algorithm, and the state-of-the-art Tunicate Swarm Algorithm (TSA).
Considering the evaluation criteria for the three sets of experiments, the accuracy, sensitivity, precision, and f-score results of the ALO and MFO algorithms were observed to be superior to those of the machine learning methods. In the experimental studies, the highest accuracy value was 92.1% for ALO and 90.7% for MFO. Other numerical values obtained in the study are given in detail in the experiments and results section with tables and graphics. Given the promising results of the proposed approaches, they are expected to be used in solving many social media and networking problems.

targeting personal, national, ethnic, and religious groups, etc. These shared contents are called hate speech, which is considered a digital crime. It is a new subject of study for computer and data scientists. There are many social media and networking problems in the literature, such as sentiment analysis [1], fake news detection [2], rumor detection [3], cyberbullying detection [4], customer satisfaction detection [5], and link prediction [6]. The most important feature that distinguishes hate speech from these problems is that people who post on social networks believe, deliberately or unintentionally, that they are exercising their freedom of expression. However, content shared with the aim of destroying fundamental rights and freedoms cannot be considered within the scope of freedom of expression, because it is an abuse of that right and because hateful expression conflicts with other rights and freedoms. If such discourse is not countered, it paves the way for attacks on the autonomy and identity of the targeted person or group, and the discriminatory attitude of its authors becomes normalized in the long run.
Automatic HSD is necessary to protect the areas, people, or groups that are targeted by hate speech. Therefore, automatic HSD is also important to protect against the abuse of hate speech. Moreover, automatic HSD is an important system to prevent acts that incite violence, discrimination, and hatred based on nationality, race, and religion.
HSD is a new social media analysis problem, and the limited number of existing studies rely on supervised machine learning algorithms. However, the values obtained for the different metrics are not yet at the desired level. Designing, developing, and implementing new and efficient methods is therefore an important task, in line with the philosophy of continuous improvement, for many problems such as HSD.
Optimization techniques solve problems by determining the best values of an objective function over a set of available parameters that satisfy equality and inequality constraints [7]. Metaheuristic techniques are more popular than classical optimization techniques because of their simplicity and robustness [8]. They are general-purpose methodologies that can be applied easily and efficiently in different fields [9], [10].
To the best of our knowledge, optimization has never been used to solve the HSD problem in the literature. This study therefore serves as a reference for approaching the HSD problem from an optimization perspective and opens a new direction for solving this social network problem. The main contributions of this study can be summarized as follows: (1) Metaheuristic optimization algorithms, which can solve many complex real-world problems, are used for the first time to solve the HSD problem automatically. (2) A new solution search method for social network problems is proposed. (3) A new representation scheme is proposed so that optimization algorithms can operate on textual documents. (4) A flexible fitness function is designed, into which many metrics can easily be embedded and optimized simultaneously. (5) With the proposed representation scheme and fitness function, many social network problems can potentially be solved easily and efficiently by the adapted metaheuristic algorithms. (6) A new problem field for optimization algorithms is introduced. (7) The results are compared with SSO and the state-of-the-art TSA.
The remainder of the study is organized as follows. The second section reviews the literature on the topic; the review is short mainly because very few studies address this topic with a metaheuristic optimization approach. The third section describes the characteristics and working principles of the metaheuristic methods used in the study, which were chosen for their success in real-world problems. The fourth section explains the datasets, the NLP methods with the data pre-processing steps, and the feature extraction methods; it then lists the machine learning methods used, describes how the intelligent optimization algorithms are modeled for the HSD problem, and gives the algorithm parameters. The fifth section reports the experimental studies in detail with tables and graphics and ranks the algorithms by their success. The last section presents conclusions and recommendations, emphasizes the strengths and weaknesses of the study, and makes inferences regarding future studies.

II. RELATED STUDIES
Recently, studies on online social networks related to the HSD problem have been increasing in the literature. The HSD, which has become a popular study area, is an issue that requires automatic detection to prevent all people from being harmed. Santosh and Aravind used multi-language data in their HSD study on mixed social media text [11]. They obtained an accuracy of 70.7% on average using the word n-gram method. Rohan et al. used the transformed word embedding model in their HSD study on Twitter. They obtained an accuracy of 92% in this study [12]. Zeerak conducted a study on Twitter about racism and sexism [13]. In his study on 130K tweets, he achieved high performance in the recall, precision, and f-score by the token uni-gram method.
Aymé conducted HSD studies on 6 different datasets to show that HSD is not as easy as is often thought. Using deep learning algorithms, the researchers obtained f-scores between 23% and 96% [14]. Juan et al. used 6K data items in their HSD study on Twitter [15]. The LSTM+MLP neural network methodology proposed by the team achieved an accuracy of 83%. Shervin and Marcos used the support vector machine classifier on a dataset of 15K tweets in their HSD study. Using the n-gram method, they obtained an accuracy of 78% [16]. In his Ph.D. thesis, Shanita sought solutions to the HSD problem by using NLP techniques [17]. An accuracy of 91% was achieved using a convolutional neural network. Michele et al. used supervised machine learning algorithms such as support vector machines, neural networks, and logistic regression. The f-score obtained on three different datasets was 80% [18].
David et al. used seven different datasets for HSD on Twitter. They achieved more than 90% accuracy in their study using support vector machines and deep neural networks [19]. William and Julia conducted the HSD study on Yahoo using uni-gram, bi-gram, and tri-gram techniques. As a result of their work, they obtained an accuracy of 94% and recall at 60% [20]. Zeerak and Dirk used the n-gram method for the HSD problem [21]. Using 136K data in total, they achieved an f-score of 74% and a precision value of 72% on Twitter. Valentino et al. conducted the HSD for the Italian language. The team used the support vector machines and worked on a total of 1234K Italian tweets. In addition, they obtained f-score values above 80% [22]. Nina and Els used support vector machine methods in the HSD problem [23]. They reached an f-score of approximately 79% using 10K tweets.
One of the review articles on HSD was prepared by Macavaney et al. [24]. They explained the common definitions of hate speech in their work. The authors described the datasets available for HSD up to the year in which the study was conducted. They listed the approaches used for HSD (word-based approaches, machine learning approaches, etc.). Finally, they compared the experimental results of previous studies, providing a reference for future work.
An HSD system supporting multiple languages was proposed by Aluru et al. [25]. They used deep learning techniques on a total of 9 languages: Arabic, English, German, Indonesian, Italian, Polish, Portuguese, Spanish, and French. The datasets used in their work were obtained from social networks such as Twitter, Facebook, and Stormfront. They used 4 different models: MUSE+CNN-GRU, Translation+BERT, LASER+LR, and mBERT. They achieved the highest performance on the Arabic, English, Indonesian, Italian, and Spanish datasets with the mBERT model.
Sigurbergsson et al. conducted an HSD study on Danish and English datasets [26]. They used Logistic Regression and Bi-LSTM methods on data collected from the Twitter, Facebook, and Reddit social networks. The highest results were obtained with the Bi-LSTM method. On the Danish dataset, the highest recall value was 70%. On the English dataset, the precision was 77% and the f-score was 72%.
García-Díaz et al. conducted an HSD study on misogyny in Spanish tweets [27]. They divided the data into 3 sub-datasets: violence against women, harassment against women, and misogyny against women. They used the Random Forest, Sequential Minimal Optimization, and LSVM machine learning classifiers. Among these methods, the Sequential Minimal Optimization algorithm achieved the highest accuracy, 85.2%. Word embeddings and NLP techniques were used, and the study raised awareness of violence against women in social networks.
Mossie and Wang conducted an HSD study on the Amharic language in Ethiopia [28]. They created a dataset of Amharic texts containing hate speech collected from Facebook pages. They used GBT, RF, RNN, and RNN+LSTM classifiers with TF-IDF, Word2Vec, unigram, and bi-gram feature models in their experiments. They achieved 92% accuracy using the RNN architecture with Word2Vec embedding features.
Djuric et al. conducted an HSD study on data collected from the Yahoo Finance website over 6 months [29]. They proposed the BoW+TF, BoW+TF-IDF, and paragraph2vec models and evaluated them using 5-fold cross-validation. They obtained an accuracy of 74% with the BoW models. With the paragraph2vec model, they achieved the highest accuracy value on this dataset.
Ghosh-Roy et al. proposed an HSD model using the Perspective API [30]. They conducted HSD studies on English, German, and Hindi data using deep multi-layer perceptrons. Their highest accuracy values were 91% on the English dataset, 82% on the German dataset, and 75% on the Hindi dataset.
Table 1 summarizes the notable HSD studies of recent years, showing the gradual progress, the current state of the art, and the different metrics used. The literature shows that different HSD problems have been solved with very different methods. However, to the best of our knowledge, there is no intelligent optimization-based model for HSD problems. We adapted the intelligent ALO and MFO methods as solution search strategies for the HSD problem in order to increase the success of the HSD model in terms of different metrics. The proposed methods are intended to bring a fresh perspective to the literature and to the solution of other social network and media problems similar to HSD.

III. OVERVIEW OF METAHEURISTIC OPTIMIZATION ALGORITHMS
Metaheuristic optimization algorithms are a sub-branch of artificial intelligence and are inspired by the intelligent behavior of living creatures such as ant colonies, insects, bees, and various fish species. There are many families of metaheuristic optimization algorithms in the literature, such as swarm-based, physics-based, hybrid-based, chemistry-based, sports-based, biology-based, math-based, and ecology-based algorithms [53]. In fact, all of these algorithms share a common working structure [54], which is summarized in Fig. 1. The ALO and MFO algorithms, which are based on ecological intelligence, have recently been reported to work well for different problems [55], [56]. In this paper, they are evaluated for their ability to give better results for HSD. The biological background and mathematical models of the ALO and MFO algorithms are explained in the following subsections. The SSO algorithm, inspired by the intelligent behavior of spiders, is a powerful algorithm used in recent studies [57]-[59]. The other algorithm, the state-of-the-art TSA, is a bio-inspired algorithm based on the feeding behavior of tunicates [60]-[63].

A. ANT LION OPTIMIZATION ALGORITHM (ALO)
Antlions are insects belonging to the Myrmeleontidae family. Their life cycle has two main phases: the larval stage and the adult stage. They are often called ''doodlebugs'' because of the traces they leave in the sand while searching for a good place to set up traps. ALO was inspired by the hunting behavior of antlions. During hunting, the antlion digs a funnel-shaped pit in soft sand and then waits patiently at the bottom of the pit. Prey slides down and is quickly caught by the antlion. If the prey tries to escape from the trap, the antlion throws sand towards the edge of the pit to make its victim slide to the bottom. This causes the trap to collapse, and the prey falls to the antlion [64].
The mathematical model of ALO, which is inspired by these behaviors of antlions and ants, involves many parameters. Among them, the random walk of ants searching for food in nature is defined in (1), where cumsum calculates the cumulative sum, n is the maximum number of iterations, t is the step of the random walk, and r(t) is the stochastic function defined in (2). In (2), t is the step of the random walk and rand is a random number generated with a uniform distribution in the interval [0, 1].
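The random walk described by (1) and (2) can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the function name `random_walk` and the use of Python's `random` module are not from the study, and the walk is assumed to start at 0 as in the usual ALO description [64].

```python
import random
from itertools import accumulate

def random_walk(n_steps, rng=None):
    """Return [0, s_1, ..., s_n]: the cumulative sum of +1/-1 steps."""
    rng = rng or random.Random()
    # r(t) = 1 if rand > 0.5 else 0, so 2*r(t) - 1 is either +1 or -1
    steps = [2 * (1 if rng.random() > 0.5 else 0) - 1 for _ in range(n_steps)]
    return [0] + list(accumulate(steps))

# Example: a 10-step walk with a fixed seed for repeatability
walk = random_walk(10, random.Random(42))
```

Each entry of `walk` differs from the previous one by exactly one unit, which is the behavior that the cumulative sum in (1) expresses.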
The positions of the ants in the ALO algorithm are stored in the matrix given in (3), where $M_{Ant}$ is the position matrix of the ants, $A_{i,j}$ is the value of the ith variable for the jth ant, n is the number of ants, and d is the number of variables. The fitness value of each ant is stored in the $M_{OA}$ matrix as in (4), where f is the objective function.
In (5), $M_{Antlion}$ is the position matrix of the antlions, $AL_{i,j}$ is the current value of the ith variable for the jth antlion, n is the number of antlions, and d is the number of variables.
Similarly, the $M_{OAL}$ matrix in (6) stores the fitness value of each antlion. Equation (7) is used to keep the ants walking randomly within the search area. The mathematical model of the capture of ants in the antlion's trap pits is given in (8) and (9), where $c_i^t$ is the minimum value of all variables in the tth iteration, $d_i^t$ is the maximum value of all variables in the tth iteration, and $Antlion_j^t$ is the position of the jth selected antlion in the tth iteration.
The antlion's hunting ability is modeled by roulette wheel selection. The mathematical model of how a trapped ant slides towards the antlion is given in (10) and (11). The ratio $I$ is calculated as in (12), where t is the current iteration, T is the maximum number of iterations, and w is a constant that depends on the current iteration and is defined in (13).
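Equations (7) to (9) amount to rescaling a raw random walk into a box placed around the selected antlion. The sketch below assumes the usual min-max form for (7) and treats one variable at a time; the helper names `normalize_walk` and `trap_bounds` are hypothetical and are not taken from the study.

```python
def normalize_walk(walk, lower, upper):
    """Min-max rescale a 1-D random walk into [lower, upper] (assumed form of (7))."""
    a, b = min(walk), max(walk)
    if a == b:
        # Degenerate walk with no spread: place every step at the midpoint
        return [(lower + upper) / 2.0] * len(walk)
    return [(x - a) * (upper - lower) / (b - a) + lower for x in walk]

def trap_bounds(antlion_pos, c_t, d_t):
    """Shift the current bounds c^t and d^t around the selected antlion, as in (8) and (9)."""
    return antlion_pos + c_t, antlion_pos + d_t

# Example: an antlion at 0.5 with bounds of -0.1 and +0.1 around it
lo, hi = trap_bounds(0.5, -0.1, 0.1)
scaled = normalize_walk([0, 1, 2, 1, 0, -1, -2], lo, hi)
```

After rescaling, the smallest step of the walk sits on the lower bound and the largest on the upper bound, so the ant stays inside the trap region.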
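The shrinking of the trap boundaries in (10) to (13) can be illustrated as follows. The threshold values for w used here are those commonly reported for ALO [64] and should be checked against (13); returning a ratio of 1 during the first 10% of the iterations is an additional assumption, and both function names are hypothetical.

```python
def shrink_ratio(t, T):
    """Ratio I = 10**w * t / T with a stepwise schedule for w (assumed values)."""
    frac = t / T
    if frac > 0.95:
        w = 6
    elif frac > 0.9:
        w = 5
    elif frac > 0.75:
        w = 4
    elif frac > 0.5:
        w = 3
    elif frac > 0.1:
        w = 2
    else:
        return 1.0  # assumption: no shrinking early in the run
    return (10 ** w) * frac

def shrink_bounds(c, d, t, T):
    """Divide both bounds by the ratio I, as in (10) and (11)."""
    ratio = shrink_ratio(t, T)
    return c / ratio, d / ratio
```

Because the ratio grows with the iteration count, the interval in which an ant can walk becomes narrower as the run proceeds, which mimics the ant sliding towards the antlion.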
In each iteration, the best antlion found so far is kept as the result (the elite). Each ant walks randomly around both the antlion selected by the roulette wheel and the elite, and its new position is expressed in (14).
$R_A^t$ is the random walk around the antlion chosen by the roulette wheel in the tth iteration, and $R_E^t$ is the random walk around the best antlion in the tth iteration.
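The selection and update steps can be sketched as below. This is an illustration only: it assumes non-negative fitness values that are to be maximized and assumes that (14) is the per-variable average of the two walks; `roulette_select` and `update_ant` are hypothetical names.

```python
import random

def roulette_select(fitness, rng):
    """Pick an index with probability proportional to its fitness (maximization)."""
    total = sum(fitness)
    if total <= 0:
        return rng.randrange(len(fitness))  # fall back to a uniform choice
    pick = rng.random() * total
    running = 0.0
    for idx, f in enumerate(fitness):
        running += f
        if pick <= running:
            return idx
    return len(fitness) - 1  # guard against floating-point round-off

def update_ant(walk_around_selected, walk_around_elite):
    """Average the two random-walk positions per variable (assumed form of (14))."""
    return [(a + e) / 2.0 for a, e in zip(walk_around_selected, walk_around_elite)]
```

Fitter antlions are selected more often, while averaging with the walk around the elite keeps every ant partly attracted to the best solution found so far.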
The pseudo-code of ALO is illustrated in Fig. 2.

B. MOTH FLAME OPTIMIZATION ALGORITHM (MFO)
The MFO algorithm is a metaheuristic optimization method based on simulating the navigation behavior of moths at night. Moths use a mechanism referred to as ''transverse orientation'' for navigation. In this mechanism, a moth flies at a constant angle to the moon. Because the moon is very far from the moth, this is an effective method for traveling long distances in a straight line, and it guarantees that moths fly along a straight path throughout the night. However, moths are commonly observed to spiral around lights, because they are deceived by artificial lights. Since these lights are extremely close compared with the moon, maintaining a constant angle to the light source causes moths to follow a spiral flight path [65]. In the mathematical model of the MFO algorithm, the set of moths is represented by the matrix M shown in (15).
Here n is the number of moths, and d is the number of dimensions. For all moths, there is an OM matrix to store fitness values. This matrix is illustrated in (16).
n represents the number of moths. The second important component in the algorithm is flames (F). The F matrix can be considered a matrix similar to the moth matrix. The matrix F is defined by (17).
It is also assumed that there is an OF matrix to store fitness values for flames. This matrix is defined in (18).
The MFO algorithm is a three-tuple of functions, defined as in (19), that is used to solve optimization problems. Here I is the function that generates a random population of moths and the corresponding fitness values; its mathematical model is given in (20). The P function is the main function of MFO: it moves the moths within the search area, taking the matrix M and returning the updated M as in (21). The T function, defined in (22), returns true if the termination criterion is met and false otherwise.
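For reference, the three components are usually written as follows in the MFO literature [65]. The notation below is assumed to correspond to (19) through (22).

```latex
\begin{align}
\mathrm{MFO} &= (I, P, T) \\
I &: \varnothing \rightarrow \{M, OM\} \\
P &: M \rightarrow M \\
T &: M \rightarrow \{\text{true}, \text{false}\}
\end{align}
```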
The general structure of the MFO algorithm built from the I, P, and T functions is summarized in Table 2. To simulate the behavior of moths mathematically, the position of each moth is updated with respect to a flame using (23), where $M_i$, $F_j$, and $S$ denote the ith moth, the jth flame, and the spiral function, respectively. The following restrictions should be taken into account when creating the S function: (1) The starting point of the spiral must be the moth. (2) The end point of the spiral must be the position of the flame. (3) The fluctuation range of the spiral must not exceed the search space. A logarithmic spiral satisfying these restrictions is defined in (24), where D denotes the distance of the ith moth from the jth flame, b is a constant that defines the shape of the logarithmic spiral, and t is a random number in [−1, 1]. D is calculated as in (25).
Equation (24) describes the spiral flight path of moths. From this equation, the next position of a moth is defined with respect to a flame. The parameter t in the spiral equation defines how close the next position of the moth must be to the flame.
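The spiral move can be sketched as follows, assuming the standard forms from the MFO literature [65]: a logarithmic spiral $D \cdot e^{bt} \cdot \cos(2\pi t) + F_j$ for (24) and an absolute difference $|F_j - M_i|$ for (25). The function name and the default value b = 1 are placeholders rather than settings reported in this study.

```python
import math

def spiral_update(moth, flame, b=1.0, t=0.0):
    """Move one moth around one flame along a logarithmic spiral, per variable."""
    new_pos = []
    for m, f in zip(moth, flame):
        dist = abs(f - m)  # distance D between the flame and the moth
        new_pos.append(dist * math.exp(b * t) * math.cos(2 * math.pi * t) + f)
    return new_pos

# Example with fixed t values; in a full run t would be drawn uniformly from [-1, 1]
far = spiral_update([0.2], [0.5], t=0.0)
near = spiral_update([0.2], [0.5], t=0.25)
```

Values of t for which the cosine term vanishes place the moth on the flame itself, which is how t controls how close the next position is to the flame.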
Equation (26) is used to prevent the MFO algorithm from getting stuck in a local optimum.
In (26), I indicates the current iteration number. N and T represent the maximum number of moths and the maximum number of iterations, respectively.
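A small sketch of the quantity controlled by (26) is given below. It assumes the form commonly stated for MFO [65], in which the number of flames decreases linearly from N to 1 over the run; the function and parameter names are hypothetical.

```python
def flame_count(n_flames, iteration, max_iterations):
    """Number of flames kept at a given iteration, decreasing linearly from N to 1."""
    return round(n_flames - iteration * (n_flames - 1) / max_iterations)

# Example: 30 flames at the start and a single flame at the last iteration
start = flame_count(30, 0, 1000)
end = flame_count(30, 1000, 1000)
```

With fewer flames in later iterations, the moths concentrate on the best positions instead of spreading over many candidate flames.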
The pseudo-code of the MFO algorithm, which is inspired by the behavior of moths, is illustrated in Fig. 3.

IV. USED METHODOLOGIES
This section describes the datasets, NLP steps, metaheuristic algorithms, and other algorithms used in the study.

A. COLLECTION OF DATA
In this study, three experimental datasets were used. They were derived from real-world problems related to HSD. The data, collected from Twitter and online web forums, were pre-labeled. General characteristics of the data are listed in Table 3. These datasets address important real-world issues that directly target a person, an organization, or a state structure, including racism and sexism. Detailed information about the datasets and the HSD study performed on them is given in Section V. These datasets with different characteristics were chosen because they are well suited to the theme of the HSD problem.

B. STEPS OF NLP AND DATA PRE-PROCESSING
In the pre-processing phase, our goal was to convert textual data to numerical form in order to create a document vector (Word2Vec). These steps are shown in Fig. 4. Before the document vector was created, many pre-processing steps were performed, such as punctuation erasure, number filter, stopword filter, case converter, n-char filter, snowball stemmer, and row filter. BoW is an NLP method that extracts all word roots in each data item. The word roots in each sentence were listed before the next step. After the pre-processing steps were completed, tokenization was performed on a sentence basis. Then the most frequent terms were determined according to their Term Frequency (TF) values. Finally, a document vector was created, which completed the feature extraction process representing the datasets. The HSD problem thus became a classification problem. Once the data were ready, the remaining steps were carried out with the supervised machine learning and metaheuristic optimization algorithms. The proposed general structure for the HSD problem is shown in Fig. 4.
As shown in Fig. 4, each dataset was split into 70% training and 30% test data. First, the NLP steps were applied to remove noise from both the training and test data. Then the term frequency values were calculated to extract the features representing the data. After the BoW process was completed, the data became a document matrix. Twelve algorithms were used in the classification stage. Four of them (ALO, MFO, SSO, and TSA) were metaheuristic algorithms, which were run with the fitness function designed for the HSD problem. The success of the algorithms was evaluated with the confusion-matrix (complexity) metrics calculated on the test data. How the fitness function was modeled for the ALO, MFO, SSO, and TSA algorithms in the HSD problem is explained in detail in subsections D and E.
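The feature extraction step can be illustrated with a toy term-frequency example. It is a minimal sketch and not the tool chain used in the study: it assumes text that has already been cleaned and stemmed, whitespace tokenization, and relative term frequency; the three example documents and the names `build_vocabulary` and `tf_vector` are invented for illustration.

```python
from collections import Counter

def build_vocabulary(docs, top_k):
    """Keep the top_k most frequent terms over all pre-processed documents."""
    counts = Counter(token for doc in docs for token in doc.split())
    # Sort by descending count, then alphabetically, for a deterministic order
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]

def tf_vector(doc, vocabulary):
    """Relative frequency of each vocabulary term in one document."""
    tokens = doc.split()
    counts = Counter(tokens)
    n = len(tokens) or 1
    return [counts[term] / n for term in vocabulary]

docs = ["hate speech hurt", "speech free", "hate hate group"]
vocab = build_vocabulary(docs, top_k=2)
matrix = [tf_vector(d, vocab) for d in docs]  # one row per document
```

Here `top_k` plays the role of the number of features kept per dataset, and `matrix` corresponds to the document matrix passed to the classifiers.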

C. SUPERVISED MACHINE LEARNING ALGORITHMS BASED ON ARTIFICIAL INTELLIGENCE
In order to compare the performance of metaheuristic optimization algorithms, various supervised machine learning algorithms were used for the HSD problem. Eight different supervised machine learning algorithms were used in the study. These algorithms have higher success in classification problems. Moreover, they are the most widely used algorithms in the literature. k-Nearest Neighbor algorithm (KNN), Decision Table (

D. MODELING OF HSD WITH OPTIMIZATION ALGORITHMS
In the modeling stage of the optimization algorithms for the HSD problem, the initial parameters of the algorithms had to be set first. The values of the initial parameters of the ALO, MFO, SSO, and TSA algorithms are listed in Table 4. The total number of experiments for each dataset was 10. The maximum number of iterations was 1000 for each experiment. At the end of each experiment, the measurement criteria with the best fitness value were recorded. The final measurement criteria results were obtained by averaging the best fitness function results of the 10 experiments. The population size for the ALO, MFO, SSO, and TSA algorithms was 30. Each individual was encoded by real numbers between 0 and 1. These real numbers, which form the search agents, were updated at the end of each iteration. The individual with the best fitness was taken as the best solution for the current iteration.
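The encoding described above can be sketched as follows. The population size of 30 and the range [0, 1] are from the text; the dimension of 48 is the feature count reported for the first dataset, and the objective used here is only a placeholder, not the fitness function of the study. The function names are hypothetical.

```python
import random

def init_population(pop_size, dim, rng):
    """Create pop_size search agents, each a list of dim real numbers in [0, 1)."""
    return [[rng.random() for _ in range(dim)] for _ in range(pop_size)]

def best_agent(population, fitness_fn):
    """Return (best_fitness, best_agent) for the current iteration."""
    scored = [(fitness_fn(agent), agent) for agent in population]
    return max(scored, key=lambda pair: pair[0])

rng = random.Random(0)
population = init_population(pop_size=30, dim=48, rng=rng)
# Placeholder objective for illustration only
score, agent = best_agent(population, fitness_fn=lambda a: sum(a) / len(a))
```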

E. MODELING OF FITNESS FUNCTION
While preparing the fitness function, each measurement criterion was assigned a weight. Accuracy, precision, sensitivity, and f-score were used to define the fitness function for the HSD problem, as given in (27):

$$\text{fitness} = a_1 \cdot \text{accuracy} + a_2 \cdot \text{precision} + a_3 \cdot \text{sensitivity} + a_4 \cdot \text{f-score} \quad (27)$$

where $a_1$, $a_2$, $a_3$, and $a_4$ are the weights, whose sum must be equal to 1. The relative importance of each metric can easily be set through its weight. Furthermore, metrics or objectives can easily be removed from or added to this fitness function, which is why it is described as flexible.
After several test runs, the weight coefficients $a_1$, $a_2$, $a_3$, and $a_4$ were set to 0.4, 0.2, 0.2, and 0.2, respectively. While designing the fitness function, our aim was to create the most suitable candidate solution to represent each class in our datasets. The best-fit candidate solution obtained was then evaluated on the test data. The experiments showed that this fitness function gives good results. Different constraints and parameters can also be integrated easily; for example, criteria such as AUROC and FPR can be added. In addition, the fitness function can be extended by assigning weights to the TP and FP values of the confusion-matrix (complexity) metrics. In this sense, it is flexible.
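The weighted fitness in (27) can be sketched as below, using the weights 0.4, 0.2, 0.2, and 0.2 given in the text. The metric formulas are the standard binary definitions based on TP, FP, TN, and FN; how the metrics are aggregated for the three-class dataset is not stated in this section, so the binary case is an assumption, and the confusion counts in the example are invented.

```python
def metrics_from_confusion(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity (recall), and f-score from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f_score = 2 * precision * sensitivity / denom if denom else 0.0
    return accuracy, precision, sensitivity, f_score

def fitness(metrics, weights=(0.4, 0.2, 0.2, 0.2)):
    """Weighted sum of the metrics as in (27); the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * m for w, m in zip(weights, metrics))

# Example with invented confusion counts
value = fitness(metrics_from_confusion(tp=40, fp=10, tn=45, fn=5))
```

Adding another criterion such as FPR would only require one more entry in `metrics` and in `weights`, which reflects the flexibility described above.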
The ALO, MFO, SSO, and TSA algorithms aim to find the most suitable models to represent the HSD problem on the training datasets. After the most suitable candidate solutions representing the hate and not-hate data were determined, the developed model was evaluated on the test data. Each test item was assigned the label of the candidate solution to which it was closest according to the Jaccard similarity, which is defined as in (28).
Here, $X_i = (X_{i1}, X_{i2}, \ldots, X_{iM})$ represents the ith individual (antlion, moth, spider, or tunicate) in the ALO, MFO, SSO, and TSA algorithms, and $D_j = (D_{j1}, D_{j2}, \ldots, D_{jM})$ represents the jth data item in the document vector created after the NLP steps. Equation (29) can be used to summarize the relationship between the X and D vectors.
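The labeling step can be illustrated with a similarity-based nearest-candidate rule. The exact form of (28) may differ from what is shown here: this sketch assumes the weighted (min/max) generalization of Jaccard similarity, chosen because the individuals are real-valued, and it assumes one candidate vector per class. All names and numbers are hypothetical.

```python
def weighted_jaccard(x, d):
    """Weighted (min/max) Jaccard similarity of two non-negative vectors."""
    num = sum(min(a, b) for a, b in zip(x, d))
    den = sum(max(a, b) for a, b in zip(x, d))
    return num / den if den else 1.0  # two all-zero vectors count as identical

def predict(doc_vector, candidates):
    """Return the label of the candidate solution most similar to the document."""
    return max(candidates, key=lambda label: weighted_jaccard(candidates[label], doc_vector))

# Invented candidate solutions and one invented document vector
candidates = {"hate": [0.9, 0.1, 0.0], "not_hate": [0.0, 0.2, 0.8]}
label = predict([0.7, 0.0, 0.1], candidates)
```

The predicted labels obtained this way can then be compared with the true labels to fill the confusion matrix.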
In another important process step, the confusion-matrix (complexity) metrics were calculated for the predictions generated using Jaccard similarity. To measure model success, the performance measurement criteria (accuracy, precision, sensitivity, and f-score values) were calculated from these metrics. The performance measurement criteria and complexity metrics are summarized in Table 5. To solve the HSD problem, the properly modeled and adapted metaheuristic optimization methods were run on the datasets. The experimental results obtained are examined in the next section.
VOLUME 9, 2021

V. EXPERIMENTS AND RESULTS
In the experiments, three datasets containing different real-world HSD problems were used. In all experiments, 70% of the dataset was used for training and the remaining 30% was used for testing. All the algorithms were run under equal conditions. The computer used had the following specifications: Intel Core i5 10210U processor, 8GB RAM, 256GB SSD, and 4GB MX110 graphics card.

A. RESULTS FOR HATE SPEECH AND OFFENSIVE LANGUAGE
Dataset1 consists of about 25K tweets containing hate speech and offensive language obtained from Twitter. There are three classes in this dataset: hate speech, offensive language, and neither [66]. The language of the dataset is English, and BoW was applied accordingly.
After the pre-processing steps, 48 features were obtained on this dataset. The dimension variable of the ALO, MFO, SSO, and TSA algorithms was therefore set to 48. In each iteration, 30 new candidates were generated, and the one with the highest fitness value was accepted as the candidate model. Other parameters remained constant. The performance comparison of ALO, MFO, SSO, TSA, and the eight supervised machine learning algorithms is given in Table 6.
The highest accuracy value (92.1%) was obtained by the ALO algorithm. The highest precision value was achieved by the SSO algorithm, while ALO ranked sixth in precision. ALO also achieved the highest sensitivity value (91.9%). TSA achieved the highest f-score, whereas ALO ranked second to last on this criterion.
TSA yielded the second highest accuracy. With an accuracy of 90.7%, the MFO algorithm ranked third for the first dataset. It also had the third highest precision (88.4%). In sensitivity, MFO and TSA shared the second highest value (90.8%), just behind ALO. In f-score, MFO matched the SMO and DT algorithms with a value of 87.7%. Since the covariance and speed of convergence of the ALO algorithm were high, it surpassed the MFO algorithm. Furthermore, ALO achieved higher performance because its search strategy worked more effectively. The J48 algorithm had the worst value for this dataset in almost all performance measurement criteria; the exception was precision, where DT obtained the lowest value. Fig. 5 shows the performance of the algorithms as a bar graph.

B. RESULTS FOR COMMUNITY-DIRECTED PERSONAL INSULT
Dataset2 consists of approximately 4K comments containing personal insults. These comments were collected from online web forum pages. The dataset has two classes: insult and not insult [35]. The language of the data collected from public web forum pages is English.
After pre-processing and feature extraction, 27 features were obtained for this dataset. Therefore, the dimension value of the ALO, MFO, SSO, and TSA algorithms was set to 27. The other evolutionary parameters of ALO, MFO, SSO, and TSA were not changed.
The performance values of the experimental study using twelve different algorithms are given in Table 7.
The ALO algorithm attained the second highest accuracy value for Dataset2. It achieved the highest precision value (73.0%). For sensitivity, the DT algorithm outperformed the ALO algorithm, and ALO obtained the third highest sensitivity value. TSA took first place in f-score with a value of 73.1%.
MFO, SSO, and TSA shared third place with 72.9% accuracy for Dataset2. With a precision of 72.3%, MFO and TSA ranked fifth behind the ALO, DT, SSO, and KNN algorithms. The MFO algorithm ranked sixth in sensitivity and eighth in f-score. It obtained higher sensitivity and f-score values than the Ridor, RF, and KNN algorithms. Overall, ALO outperformed MFO on this dataset.
The highest accuracy and sensitivity values were attained by the DT algorithm, and the best f-score value was obtained by TSA. The DT and ALO algorithms reached the same f-score value (71%); however, DT was slightly ahead of ALO in accuracy and sensitivity. The RF algorithm performed poorly on this dataset. All algorithms and comparison metrics are illustrated in Fig. 6. The accuracy and sensitivity values were very close to each other.

C. RESULTS FOR HATE SPEECH AGAINST IMMIGRANTS AND WOMEN
Dataset3 consists of about 13K tweets. This dataset, which includes hate speech against women and immigrants, consists of two classes. The main source of these data is Twitter, which is one of the popular online social networks [67]. Tweets were labeled as hate-full if they contained hate towards women and immigrants. Otherwise, they were labeled as not-hate-full.
This dataset consists of English and Spanish tweets. However, BoW was applied for the English language only, and the Spanish tweets were excluded.
A total of 40 features were extracted for Dataset3. The dimension variable of the ALO, MFO, SSO, and TSA algorithms was set to 40. The other evolutionary parameters of the metaheuristic ALO, MFO, SSO, and TSA algorithms remained unchanged. The performance of the algorithms used in the study is shown in Table 8.
ALO led the performance comparison in all measurement criteria and was the best-performing algorithm for this dataset. It attained an accuracy of 72.1% owing to its search strategy, and it also surpassed its competitors in the other criteria.
The second-best accuracy and f-score values were attained by TSA. The second-best precision value (70.2%) was obtained by the Ridor algorithm. MFO also outperformed many of its competitors. DT was the worst-performing algorithm for this dataset: it ranked last in all performance criteria and, with an accuracy of 64.3%, was far behind its competitors. ALO and MFO left many algorithms behind in solving the HSD problem. The performance graph of all algorithms is shown in Fig. 7.
As the results of all experiments show, the metaheuristic optimization algorithms used to solve the HSD problem were more successful than the classical supervised machine learning algorithms. The results also suggest that metaheuristic algorithms can be used to solve many different social network and social media problems.
As the amount of data increased, the running time of the metaheuristic algorithms also increased. One limitation of the algorithms is therefore their cost on very large datasets. In addition, the unbalanced number of samples per class in the datasets negatively affected the performance of the algorithms; better performance would likely have been achieved with balanced datasets. Furthermore, the need to determine the algorithm parameters a priori and the stochastic behaviour of the algorithms are further limitations. To demonstrate the reliability of the algorithms, each algorithm was run 10 times on every dataset, and the measurement criteria were calculated as the average of the 10 runs. It is predicted that improved versions of the applied and proposed state-of-the-art algorithms will produce promising results in solving HSD and other social media and network problems.
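Averaging over repeated runs, as described above, reduces the effect of the stochastic behaviour of the algorithms on the reported values. The following minimal sketch shows the idea; only the number of runs (10) comes from the text, while `synthetic_run`, its random metric values, and the use of the run index as a seed are hypothetical placeholders.

```python
import random
from statistics import mean

RUNS = 10  # independent runs per algorithm and dataset (from the text)


def synthetic_run(seed):
    """Hypothetical stand-in for one stochastic run; returns made-up metric values."""
    rng = random.Random(seed)
    return {"accuracy": rng.random(), "f_score": rng.random()}


def average_over_runs(run, runs=RUNS):
    """Call run(seed) for seeds 0..runs-1 and average each metric over the runs."""
    results = [run(seed) for seed in range(runs)]
    return {name: mean(r[name] for r in results) for name in results[0]}


averaged = average_over_runs(synthetic_run)
```

Reporting the spread across runs (for example, the standard deviation) alongside the mean would give a fuller picture of reliability, although the text reports only averages.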
The experimental results in this study and the results obtained from the previous studies using the same dataset are compared in Table 9. It was seen that the best result for Dataset1 was obtained by the ALO algorithm. In addition, it was concluded that the highest values were obtained by ALO and MFO in all complexity criteria for Dataset3.

VI. CONCLUSION AND RECOMMENDATION
This section is divided into two subsections: conclusions and recommendations. In these subsections, the importance and success of the experimental studies are discussed. The novelty and value of the study are particularly emphasized. It is anticipated that the study will bring a fresh perspective to the literature.

A. CONCLUSION
The HSD problem, which is a major threat in social networks, is the subject of this study. Hate speech or defamatory messages that target a community or group and are quickly posted on social networks should be detected before they reach more users. This study proposes two new optimization-based approaches to solve the HSD problem in online social networks. ALO and MFO, which are recent metaheuristic algorithms, were adapted for the first time in the literature to solve the HSD problem. Eight different supervised machine learning algorithms, SSO, and the state-of-the-art TSA were used to compare the performance of the proposed metaheuristic-based approaches. For the selected real-world problems related to HSD, the pre-processing phase was completed with NLP methods. Feature extraction was performed by using the BoW, TF, and Word2Vec methods together. Then twelve different algorithms were compared on the HSD problems. In the experiments, except for one dataset, the highest accuracy, precision, sensitivity, and f-score values were obtained by the adapted ALO algorithm. The MFO and TSA algorithms followed the ALO algorithm with respect to the related metrics. ALO achieved higher performance because its search strategy worked more effectively. Since the covariance and speed of convergence of the ALO algorithm were high, it surpassed the MFO algorithm. These two algorithms are successful and promising for HSD and appear to be viable alternative search methods for other social networking problems.
Metaheuristic optimization algorithms, which are capable of efficiently solving many real-world problems, achieved better performance for the HSD problem in terms of many metrics. This study is a reference study for solving HSD problems with an optimization approach. Its success suggests that the use of this optimization-based approach in other social networking problems is likely to increase. Whereas other supervised machine learning methods behave as black boxes, the optimization-based approach can be adjusted to many problems through its flexible fitness function. Furthermore, the fitness function can be easily adapted to different datasets for different complex social media problems.
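One simple way to embed several metrics into a single fitness value is a normalized weighted sum. The sketch below illustrates this idea only; the weighted-sum form, the function name, and the example weights are assumptions and are not necessarily the fitness function designed in this study.

```python
def weighted_fitness(metrics, weights):
    """Combine several metric values into one score by a normalized weighted sum.

    metrics: mapping from metric name to a value in [0, 1].
    weights: mapping from metric name to a non-negative weight (assumed form).
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(weights[name] * metrics[name] for name in weights) / total_weight


# Made-up metric values, used only to show how the weights shift the score.
metrics = {"accuracy": 0.8, "f_score": 0.6}
balanced = weighted_fitness(metrics, {"accuracy": 1, "f_score": 1})
accuracy_heavy = weighted_fitness(metrics, {"accuracy": 3, "f_score": 1})
```

Changing the weights, or adding further metrics to both mappings, adapts the objective to a different dataset or problem without changing the search algorithm itself.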

B. RECOMMENDATIONS
In addition to the methods used in this study, more successful results may be obtained by using different approaches, which can be listed as follows: (1) Different similarity measurement methods (cosine, Dice, etc.) can be used instead of the Jaccard similarity. (2) Different methods can be proposed and used for the binary conversion process in the representation schemes. (3) Using the Pareto approach, the performance of the ALO and MFO algorithms can be increased by handling many objectives for the HSD problem. (4) Hybrid or adaptive methods may be proposed to improve the performance of optimization algorithms for this problem. (5) These optimization methods can be integrated into classical machine learning algorithms for better results in the automatic HSD problem. (6) For the solution of the HSD problem, metaheuristic optimization algorithms can be used to optimize the parameters of deep learning algorithms. (7) Parallel or distributed versions of these methods can be proposed for efficient results on big data with respect to many metrics.
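To illustrate recommendation (1), the three similarity measures named above can be compared on binary vectors. This is a minimal sketch using their standard set-based definitions; the function names and the example vectors are assumptions, and how the similarity value enters the proposed approach is not reproduced here.

```python
import math


def _overlap_counts(a, b):
    """Return (shared ones, ones in a, ones in b) for two equal-length binary vectors."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    return shared, sum(1 for x in a if x), sum(1 for y in b if y)


def jaccard(a, b):
    """Shared ones divided by the size of the union of ones."""
    shared, ones_a, ones_b = _overlap_counts(a, b)
    union = ones_a + ones_b - shared
    return shared / union if union else 0.0


def dice(a, b):
    """Twice the shared ones divided by the total number of ones."""
    shared, ones_a, ones_b = _overlap_counts(a, b)
    return 2 * shared / (ones_a + ones_b) if (ones_a + ones_b) else 0.0


def cosine(a, b):
    """Shared ones divided by the geometric mean of the two counts of ones."""
    shared, ones_a, ones_b = _overlap_counts(a, b)
    return shared / math.sqrt(ones_a * ones_b) if (ones_a and ones_b) else 0.0


# Made-up binary vectors, used only to exercise the functions.
u = [1, 1, 0, 1]
v = [1, 0, 0, 1]
scores = {"jaccard": jaccard(u, v), "dice": dice(u, v), "cosine": cosine(u, v)}
```

For the same pair of vectors, the Jaccard value is never larger than the Dice or cosine value, so swapping the measure changes how strongly partial overlaps are rewarded.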