The Influence of Salp Swarm Algorithm-Based Feature Selection on Network Anomaly Intrusion Detection

Network security plays a critical role in our lives because of the threats and attacks to which we are exposed, which are increasing daily; these attacks result in a need to develop various protection methods and techniques. Network intrusion detection systems (NIDSs) are a way to detect malicious network attacks. Many researchers have focused on developing NIDSs based on machine learning (ML) approaches to detect diverse attack variants. ML approaches can automatically discover the essential differences between normal and abnormal data by analysing the features of a large dataset. For this purpose, many features are typically extracted without discrimination, increasing the computational complexity. Then, by applying a feature selection method, a subset of features is selected from the whole feature set with the aim of improving the performance of ML-based detection methods. The salp swarm algorithm (SSA) is a nature-inspired optimization algorithm that has demonstrated efficiency in minimizing the processing challenges faced in performing optimization for feature selection problems. This research investigates the impact of the SSA on improving ML-based network anomaly detection using various ML classifiers, including the extreme gradient boosting (XGBoost) and Naïve Bayes (NB) algorithms. Experiments were conducted on standard datasets for comparison. Specifically, two datasets explicitly focused on network intrusion attacks were used: UNSW-NB15 and NSL-KDD. The experimental results show that the proposed method is more effective in improving the performance of anomaly NIDSs in terms of the F-measure, recall, detection rate, and false alarm rate on both datasets, outperforming state-of-the-art techniques recently proposed in the literature.


I. INTRODUCTION
At present, computer network protection plays an essential role in defending against both internal and external threats, owing to the various gaps that attackers can exploit to break into networks, manipulate or steal sensitive information, and cause considerable damage [1]. One way to isolate and protect an environment from outside attacks is to use firewalls and traditional rule-based security protection techniques. However, to further increase the level of security protection, an additional system is needed to support traditional security techniques in protecting against various types of malicious attacks [2]. Moreover, advanced and sophisticated technologies are needed to examine and analyse the enormous amounts of data generated by network infrastructure transactions. For this purpose, robust network intrusion detection systems (NIDSs) have been developed; they play a crucial role in ensuring network security and require the analysis of complex data [3], [4]. A NIDS protects a computer system and its administrators when faced with various threats and attacks. Accordingly, when relying on a NIDS for protection, one must ensure that it is up to date, since many gaps in the network detection model remain to be identified, addressed, and filled [5]. NIDSs work by analysing network behaviour and extracting the abnormal behaviours to be detected. After suspicious behaviours are detected, alerts are sent to notify interested parties of any abnormal behaviours that must be considered before sensitive information is accessed or data are manipulated or leaked. Two methods may be used by a NIDS. First, anomaly detection may be used to detect unknown attacks on the network or system, i.e., new attacks that have not been observed in the past.
Second, signature detection, also known as misuse detection, may be applied to detect abnormal behaviours based on prior knowledge, which must be defined in terms of rules or patterns for knowledge-based detection [3], [6]. Various techniques are applied in anomaly detection, such as machine learning (ML) techniques, to achieve a high detection rate and accurate results. ML-based detection plays an essential role in building effective models that rely on various algorithms and approaches to analyse big data consisting of network traffic flows in order to identify intrusions. One way to develop effective ML-based anomaly detection tools is to apply feature selection (FS) techniques [3]. FS is valuable for improving model performance to obtain accurate results. Therefore, one of the essential steps in developing an intelligent analytical tool is to establish data preprocessing procedures for finding relevant features that best reinforce the performance of predictive algorithms [4], [7]. FS plays a meaningful role in maximizing the performance of an ML model by disregarding redundant and irrelevant attributes that degrade the performance of the learning process and increase its complexity. FS can be expressed as a multiobjective optimization problem that is NP-hard: with n features, there are 2^n possible feature subsets, yielding a search space that grows exponentially [7], [8]. Accordingly, numerous search methods can be utilized to find the optimal subset of features. The first is to apply a greedy search method to traverse and assess all of the feature subsets; however, such a scheme is inherently time consuming. The second is to apply a random search method to randomly explore the domain, which has its own drawbacks and limitations. In particular, there is a chance of search stagnation, which may result in a very high time complexity, among other issues [8].
One way to address the drawbacks and gaps of the FS methods that have been previously proposed is to adopt the meta-heuristic paradigm. Meta-heuristic methods are global optimization approaches inspired by physical, biological, and animal activities. They can explore the search space both globally and locally when applied to FS problems. One class of meta-heuristic algorithms is based on the swarm intelligence (SI) approach [8]. This approach is inspired by the intelligence exhibited by swarms, herds, schools, or flocks of creatures in nature. SI algorithms have been widely used to address diverse optimization problems and reach suboptimal or optimal solutions. In particular, we investigate the influence of the salp swarm algorithm (SSA), one of the newest SI algorithms, to determine its effectiveness and its ability to address FS problems as an essential preprocessing step to enhance the performance of ML-based anomaly detection models [9]. Our contributions to the network security field are as follows. We propose an efficient network anomaly detection framework consisting of three phases. (i) The goal of the first phase is to achieve high classification accuracy and increase the detection rate while reducing the false alarm rate, without using excessive computational resources. To accomplish these goals, we adopt the SSA-FS method to obtain the most relevant and accurate feature representation. To the best of our knowledge, no previous research has focused on the impact of the SSA as an FS method on the network anomaly detection problem. (ii) In the second phase, we use two different classification algorithms, extreme gradient boosting (XGBoost) and Naïve Bayes (NB).
Between these two algorithms, the NB algorithm shows weak performance, while the XGBoost algorithm achieves better results, thereby supporting our main goal: to determine the influence of SSA-FS and the extent of its ability to improve the performance of both robust and weak algorithms to support an effective anomaly NIDS. Based on FS, we investigate the effectiveness of the SSA and its impact on these two different algorithms. (iii) Corresponding to the last phase, i.e., model testing, we present an extensive experimental evaluation of the proposed method conducted using two datasets: UNSW-NB15 [10] and NSL-KDD [11]. In addition, the proposed method is compared with several state-of-the-art techniques.
The remainder of this paper is organized as follows: In Section II, we present related works to explore the existing solutions for FS in network anomaly detection. A brief overview of the SSA is provided in Section III. This overview is followed by Section IV, which introduces the design of the presented framework in detail. An evaluation and discussion, including comparisons of the proposed method with state-of-the-art methods, are given in Section V. Finally, conclusions and future work are presented in Section VI.

II. RELATED WORK
A number of network anomaly detection models, designed with many techniques and methods to improve their detection performance, have been widely researched. In this paper, we limit our review to approaches using FS (also known as dimensionality reduction), a general preprocessing method for high-dimensional data visualization, modelling, and analysis. In [6], a hybrid FS method comprising three different algorithms (particle swarm optimization (PSO), the ant colony algorithm (ACA), and a genetic algorithm (GA)) based on a reduced error pruning tree (REPT) was proposed. Moreover, a two-level classifier (rotation forest and bagging) was used, with a majority vote determining the final decision. The authors of [2] examined the capabilities of XGBoost in NIDSs compared to other ML algorithms, using the information gain (IG) as the basis for FS. In [4], two phases of detection were applied; the first phase was FS using a wrapper method, a support vector machine (SVM), and a GA with multiparent crossover and multiparent mutation (MGA), and the second phase was detection using an artificial neural network (ANN). To achieve more accurate results from the ANN, the authors integrated a hybrid gravitational search and PSO (HGS-PSO-ANN). The work presented in [12] focused on a NIDS utilizing a two-stage classification approach that addressed imbalanced data by filtering the dataset into majority and minority malicious classes. The IG method was used for FS, and random forests (RFs) were used as classifiers on the majority and minority classes individually to produce different models for detecting suspicious attacks.
The authors of [13] focused on applying the Ant Tree Miner (ATM) classification algorithm and the IG-based FS method to detect malicious attacks. ATM builds decision trees (DTs) using ant colony optimization, rather than the traditional C4.5 or Classification and Regression Trees (CART) method. In [14], the focus was on the ability of FS to support efficient network intrusion detection utilizing different ML classifiers. Both wrapper- and filter-based FS methods (IG, gain ratio, symmetrical uncertainty, Relief-F, One-R, and chi-square) were used, and the average of their scores was considered. In this method, the evaluators' results were scaled to choose the most relevant features and examine the performance in the subsequent classification stage (J48 and NB) for detecting intrusion attacks. In [15], two different procedures involving misuse detection for known attacks were implemented by applying some generated rules and anomaly detection methods. For anomaly detection, both fuzzy C-means clustering and correlation-based FS were used to divide the training dataset and choose the most relevant features. Moreover, a hybrid classification technique based on the artificial bee colony and artificial fish swarm algorithms was used to detect network intrusion attacks. In [16], a geometric area analysis based on trapezoidal area estimation was proposed for network intrusion detection. In [17], two methods for network intrusion detection were proposed based on misuse and anomaly detection. In the anomaly detection method, three different classifiers (ANN, J48, and NB) were applied, and the IG method was used for FS. In [18], the researchers implemented five classification algorithms and examined their accuracy performance in combination with XGBoost-based FS. They found that the DT algorithm yielded the most accurate results in detecting network attacks. In [19], the researchers applied an association rule mining algorithm as an FS method.
They used two classification algorithms (expectation maximization (EM) clustering and NB) to explore the model performance. In [20], a two-stage technique was applied for network intrusion detection. In the first stage, an FS method was applied using recursive feature elimination (RFE) and RFs. The second stage included five classification algorithms. Based on the results, the SVM yielded the best performance. The authors of [21] investigated the capabilities of an ANN as a supervised learning algorithm for NIDSs. For the FS procedure, the IG was used to choose the most relevant features to improve the model performance. In [22], the researchers applied a two-stage NIDS. The first stage was an FS method using the attribute ratio (AR), and the second was a classification stage using both K-means clustering and the XGBoost algorithm. In [23], the researchers applied four SI algorithms as FS methods to detect network intrusion attacks: the firefly optimization algorithm (FOA), PSO, a GA, and the Grey Wolf Optimizer (GWO). Moreover, two classifiers (SVM and J48) were used. In [24], to improve the model performance, the researchers used an FS method to choose the most relevant features to increase the detection rate of a NIDS by applying ant colony optimization. For the classification process, the k-nearest neighbours (KNN) method was used. In [25], fuzzy entropy-based ant colony optimization was proposed as an FS method. To test the model performance, four different individual classifiers (random tree, J48, JRip, and RF) were used. In [26], the researchers presented a hybrid algorithm to control and set the optimal parameters of classifiers and choose the relevant features of a dataset using time-varying chaos PSO (TVCPSO). They used the concept of chaos to perform PSO faster while searching for the optimal features and avoiding trapping in local minima.
As the classifiers, they applied multicriteria linear programming (MCLP) and an SVM. In [27], the performance of NIDSs using two FS algorithms, namely, PSO and a GA, was investigated. Additionally, to examine the capabilities of the FS techniques, the authors used four different classification algorithms: rule induction, KNN, NB, and DTs. In [28], a primary NIDS was applied in the network layer to analyse traffic by adopting a cloud network node (CNN). At the same time, a secondary intrusion detection system (IDS) was placed in each tenant virtual machine (TVM), and the authors relied on RFE combined with a chi-square test as the FS method. During training on the dataset, an RF was used to determine whether the traffic contained malicious attacks. In [29], a NIDS algorithm based on anomaly detection with an SVM was proposed. Two algorithms were applied: binary-based PSO (BPSO) and standard-based PSO (SPSO). The objective of BPSO, as an FS method, was to select the most relevant network features. In contrast, the aim of the SPSO algorithm was to improve the performance of the SVM by adjusting its control parameters. In [30], the authors wished to enhance the detection rate of a NIDS by means of FS. Discretized differential evolution (DDE) was used as the FS method, and C4.5 DTs were used for classification. In [31], the authors proposed a model for anomaly intrusion detection based on the use of the mutation cuckoo fuzzy (MCF) method for selecting the best subset of features and a multiverse optimizer-ANN (MVO-ANN) for classification. The results showed the potential of the proposed model to enhance the intrusion detection efficiency and performance. Table 1 summarizes some of the new and current research on network intrusion detection using various FS methods for comparison with the anomaly-based NIDS using SSA-FS and both the XGBoost and NB classification algorithms that is proposed in this paper.

III. SALP SWARM ALGORITHM
The importance and effectiveness of SI have been proven in many applications and research. In the global optimization framework, SI addresses the problem of identifying and improving the suboptimal solutions for a problem from among a group of alternative solutions, among which is the optimal solution [9]. One of the newest SI algorithms is the SSA, a meta-heuristic algorithm introduced by Mirjalili (2017) [9] that is currently being tested and analysed to determine its efficiency and ability to solve many problems. Motivated by the foraging behaviour of salps in deep oceans, it has proven useful in finding global optima for optimization problems. The principle of the SSA is to rely on fitness values (optimal solutions). It depends on studying solutions through their fitness values and chains of search agents to reach suboptimal or optimal solutions. The algorithm works by dividing the swarm into two parts: the leader and a group of follower salps, with the leader placed at the head of the chain and the follower salps moving behind it as a body to form a simple chain [8]. Figure 1 shows an illustration of the SSA. The position of each salp is defined in an n-dimensional search space, where n is the number of problem variables. The positions of all salps are stored in a two-dimensional matrix called x. A food source called F in the search space is the swarm's target. To update the position of the leader, the following equation is used:

x_j^1 = F_j + c_1((ub_j - lb_j)c_2 + lb_j), if c_3 >= 0.5
x_j^1 = F_j - c_1((ub_j - lb_j)c_2 + lb_j), if c_3 < 0.5    (1)

where x_j^1 is the position of the first salp (leader); F_j is the position of the food source in the j-th dimension; ub_j and lb_j are the upper and lower bounds, respectively, of the j-th dimension; c_2 and c_3 are random numbers in the range [0,1]; and c_1 is the coefficient defined in Equation (2). Equation (1) shows that only the position of the leader is updated with regard to the food source.
The coefficient c_1 is the main parameter in the SSA because it balances exploration and exploitation; it is defined as

c_1 = 2e^(-(4l/L)^2)    (2)

where l is the current iteration and L is the maximum number of iterations. Moreover, the position of each follower salp is updated as follows:

x_j^i = (1/2)(x_j^i + x_j^(i-1))    (3)

where i >= 2 and x_j^i is the position of the i-th follower salp in the j-th dimension. The SSA has been applied in various fields of research to study the effectiveness of this algorithm and its capabilities. The results of these studies show the remarkable superiority and high performance of the SSA; it outperforms the most recent meta-heuristic algorithms. Several stochastic operators are combined in the SSA, allowing it to better avoid local solutions in multimodal search landscapes. This also enables the SSA to perform well on both small and large datasets [32], which is what has made it one of the most effective current algorithms. It is a new and modern algorithm that is suitable for application in many areas and for diverse problems, such as binary optimization, forecasting, the optimization of virtual machine placement, and chemical descriptor selection [8]; hence, it is a rich topic of study. To investigate its capabilities in the field of network anomaly detection, we examine its efficiency and performance in attaining an optimal result and increasing a model's accuracy and rate of detection. The final pseudocode of the SSA is presented in Algorithm 1.

Algorithm 1 Pseudocode of the SSA
1: Initialize a population of N salps with random positions within [lb, ub]
2: while (l <= L) do
3:   Calculate the fitness of each salp
4:   F_best <- FittestSalp(Population)
5:   Update(c_1) ; // using Equation (2)
6:   for i = 1 to N do
7:     if (i == 1) then
8:       UpdatePositionLeaderSalp() ; // using Equation (1)
9:     else
10:      UpdatePositionFollowerSalp() ; // using Equation (3)
11:    end if
12:    UpdateSalps() ; // clamp positions within the legal limits
13:  end for
14: end while
15: return F_best
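The update rules and pseudocode above can be sketched in Python. This is a minimal illustration under our own naming choices (`ssa_minimize`, `n_agents`, etc.), not the authors' implementation; it assumes the common c_3 threshold of 0.5 in the leader update and identical bounds in every dimension.

```python
import math
import random

def ssa_minimize(fitness, dim, lb, ub, n_agents=20, max_iter=50, seed=0):
    """Minimal SSA sketch: minimizes `fitness` over [lb, ub]^dim."""
    rng = random.Random(seed)
    # Initialize a population of salps at random positions within the bounds.
    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_agents)]
    food = min(pop, key=fitness)[:]  # best solution found so far (F)
    for l in range(1, max_iter + 1):
        c1 = 2 * math.exp(-(4 * l / max_iter) ** 2)  # Equation (2)
        for i in range(n_agents):
            for j in range(dim):
                if i == 0:  # leader update, Equation (1)
                    c2, c3 = rng.random(), rng.random()
                    step = c1 * ((ub - lb) * c2 + lb)
                    pop[i][j] = food[j] + step if c3 >= 0.5 else food[j] - step
                else:       # follower update, Equation (3)
                    pop[i][j] = 0.5 * (pop[i][j] + pop[i - 1][j])
            # Clamp each salp back into the legal limits.
            pop[i] = [min(max(x, lb), ub) for x in pop[i]]
        best = min(pop, key=fitness)
        if fitness(best) < fitness(food):
            food = best[:]
    return food
```

For example, `ssa_minimize(lambda v: sum(x * x for x in v), dim=2, lb=-5.0, ub=5.0)` drives the population toward the sphere function's optimum at the origin.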

IV. DESIGN OF A NETWORK ANOMALY DETECTION FRAMEWORK BASED ON THE SALP SWARM ALGORITHM
In this section, we present an overview of the proposed structural model. Next, we explain the FS method based on the SSA along with the preprocessing procedures and the architecture of the FS model.

A. FRAMEWORK FOR NETWORK ANOMALY DETECTION
The proposed framework for network anomaly detection is presented in Figure 2. The structure comprises three different phases. The first phase is dataset preprocessing, including dataset cleaning, transformation, and scaling as well as FS. This phase involves reducing the dimensionality of the feature space and increasing the algorithm speed. In addition, it can eliminate redundant, irrelevant, or noisy data, enhance the data quality, and increase the accuracy of the anomaly detection model. These goals are accomplished by using the SSA as an FS procedure. The second phase is classifier modelling, for which we consider two different classification algorithms: XGBoost and NB. Third, the objective of the model testing phase is to test the ability of the model to achieve accurate results in decision making for anomaly detection by applying various evaluation metrics to obtain the final binary classifier results and decide whether the data correspond to normal behaviour or a malicious attack. Furthermore, we present a set of tests conducted for an overall comparison with various existing techniques. Seven performance measures are applied to evaluate the test results: the accuracy, recall, precision, F-measure, false positive rate, false negative rate, and true negative rate.

B. PREPROCESSING
The workflow of the proposed SSA-based FS method is shown in Figure 3. Step 1, preprocessing, is a significant initial phase that improves the quality of the data to assist in the extraction of meaningful information. Therefore, the preprocessing step is essential for the construction of ML models. Data preprocessing encompasses various procedures, including cleaning, organization, transformation, scaling, and FS.

1) CLEANING AND TRANSFORMATION
Data cleaning, in general, eliminates unnecessary and redundant values, such as duplicate or irrelevant records or features. For instance, in this study, the ID numbers are removed from both datasets. We also remove the columns corresponding to the attack category and label from the UNSW-NB15 dataset [10] and those corresponding to the class and difficulty level from NSL-KDD [11] because they are used as target features. Additionally, we remove null cells from the datasets. Regarding the transformation procedure, the datasets considered in this study include categorical data, namely, variables that identify information as belonging to specific categories. We convert these categorical features into numerical features since most ML models and scalers work most efficiently with numerical values. For instance, we transform the proto, service, and state features from the UNSW-NB15 dataset and the protocol_type, service, and flag features from NSL-KDD into numerical values using the LabelEncoder() class from the scikit-learn library.
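As a concrete illustration of this transformation step, the snippet below mimics the behaviour of scikit-learn's LabelEncoder in plain Python (integer codes follow the sorted order of the unique categories); the example values for a proto-like column are illustrative only.

```python
def label_encode(values):
    """Map each category to an integer code, mirroring the behaviour of
    scikit-learn's LabelEncoder: codes follow the sorted order of the
    unique categories."""
    classes = sorted(set(values))
    code = {c: i for i, c in enumerate(classes)}
    return [code[v] for v in values], classes

# Illustrative values for a `proto`-like categorical column:
encoded, classes = label_encode(["tcp", "udp", "tcp", "icmp"])
# classes == ['icmp', 'tcp', 'udp'] and encoded == [1, 2, 1, 0]
```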

2) FEATURE SCALING
Input variables frequently have various units; accordingly, they may also have various scales. To avoid any influence from the selection of particular measurement units, we use a feature scaling method to limit the ranges of the variables so that they can be compared on a common basis. To this end, we apply a min-max scaler to transform the data into the range [0,1]. The min-max normalization method is expressed as

X' = (X - X_min) / (X_max - X_min)    (4)

where X_min and X_max are the minimum and maximum values, respectively, of the feature being scaled. Notably, the XGBoost algorithm does not require any scaling because it only selects cut-off points for each feature at which to split nodes, and the splitting process is not sensitive to scaling [33].
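A minimal sketch of this scaling step follows; the helper name and the handling of constant columns are our own choices, not part of the paper.

```python
def min_max_scale(column):
    """Scale a feature column into [0, 1] via min-max normalization:
    x' = (x - x_min) / (x_max - x_min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]
```

For instance, `min_max_scale([2, 4, 6])` yields `[0.0, 0.5, 1.0]`.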

3) FEATURE SELECTION
FS is a discrete problem, and all solutions are restricted to binary values. Therefore, we must convert the SSA into a binary version with {0,1} values, where '1' indicates a selected feature and '0' indicates a feature that is not selected. Equation (5) is used to map continuous position values to their binary versions:

Z_mn = 1 if X_mn > 0.5; Z_mn = 0 otherwise    (5)

where Z_mn can be regarded as the discrete form of the solution vector X, and X_mn represents the continuous position of search agent m in dimension n. As the next step, we define a fitness function and use two different classifiers, selecting between them based on which achieves the greater accuracy: the first is an XGBoost classifier, and the second is an NB classifier. The goal is to assess the performance of all search agents and achieve a high (maximal) classification accuracy. The fitness function given in Equation (6) is used with both classifiers for the SSA-FS problem:

Fitness = (TP + TN) / (TP + TN + FP + FN)    (6)
This fitness function is calculated based on four essential retrieval parameters. TP, the number of true positives, is the number of malicious connections correctly classified as malicious by the classifier. TN, the number of true negatives, is the number of normal connections correctly classified as normal. FP, the number of false positives, is the number of normal connections misclassified as malicious. FN, the number of false negatives, is the number of malicious connections misclassified as normal. In our architecture, we implement the SSA as a wrapper FS method using two different classifiers, i.e., XGBoost and NB, as the basis of the fitness function. The goal of using two different classification algorithms is to evaluate the quality of each selected subset of features and determine the effectiveness of the SSA with different fitness function classifiers. The workflow of the proposed SSA-based FS method is presented in Figure 3. As input, the model takes the original dataset. In Step 1, we apply preprocessing; our main contribution here is the application of the SSA as an FS method to choose valuable features. The results are obtained from the SSA based on either the XGBoost classifier or the NB classifier, and the final output is assessed. The SSA is applied after splitting the dataset into training and test sets. In Step 2, we randomly initialize the position of each individual salp; each individual represents a candidate solution to the FS problem. In Step 3, we perform the fitness calculation by selecting the features corresponding to the solution to be evaluated and removing the non-selected features from the training set. The fitness is calculated as shown in Equation (6).
Then, we update the positions of the leader and followers to exploit and explore the search space using Equations (1) and (3); to achieve a balance between exploration and exploitation in the search, we use Equation (2). These steps are then performed iteratively until the maximum number of iterations is reached, at which time the list of the best chosen features is returned to prepare for the attack detection phase.
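Steps 2 and 3 can be sketched as follows. The `binarize` helper applies a 0.5-threshold mapping of a continuous salp position to a feature mask, and `fitness` evaluates that mask in wrapper fashion; `train_and_score` is a hypothetical callback standing in for training and scoring either XGBoost or NB on the reduced data — both names are ours, not the paper's.

```python
def binarize(position, threshold=0.5):
    """Continuous salp position -> binary feature mask (1 = feature kept)."""
    return [1 if x > threshold else 0 for x in position]

def fitness(mask, train_X, train_y, test_X, test_y, train_and_score):
    """Wrapper fitness: keep only the features selected by `mask`, then
    score a classifier on the reduced data. Higher accuracy = better."""
    keep = [j for j, bit in enumerate(mask) if bit == 1]
    if not keep:
        return 0.0  # an empty feature subset cannot classify anything
    select = lambda X: [[row[j] for j in keep] for row in X]
    return train_and_score(select(train_X), train_y, select(test_X), test_y)
```

In the full loop, each salp's position is binarized, scored with this fitness, and the best mask becomes the food source that guides the next round of position updates.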

C. CLASSIFICATION ALGORITHMS 1) EXTREME GRADIENT BOOSTING
XGBoost is a variant of the gradient boosting machine (GBM) model that was introduced by Tianqi Chen of the University of Washington in 2016, and it has been widely used in academia and industry [33]. It has shown high efficiency and performance in prediction tasks. It has also outperformed many ML algorithms in various competitions; therefore, many authors have tested XGBoost on many datasets to assess its efficiency and its ability to solve problems, and it has become a popular research topic in the ML field [2], [34]. One of the advantages of the XGBoost model is that it relies on additive training to progressively optimize the objective function by considering the optimization result from the previous step [1], [5]. The objective function in the t-th round is shown in Equation (7):

Obj^(t) = sum_{i=1}^{n} l(y_i, yhat_i^(t-1) + f_t(x_i)) + Omega(f_t) + constant    (7)

where l denotes the loss term in the t-th round (measuring the prediction for the i-th instance once the new tree f_t, which is added to minimize the objective, is included), constant is a constant term, and Omega(f_t) is the regularization term of the model, which is expressed as

Omega(f) = gamma * T + (1/2) * lambda * sum_{j=1}^{T} w_j^2    (8)

where T is the number of leaves and w_j is the weight of leaf j. Both gamma and lambda are custom parameters of XGBoost; the larger these two values are, the simpler the resulting tree structure, which helps to avoid overfitting, and w*_j denotes the optimal weight solution for leaf j [1].
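As a small worked example of the regularization term, the sketch below evaluates gamma * T + (lambda/2) * sum of squared leaf weights for a given tree; the function name and default values are ours for illustration, not part of the XGBoost API.

```python
def xgb_regularization(leaf_weights, gamma=1.0, lam=1.0):
    """Omega(f) = gamma * T + (lambda / 2) * sum_j w_j^2: the penalty on a
    tree with T leaves and leaf weights w_j. Larger gamma and lambda
    favour simpler trees and guard against overfitting."""
    T = len(leaf_weights)
    return gamma * T + 0.5 * lam * sum(w * w for w in leaf_weights)
```

For a tree with leaf weights [1.0, -2.0], gamma = 1.0, and lambda = 2.0, the penalty is 1.0 * 2 + 1.0 * (1.0 + 4.0) = 7.0.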

2) NAIVE BAYES
The NB model is a Bayesian probability model. Probabilities are considered in the NB model, with the final result derived from various related evidence variables [35]. An NB classifier aims to provide output that distinguishes between two classes (i.e., normal (0) and malicious (1)). This goal is accomplished by applying the maximum a posteriori (MAP) decision rule, which is expressed as

W_MAP = argmax_{w in {1, 2, ..., N}} P(C_w) * prod_{j=1}^{N} P(I_j | C_w)    (9)

Here, C denotes the labelled classes, I corresponds to all observations, w is the class index, P(C|I) is the class probability for a specific observation, and P(C_w) * prod_{j=1}^{N} P(I_j | C_w) is the quantity to be maximized [22].
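A toy version of this MAP decision rule might look as follows. The priors and conditional probability tables are invented for illustration, and unseen feature values are floored at a small constant as a stand-in for proper smoothing (an assumption of this sketch, not the paper's method).

```python
import math

def nb_map_predict(sample, class_priors, cond_probs):
    """MAP decision for a Naive Bayes classifier: pick the class w that
    maximizes P(C_w) * prod_j P(I_j | C_w). Log-probabilities avoid
    numerical underflow; cond_probs[w][j] maps a feature value to
    P(I_j = value | C_w), with a small floor for unseen values."""
    best_w, best_score = None, -math.inf
    for w, prior in class_priors.items():
        score = math.log(prior)
        for j, value in enumerate(sample):
            score += math.log(cond_probs[w][j].get(value, 1e-9))
        if score > best_score:
            best_w, best_score = w, score
    return best_w

# Binary NIDS example: class 0 = normal, class 1 = malicious.
priors = {0: 0.6, 1: 0.4}
conds = {
    0: [{"tcp": 0.9, "udp": 0.1}, {"http": 0.8, "dns": 0.2}],
    1: [{"tcp": 0.3, "udp": 0.7}, {"http": 0.1, "dns": 0.9}],
}
```

With these invented tables, a ("udp", "dns") connection scores higher under the malicious class, while ("tcp", "http") scores higher under the normal class.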

V. EXPERIMENTAL SETTINGS
In this section, we present the two datasets and the experimental settings along with the experimental results obtained when applying the SSA as an FS method. The specifications of the computing environment used for the experiments discussed in this paper are as follows: the Microsoft Windows 10 operating system (64-bit), an Intel Core i7 8th Gen processor, Python version 3.6.7 in Google Colab Pro, and 27.4 GB of RAM and GPU memory.

A. ANOMALY DETECTION DATASETS
We examine two publicly available anomaly detection datasets that have been used extensively in previous studies.
The NSL-KDD dataset was presented in 2009 [11]. It is considered an improved version of KDDCUP99, produced by removing all redundant records from the training set and all duplicate records from the test set. It contains a normal class and four attack classes: R2L, U2R, probe, and DoS. We consider the training (KDDTrain) and test (KDDTest) subsets, which consist of 125,973 and 22,544 normal and malicious data instances, respectively. The class distribution is shown in Table 2, along with a breakdown of the traffic record distribution. Almost half of the records correspond to normal traffic, and the U2R and R2L attacks are rare. This dataset includes 41 features, as listed in Table 4 [36].
The UNSW-NB15 dataset was developed and released in 2015 by the Australian Centre for Cyber Security (ACCS) [10]. Many earlier datasets have weaknesses that hinder the evaluation of the detection process, which has become a challenge for cybersecurity research groups; accordingly, researchers in this field have created datasets that are more efficient and accurate for discovering and detecting particular types of attacks, especially at the network level. UNSW-NB15 is a comprehensive resource for evaluating existing network intrusion detection methods more reliably. It contains a normal class and nine types of attacks: worms, shellcode, reconnaissance, generic, exploits, DoS, backdoors, analysis, and fuzzers. We consider the training (UNSW-NB15train) and test (UNSW-NB15test) subsets, which consist of 175,341 and 82,332 instances, respectively. The class distribution is shown in Table 3, along with a breakdown of the traffic record distribution. Almost half of the records correspond to normal traffic, and the rest correspond to malicious attacks with different prevalences. The traffic record features represent information about the traffic that can be provided as input to a NIDS. This dataset includes 42 features, as listed in Table 5 [10].

B. EXPERIMENTAL RESULTS AND DISCUSSION
Our experiments were designed to examine the performance of the proposed network anomaly detection method using the XGBoost and NB algorithms as ML techniques. We employed the whole feature space of both datasets, UNSW-NB15 and NSL-KDD, for the binary classification problem. To estimate the performance of the proposed model in network anomaly detection and examine the effectiveness and impact of the SSA as an FS method, the algorithm was run ten times, and the average result was taken. Recall that our aims in developing the proposed model were to reduce detection errors and increase the model accuracy. The evaluation metrics used for experimental comparison with other detection methods were the false negative rate (FNR), true negative rate (TNR), false positive rate (FPR), precision, F-measure, recall, and accuracy; these metrics are calculated based on four essential retrieval parameters, namely, the numbers of true negatives (TN), false positives (FP), true positives (TP), and false negatives (FN).
1) True positive rate, or recall: The percentage of malicious actions correctly classified as attacks:

Recall = TP / (TP + FN). (10)

2) FNR: The rate of incorrect classification such that a malicious attack is classified as normal:

FNR = FN / (TP + FN). (11)

3) TNR: The percentage of normal traffic instances that are correctly classified as belonging to the normal class:

TNR = TN / (TN + FP). (12)

4) FPR: The rate of incorrect classification such that normal traffic is classified as a malicious attack:

FPR = FP / (FP + TN). (13)

5) Precision: The proportion of instances classified as attacks that are truly malicious:

Precision = TP / (TP + FP). (14)

6) F-measure: The harmonic mean of precision and recall:

F-measure = 2 × Precision × Recall / (Precision + Recall). (15)

7) Accuracy: The degree to which the obtained measurement result corresponds to the correct value; it indicates how close a measured value is to a standard or known value:

Accuracy = (TP + TN) / (TP + TN + FP + FN). (16)

We applied the features chosen through FS to enhance the model performance, and we assessed the results and the effectiveness before and after applying the SSA as an FS method. Table 6 shows the accuracy of the XGBoost and NB algorithms on the two datasets before the application of the SSA. When applying SSA-based FS, we noted that this algorithm has several essential parameters that greatly affect the model efficiency. After an initial test study, suitable values were empirically selected on both datasets for the following parameters: the number of search agents, the number of iterations, and the lower bound (lb) and upper bound (ub). Specifically, we found that twenty search agents, a maximum of 50 iterations, an lb of −5, and a ub of 5 led to satisfactory results, whereas other sets of values required much longer run times and/or yielded worse results.
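As a quick sanity check, all seven metrics above can be computed directly from the four confusion-matrix counts. The sketch below uses illustrative counts, not values from the experiments.

```python
def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the seven evaluation metrics of a binary NIDS classifier
    from its confusion-matrix counts."""
    recall = tp / (tp + fn)              # TPR, i.e. the detection rate
    fnr = fn / (tp + fn)                 # missed attacks
    tnr = tn / (tn + fp)                 # correctly kept normal traffic
    fpr = fp / (fp + tn)                 # false alarm rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"recall": recall, "fnr": fnr, "tnr": tnr, "fpr": fpr,
            "precision": precision, "f_measure": f_measure,
            "accuracy": accuracy}

# Toy example: 90 attacks caught, 10 missed, 95 normal kept, 5 false alarms.
m = detection_metrics(tp=90, tn=95, fp=5, fn=10)
print(round(m["recall"], 3), round(m["fpr"], 3), round(m["accuracy"], 3))
# 0.9 0.05 0.925
```

Note that recall and the FNR are complementary (they sum to 1), as are the TNR and FPR, which is why a low FNR and a low FPR together characterize a good NIDS.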
The SSA parameter values that were tested are reported in Table 7, and the effective values selected on the basis of these tests to obtain the final experimental results for both algorithms (SSA-XGBoost and SSA-NB) are listed in Table 8; the features ultimately selected for each algorithm and dataset are presented in Table 10. These features were selected in accordance with their importance as evaluated by the objective function based on each classifier, taking advantage of the ability of the SSA to explore and exploit the search space to find the combinations of features that could lead to the most accurate results, thus improving the model performance. Table 9 shows the improvements achieved via SSA-based FS when the SSA-NB and SSA-XGBoost algorithms were utilized on the two different datasets. In addition, these experiments show that our proposed anomaly-based NIDS approach improves the classification performance with an acceptable time cost. On the NSL-KDD dataset, SSA-XGBoost consumed 1.77 seconds for model training and 0.03 seconds for testing, whereas SSA-NB required a training time of 0.02 seconds and a testing time of 0.008 seconds. Similarly, for the UNSW-NB15 dataset, SSA-XGBoost required 4.28 seconds for training and 0.057 seconds for testing, and the corresponding time costs for SSA-NB were 0.050 seconds and 0.01 seconds, respectively. Overall, from our examination of the proposed SSA-based FS method with different algorithms and datasets, we conclude that enhanced performance can indeed be achieved. While some features are highly effective for model training, the inclusion of noisy, redundant, and irrelevant feature dimensions can render learning algorithms very slow and even degrade their performance on learning tasks; therefore, the ability to select the most relevant features from the datasets is beneficial for increasing the detection rate and decreasing the false alarm rate of a NIDS. In particular, one critical factor that can affect model performance is data imbalance.
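To make the wrapper-style search concrete, the following is a minimal sketch of SSA-based feature selection with the parameter values reported above (20 search agents, 50 iterations, lb = −5, ub = 5). The toy data, the sigmoid binarization of salp positions, and the correlation-based proxy fitness are all illustrative assumptions standing in for the XGBoost/NB wrapper evaluation actually used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples, 12 features; only the first 4 carry signal.
n, d = 200, 12
X = rng.normal(size=(n, d))
y = (X[:, :4].sum(axis=1) > 0).astype(int)

def fitness(mask: np.ndarray) -> float:
    """Proxy objective (lower is better): 1 - mean |correlation| of the
    chosen features with the label, plus a small subset-size penalty,
    standing in for the classifier-error objective used in the paper."""
    if mask.sum() == 0:
        return 1.0
    cols = X[:, mask.astype(bool)]
    corr = [abs(np.corrcoef(c, y)[0, 1]) for c in cols.T]
    return (1.0 - float(np.mean(corr))) + 0.01 * mask.sum() / d

def ssa_feature_selection(agents=20, iters=50, lb=-5.0, ub=5.0):
    pos = rng.uniform(lb, ub, size=(agents, d))
    # A salp position is binarized through a sigmoid threshold at 0.5.
    to_mask = lambda p: (1.0 / (1.0 + np.exp(-p)) > 0.5).astype(int)
    best_pos, best_fit = pos[0].copy(), np.inf
    for l in range(1, iters + 1):
        fits = np.array([fitness(to_mask(p)) for p in pos])
        k = int(fits.argmin())
        if fits[k] < best_fit:                  # update the food source
            best_fit, best_pos = float(fits[k]), pos[k].copy()
        c1 = 2 * np.exp(-((4 * l / iters) ** 2))
        for j in range(agents):
            if j == 0:                          # leader moves around food
                c2 = rng.uniform(size=d)
                c3 = rng.uniform(size=d)
                step = c1 * ((ub - lb) * c2 + lb)
                pos[j] = np.where(c3 < 0.5, best_pos + step, best_pos - step)
            else:                               # followers chain behind
                pos[j] = (pos[j] + pos[j - 1]) / 2
        pos = np.clip(pos, lb, ub)
    return to_mask(best_pos), best_fit

mask, fit = ssa_feature_selection()
print("selected features:", np.flatnonzero(mask))
```

In the actual method the fitness would train the classifier (XGBoost or NB) on the masked features and return its error, which is far more expensive but follows the same loop structure.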
In general, the characteristic representations of malicious attack behaviours in datasets used for intrusion detection are usually strongly imbalanced since certain attacks tend to occur more often than others. Therefore, these datasets must be balanced by means of appropriate preprocessing methods before ML algorithms are applied to enable the development of an effective NIDS [12], [37].
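As one simple balancing option (the text does not prescribe a specific method), minority-class records can be randomly duplicated until the classes are even. The sketch below is purely illustrative.

```python
import random

random.seed(42)

def random_oversample(records, labels):
    """Naive random oversampling: duplicate minority-class records until
    every class matches the size of the largest class."""
    by_class = {}
    for rec, lab in zip(records, labels):
        by_class.setdefault(lab, []).append(rec)
    target = max(len(v) for v in by_class.values())
    out_recs, out_labs = [], []
    for lab, items in by_class.items():
        extra = [random.choice(items) for _ in range(target - len(items))]
        for rec in items + extra:
            out_recs.append(rec)
            out_labs.append(lab)
    return out_recs, out_labs

recs = [f"flow{i}" for i in range(10)]
labs = [0] * 8 + [1] * 2            # 8 normal flows, 2 attacks
r2, l2 = random_oversample(recs, labs)
print(l2.count(0), l2.count(1))     # 8 8
```

More sophisticated alternatives (e.g. synthetic minority oversampling) follow the same idea of equalizing class prevalences before training.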

C. COMPARISON AND DISCUSSION
To further evaluate the proposed method against similar well-established methods, we consider recent research in which FS methods have been applied in experiments on the same datasets. In this way, we can achieve sufficiently fair comparisons to illustrate the impact of our proposed method and to demonstrate its effectiveness in detecting network attacks relative to previous studies. Table 11 shows that for the UNSW-NB15 dataset, the proposed method (SSA-XGBoost and SSA-NB) outperforms the other techniques considered for comparison, especially in terms of the FNR and FPR. Note that a higher FNR indicates worse results, as it means that the model cannot detect or recognize malicious attacks; the higher values seen for the other models therefore compare unfavourably with the FNR of 3.8% achieved by SSA-NB. Similarly, the FPR represents the rate of false alarms, in which normal traffic is detected as malicious, confusing the system administrators. According to the results obtained, SSA-XGBoost outperforms the other methods in terms of this metric, achieving the lowest false alarm rate of 6.9%. Regarding the F-measure and recall, as shown in Table 11, our method again noticeably outperforms the methods presented in previous studies; SSA-XGBoost reaches an F-measure of 94.4%, and SSA-NB achieves the highest recall of 99.0%. In contrast, the previous work on SVM-PSO [23] exhibits the highest results in terms of the TNR (97.4%) and precision (96.3%); however, this method yields lower accuracy, F-measure, and recall. Moreover, it shows a high FPR (25.9%) and the worst FNR (20.4%), meaning that it can reliably identify only normal traffic and cannot detect malicious attacks effectively. Additionally, our model offers the advantage of reducing the number of selected features while achieving increased detection performance.
Table 12 presents the results obtained on the NSL-KDD dataset. Here again, the authors of other existing approaches, including those of [4], [6], [22], [26], [30], and [29], used the same dataset but applied different FS techniques. Our proposed method with SSA-XGBoost yields an accuracy of 99.0%, which is substantially higher than the accuracy values reported in previous studies. Similarly, when we consider the other evaluation metrics, our model generally outperforms the others. However, the SVM-SPSO-BPSO technique [29] performs as well as or better than our approach in terms of several metrics, although the differences are not large considering the effort expended by those researchers to obtain their results: they used SPSO to tune the parameters of the SVM to improve the classifier model while also performing a second round of processing utilizing BPSO as an FS method. Even with all of these improvements adopted in [29], our model was able to achieve competitive performance with only the application of SSA-based FS, and it outperformed the methods presented in the other previous studies.
Moreover, we must highlight the ability of our proposed method to outperform the other methods in terms of the FNR and recall in particular, for which it reaches values of 0.6% and 99.1%, respectively, indicating the highest detection rate and an enhanced detection ability. Tables 11 and 12 show that the proposed approach is highly competitive and effective for solving network anomaly detection problems. Through this performance analysis, we have demonstrated that the proposed model achieves better performance, higher accuracy, and a lower false alarm rate than state-of-the-art techniques.

VI. CONCLUSION AND FUTURE WORK
Network anomaly detection has been recognized as a robust tool for protecting networks from various types of malicious attacks. This paper has proposed a practical network detection framework by examining two different algorithms, XGBoost and NB, as a basis for applying the SSA as an FS technique. The detection rate can be affected by the features considered during data preprocessing and model training, which are two fundamental determinants of the final detection capability and efficiency; therefore, in the proposed method, the SSA is applied to select the most relevant features to improve the model performance. Accordingly, two classification algorithms, SSA-XGBoost and SSA-NB, were used to build a robust network anomaly detection model. We used two primary datasets, namely, NSL-KDD and UNSW-NB15, to examine the model performance. Our proposed network detection model achieved high and robust performance, with high accuracy, a high detection rate, and a low false alarm rate, demonstrating the effectiveness of applying the SSA for FS. Moreover, our method showed either absolute superiority or highly competitive performance compared with other recently proposed techniques for addressing network anomaly detection problems. In future work, we anticipate further improving the SSA to increase the detection rate by using different methodologies for different datasets while overcoming the challenges presented by data imbalance. Furthermore, we will explore further comparisons with other techniques, such as feature extraction methods.