Leveraging Metaheuristics for Feature Selection With Machine Learning Classification for Malicious Packet Detection in Computer Networks

Robust Intrusion Detection Systems (IDS) are increasingly necessary in the age of big data due to the growing volume, velocity, and variety of data generated by modern networks. Metaheuristic algorithms offer a promising approach to enhancing IDS performance through optimal feature selection. Combining these algorithms with machine learning (ML) to build an IDS makes it possible to improve detection accuracy, reduce false positives and negatives, and enhance the efficiency of network monitoring. Our study proposes using metaheuristic algorithms together with machine learning classifiers for feature selection, to optimize the number of features drawn from a data set of computer network traffic. We have tested several combinations of algorithms, viz., Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Grey Wolf Optimizer (GWO), along with ML algorithms, viz., Decision Tree (DT), Random Forest (RF), Gaussian Naïve Bayes (GNB), and Logistic Regression (LR). The combinations have been tested on the NSL-KDD and kddcup.data_10% data sets. We draw several insights on feature selection scores with respect to test scores, F1 scores, recall, and precision for the various algorithm combinations. The feature selection time has also been highlighted to showcase the fastest-performing combinations. Ultimately, we present three combinations of algorithms, matched to different organizational IDS requirements, and provide a separate solution for each.


I. INTRODUCTION
The internet today has changed rapidly from what it initially was. Even with increased attention to protecting electronic information, there are ample reasons for business organizations, institutions, and the general public to be concerned. More malware is being launched than ever before. Cybersecurity is now a global priority as cybercrime and digital threats have rapidly increased in frequency and complexity. Robust Intrusion Detection Systems (IDS) that can handle big data are essential in today's cybersecurity landscape to ensure the accurate and efficient detection of security threats in large and complex networks. IDS protect computer networks from malicious attacks. Traditional IDS faced limitations in their detection accuracy and efficiency. Network-based Intrusion Detection Systems (NIDS) are resource intensive; therefore, an organization must plan for additional hardware to deploy and smoothly run them in the network. The primary reason for being resource-intensive and requiring additional hardware is the need to model complex, time-intensive data models [1].
An IDS is only as good as its signature library. If it is not updated frequently, it will not register the latest attacks and cannot raise an alert [2]. Network Security Engineers who monitor the network traffic frequently update the classifier model of an IDS. When new threats emerge, they update the classifier model of the IDS by incorporating new rules, algorithms, or machine learning techniques to enhance its detection capabilities. These models, which are trained on massive network-based datasets, are generally resource intensive and have time and space complexity issues [1]. Therefore, optimal feature selection to reduce the dimensionality of the datasets is of prime importance for any IDS to be able to detect and thwart threats in real time.
This study proposes integrating metaheuristic algorithms into an Intrusion Detection System (IDS), potentially improving its performance and accuracy in detecting different types of attacks. Metaheuristic algorithms are optimization techniques that can search for the best solution in a large and complex search space. These algorithms are search-based optimization techniques inspired by natural processes such as evolution, swarm behavior, and genetics [3]. The authors of [4] have proposed the use of grey wolf and dipper throated optimization for feature selection in IDS. Their results show an increase in classification accuracy across the different types of attacks, which would be beneficial for IoT systems. The authors of [5] have proposed the use of statistical measures such as the Chi-squared test and the Pearson correlation coefficient in tandem with a modified Genetic Algorithm for feature selection when creating an IDS; they achieved high accuracy with a minimal set of selected features. Along the same lines as [5], the authors of [6] have proposed a hybrid metaheuristic algorithm that uses the artificial bee colony together with the dragonfly algorithm for feature selection when creating an IDS, and they also obtained considerable results in classifying attack and non-attack packets. The Tabu Search metaheuristic for feature selection, combined with Random Forest for classification, has been proposed by the authors of [7], who claim to have reduced the false positive rate considerably with their approach. Further, the authors of [8] have proposed a novel metaheuristic termed the Operational Crow Search algorithm for dimensionality reduction of the feature space and used Recurrent Neural Networks (RNN) for attack classification.
Our paper proposes an optimized approach for detecting malicious packets by integrating metaheuristic algorithms into an intrusion detection system. The proposed algorithm aims to improve accuracy and precision while reducing space and time complexity by integrating metaheuristic algorithms with existing machine learning classifier techniques. The experimental results demonstrate that this hybrid approach outperforms existing classifiers, making it a promising solution for IDS optimization.

A. AUTHORS' CONTRIBUTIONS
• Firstly, our research presents various combinations of metaheuristic and ML algorithms for feature selection from a computer network traffic dataset, for optimal detection of intruders.
• Secondly, our research presents different ML classifiers in tandem with the feature selection algorithms for the classification of intruder data. Several factors, such as mean feature length and mean feature selection time, have been extensively explored and presented.
• Lastly, our research presents three use cases of algorithm combinations based on test score, F1 score, recall, and precision, which could be used by three types of organizations based on their needs.
The remainder of the paper is organized as follows. Section II describes related work. Section III introduces the three metaheuristic algorithms used in this study, the Genetic Algorithm, Particle Swarm Optimization, and the Grey Wolf Optimization algorithm, and briefly discusses how they work. Section IV describes the datasets used. Section V presents the proposed methodology for feature selection and classification. Section VI presents the experimental set-up, results, and analysis, and Section VII presents conclusions and future work.

II. RELATED WORK
Researchers in [9] worked towards finding the most relevant features to be used as essential features in a new IDS dataset using six feature selection methods, namely, Information Gain (IG), Gain Ratio (GR), Symmetrical Uncertainty (SU), Relief-F (R-F), One-R (OR), and Chi-Square (CS).
In 2016, a study [10] highlighted the importance of feature selection in intrusion detection systems (IDS) for improving accuracy and performance. The study proposes a recursive feature elimination mechanism and a decision tree-based classifier to identify and eliminate irrelevant features. Applying this approach to the NSL-KDD dataset, a benchmark for intrusion detection systems, results in significant accuracy improvements. These findings emphasize the value of feature selection in designing effective IDS.
An adaptive ensemble learning model named the MultiTree algorithm is proposed in [11], focusing on the NSL-KDD dataset. The MultiTree algorithm adjusts the training data proportion and constructs multiple decision trees. A selection of base classifiers such as decision tree, random forest, kNN, and DNN is employed to enhance the overall detection effectiveness, and an ensemble adaptive voting algorithm is also designed to further improve detection accuracy. The data analysis reveals the critical role of data feature quality in determining detection effectiveness. The limitation identified in that study is that training and modeling are performed on noisy data without a comprehensive feature selection approach, a deficiency that has been specifically addressed in our research.
The work on machine learning and metaheuristic algorithms for anomaly-based intrusion detection in IoT-based healthcare applications by [12] employs algorithms such as Particle Swarm Optimization (PSO), Genetic Algorithm (GA), and Differential Evolution (DE) for feature selection and uses k-Nearest Neighbour (kNN) and Decision Tree (DT) classifiers for classification. The proposed hybrid approach combines these techniques to improve performance, and the paper also presents an IoT-based healthcare architecture that uses the best-performing algorithm to detect and prevent malicious traffic. Reference [13] recommends a novel feature selection method using GA to determine the optimal feature subsets from the NSL-KDD dataset. The results of the proposed work were then compared with existing feature selection methods to verify the improved performance.
Reference [14] introduces a novel approach for network intrusion detection using the Horse Herd Optimization Algorithm (HOA) and quantum-inspired optimization. The proposed algorithm, MQBHOA, leverages the behavior of horses in a herd to select effective features and enhance social behaviors for intrusion detection. The K-Nearest Neighbor (KNN) classifier is employed for classification. The performance of MQBHOA is evaluated on the NSL-KDD and CSE-CIC-IDS2018 datasets. The results demonstrate that MQBHOA outperforms other metaheuristic algorithms, achieving higher feature selection and classification accuracy success rates.
The authors of [15] conduct a comprehensive investigation into the impact of feature selection on the performance of intrusion detection systems. They employ the Random Forest (RF) algorithm to select pertinent attributes, aiming to enhance the effectiveness of IDSs. The study includes a comparative analysis involving diverse classifiers, including k-NN, DT, Support Vector Machine (SVM), Logistic Regression (LR), and Naïve Bayes (NB), on the NSL-KDD dataset. The findings demonstrate notable improvements in detection rate, accuracy, and false alarm reduction compared to existing state-of-the-art classifiers.
Reference [16] proposes high-performance classification algorithms, SEKS and SEIDS, for improving attack detection in an IDS. Their approach combines clustering, classification, and metaheuristic algorithms to enhance accuracy and detect unfamiliar attacks. The research shows that their method outperforms previous classification methods in accuracy. Table 1 outlines the summary of the reviewed literature.
Additionally, research has been conducted on optimized feature selection for IDS. However, there is a need for more in-depth exploration and optimization of IDS using metaheuristic algorithms, considering various combinations derived from different metaheuristic algorithms and machine learning classifiers.
Our article addresses this gap by emphasizing the development and comparison of different machine learning-assisted metaheuristic algorithms. The primary objective is to comprehensively analyze these algorithms, evaluate their performance in different scenarios, and provide a set of well-suited algorithm combinations for various use cases and practical requirements, thereby bridging the gap in IDS optimization.

III. MACHINE LEARNING-ASSISTED METAHEURISTICS
A. OPTIMIZATION ALGORITHMS AND METAHEURISTICS
Optimization algorithms use mathematical procedures to achieve the best possible solution within constraints. These algorithms iteratively modify the parameters of a system or function to minimize or maximize a specific objective function, exploring a large solution space to find the optimal solution according to predefined criteria. Optimization techniques are categorized as deterministic or stochastic according to their behavior, unconstrained or constrained according to the presence of constraints, and linear or nonlinear based on the objective function [13]. Optimization techniques are further distinguished by whether they target local or global solutions and whether they are first-order or second-order methods, based on the derivatives of the objective function that they use.
Metaheuristic algorithms are characterized by their stochastic and iterative nature, utilizing randomness to explore the solution space [14]. One of the key advantages of metaheuristics is their ability to quickly find satisfactory solutions for complex problems, even when traditional optimization algorithms fail to do so. Some well-known examples of metaheuristic algorithms include the Genetic Algorithm, Simulated Annealing, Particle Swarm Optimization, Grey Wolf Optimization (GWO), and Ant Colony Optimization (ACO). Our study explores three prominent metaheuristic algorithms: the Genetic Algorithm, Particle Swarm Optimization, and Grey Wolf Optimization. These algorithms have been selected based on their established effectiveness in addressing the study's research objectives.

B. METAHEURISTICS-BASED FEATURE SELECTION
The technique used in our research proposes machine learning-assisted metaheuristics for carrying out feature selection. Exhaustive feature selection can be effective for small datasets with few features, but it becomes computationally infeasible for high-dimensional datasets. Heuristic methods and metaheuristic algorithms can be used to reduce the search space and improve the efficiency of feature selection; the approach chosen depends on the problem's specific requirements and the dataset's characteristics. When machine learning is used in the fitness function of metaheuristic algorithms, the result is ''machine learning-assisted metaheuristics''. In this approach, the fitness function of the metaheuristic algorithm is enhanced with a machine-learning model that can estimate the fitness value of a candidate solution. The machine learning model is trained on a set of labeled data, which contains the fitness values of previously evaluated solutions. The machine learning-assisted metaheuristics approach can improve the optimization performance of the algorithm by reducing the number of function evaluations required to find the optimal solution. This technique is depicted in Figure 1.
This technique uses the machine learning model to predict the fitness value of candidate solutions, eliminating the need to evaluate all solutions in the search space. The machine learning model can also capture the underlying patterns and relationships in the search space, leading to more efficient and effective optimization.
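As a minimal sketch of this idea (the function name, the 0.5 selection threshold, and the choice of a Decision Tree inside the fitness function are illustrative assumptions, not details taken from the paper), a machine learning-assisted fitness function can train a lightweight classifier on only the currently selected features and return its validation accuracy:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def fitness(solution, X_train, y_train, X_test, y_test, threshold=0.5):
    """Score one candidate solution: genes above the threshold mark selected features."""
    mask = np.asarray(solution) > threshold
    if not mask.any():                      # guard against an empty feature subset
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X_train[:, mask], y_train)      # train only on the selected columns
    return accuracy_score(y_test, clf.predict(X_test[:, mask]))
```

Here X_train, y_train, X_test, and y_test are assumed to be NumPy arrays produced by the train/test split described later in the methodology.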

C. GENETIC ALGORITHM, PARTICLE SWARM OPTIMIZATION, AND GREY WOLF OPTIMIZATION FOR FEATURE SELECTION
The Genetic Algorithm (GA) is a metaheuristic optimization algorithm inspired by natural selection and evolution. GA mimics the process of natural selection by iteratively evolving a population of candidate solutions to find the optimal solution [18].
In the context of this study, we modify the GA for feature selection. The population array represents a group of potential solutions, where each solution is represented as a genomic sequence. The length of the genomic sequence corresponds to the number of features in the dataset. The population array is initially randomly generated, with each gene (element) being a number between 0 and 1. In this case, the fitness function is modified to evaluate the accuracy obtained by a particular organism in the population array. The genomic sequence of an organism represents the presence or absence of specific features from the feature set. Each gene in the genomic sequence corresponds to a particular feature, indicating whether it is selected. The fitness function determines how well an organism solves the problem by evaluating the accuracy using this genomic sequence. The GA works through successive generations, aiming to improve accuracy over time. The algorithm applies genetic operators, such as selection, crossover, and mutation, to simulate the principles of natural selection and evolution. The modified GA utilized in our research for feature selection is shown in Figure 2.
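A compact, illustrative sketch of such a GA loop is given below; the population size, number of generations, mutation rate, and the fitness_fn callback (for example, the fitness function sketched above) are assumptions for demonstration rather than the exact settings used in the study.

```python
import numpy as np

def genetic_feature_selection(fitness_fn, n_features, pop_size=20, generations=30,
                              mutation_rate=0.05, seed=42):
    rng = np.random.default_rng(seed)
    # Each organism is a real-valued genome in [0, 1]; genes > 0.5 mean "feature selected".
    population = rng.random((pop_size, n_features))
    for _ in range(generations):
        scores = np.array([fitness_fn(org) for org in population])
        parents = population[np.argsort(scores)[::-1][: pop_size // 2]]  # keep the fittest half
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_features)                 # single-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_features) < mutation_rate     # random mutation
            child[flip] = rng.random(flip.sum())
            children.append(child)
        population = np.vstack([parents, *children])
    best = population[np.argmax([fitness_fn(org) for org in population])]
    return best > 0.5                                         # boolean mask of selected features
```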
Particle Swarm Optimization (PSO) is a population-based optimization algorithm inspired by the social behavior of bird flocking or fish schooling. PSO is a heuristic search algorithm that aims to find the optimal solution in a search space by iteratively updating a group of particles representing potential solutions [19]. The modified PSO-based feature selection algorithm begins by randomly initializing a population of particles within the search space.
Each particle is represented in a 2D array called ''population'', where the number of rows corresponds to the number of particles and the number of columns represents the number of features in the dataset. The population matrix is initialized with random numbers. A second 2D array called ''velocity'', with the same dimensions as the population matrix, is initialized with zeros. Each particle in the population therefore has an associated position vector and velocity vector. The algorithm optimizes the fitness scores over the iterations by adjusting the particle positions and velocities. Each particle updates its velocity and position to move towards better solutions in the search space. The velocity update is influenced by three factors: the particle's previous velocity, its personal best position, and the global best position, balancing exploration (following the global best) and exploitation (following the personal best). The velocity update rule is given in equation (1):

v(t + 1) = w · v(t) + c1 · r1 · (p_best − x(t)) + c2 · r2 · (g_best − x(t))   (1)

where v(t + 1) is the updated velocity of the particle at time (t + 1), v(t) is the particle's current velocity at time t, w is the inertia weight controlling the impact of the particle's previous velocity, c1 and c2 are the cognitive and social coefficients determining the influence of the particle's best position (p_best) and the global best position (g_best), r1 and r2 are random numbers in the range [0, 1] introducing stochasticity, and x(t) is the current position of the particle at time t. After updating the velocity, the particle updates its position as in equation (2):

x(t + 1) = x(t) + v(t + 1)   (2)

The iterations continue until a termination criterion is met, which could be a maximum number of iterations, reaching a predefined fitness threshold, or another stopping condition. Figure 3 showcases the usage of the PSO algorithm for feature selection.
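A minimal sketch of this update loop is shown below; the swarm size, inertia weight, and acceleration coefficients are illustrative values, and fitness_fn is any callable such as the one sketched earlier.

```python
import numpy as np

def pso_feature_selection(fitness_fn, n_features, n_particles=20, iterations=30,
                          w=0.7, c1=1.5, c2=1.5, seed=42):
    rng = np.random.default_rng(seed)
    pos = rng.random((n_particles, n_features))          # "population" matrix
    vel = np.zeros_like(pos)                             # "velocity" matrix, initialized to zeros
    pbest = pos.copy()
    pbest_score = np.array([fitness_fn(p) for p in pos])
    gbest = pbest[np.argmax(pbest_score)].copy()
    for _ in range(iterations):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Equation (1): inertia + cognitive pull towards pbest + social pull towards gbest
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)               # Equation (2), clipped to [0, 1]
        scores = np.array([fitness_fn(p) for p in pos])
        improved = scores > pbest_score
        pbest[improved], pbest_score[improved] = pos[improved], scores[improved]
        gbest = pbest[np.argmax(pbest_score)].copy()
    return gbest > 0.5                                   # boolean mask of selected features
```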
At the core of the GWO algorithm is a population of candidate solutions, represented as a pack of grey wolves. The algorithm iteratively updates the positions of the wolves to explore and search for the optimal solution within the given search space; the wolves' social hierarchy and hunting behavior guide this exploration [21]. The Modified Grey Wolf Optimization (GWO) algorithm for feature selection involves a step-by-step procedure to identify the most relevant features in a given dataset. Figure 4 is the pseudocode explaining the flow of the Modified GWO algorithm. The algorithm begins by initializing a population of search agents, representing potential feature subsets. These search agents are randomly positioned within the boundaries of the search space.

Next, the fitness of each search agent is evaluated using a fitness function that measures the performance of the selected features. The fitness function typically considers metrics such as accuracy, error rate, or other performance measures specific to the problem domain. The three best search agents are identified based on their fitness values: Alpha, Beta, and Delta. These agents represent the leaders of the population and have the highest fitness values; their positions serve as references for updating the positions of the other search agents. The algorithm then iterates for a specified number of times. In each iteration, the positions of the wolves are updated with respect to the three leaders (Alpha, Beta, and Delta) using equations (3) and (4):

X_1 = X_alpha − A_1 · D_alpha,   X_2 = X_beta − A_2 · D_beta,   X_3 = X_delta − A_3 · D_delta   (3)

X_i(t + 1) = (X_1 + X_2 + X_3) / 3   (4)

Here, X_i is the position of the i-th wolf; X_1, X_2, and X_3 are the candidate positions derived from the Alpha, Beta, and Delta wolves, respectively; A is a randomly generated coefficient that controls exploration around the current position of the corresponding leader; and D_alpha is the distance between the current wolf (Positions[i, j]) and the Alpha wolf, with D_beta and D_delta defined analogously. The updated positions of the leaders are used as references for updating the positions of the remaining search agents, which are updated using the same equations, considering the values of A, C, and D. This ensures exploration and exploitation of the search space to find optimal feature subsets. Additionally, the values of the parameters A and C are updated throughout the iterations to control the search behavior of the algorithm. The iterations continue until the maximum number of iterations is reached. At the end of the algorithm, the position of the Alpha search agent, representing the best-selected features, is returned as the final outcome of the feature selection process.
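The sketch below illustrates these update steps in Python; the pack size, iteration count, and the linear decay of the parameter a from 2 to 0 are common GWO conventions assumed here for demonstration, not settings taken from the study.

```python
import numpy as np

def gwo_feature_selection(fitness_fn, n_features, n_wolves=20, iterations=30, seed=42):
    rng = np.random.default_rng(seed)
    positions = rng.random((n_wolves, n_features))
    for t in range(iterations):
        scores = np.array([fitness_fn(w) for w in positions])
        alpha, beta, delta = positions[np.argsort(scores)[::-1][:3]]   # the three leaders
        a = 2 - 2 * t / iterations                        # 'a' decays linearly from 2 to 0
        for i in range(n_wolves):
            X_new = np.zeros(n_features)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(n_features), rng.random(n_features)
                A, C = 2 * a * r1 - a, 2 * r2             # coefficient vectors A and C
                D = np.abs(C * leader - positions[i])     # distance to this leader
                X_new += leader - A * D                   # X1, X2 or X3 from equation (3)
            positions[i] = np.clip(X_new / 3.0, 0.0, 1.0) # equation (4): average of X1, X2, X3
    scores = np.array([fitness_fn(w) for w in positions])
    return positions[np.argmax(scores)] > 0.5             # Alpha's features as the final subset
```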

IV. DATASET DESCRIPTION
This study uses the NSL-KDD and kddcup.data_10% datasets for testing and evaluating the algorithms.

A. NSL-KDD
The NSL-KDD dataset is a network intrusion detection dataset created by the Canadian Institute for Cybersecurity at the University of New Brunswick in response to some of the inherent problems of the KDD'99 dataset. These problems include:
• The presence of redundant records in the train set, which can bias the classifiers towards more frequent records.
• The presence of duplicate records in the test sets, which can bias the learners' performance towards methods with better detection rates on the frequent records.
The NSL-KDD dataset addresses these problems by:
• Removing redundant records from the train set and removing duplicate records from the test sets.
• Balancing the classes in the dataset by oversampling the minority classes.
The NSL-KDD dataset contains 41 features, which are divided into four categories:
• Basic features: These features provide basic information about the network traffic, such as the source and destination IP addresses, the protocol used, and the number of bytes transferred.
• Data subject features: These features provide information about the content of the network traffic, such as the number of packets, the number of connections, and the number of TCP flags set.
• State features: These features provide information about the state of the network, such as the number of connections in progress and the number of connections that have been closed.
• Timing features: These features provide information about the timing of the network traffic, such as the start and end times of the connection.
The NSL-KDD dataset contains 148,515 records, which are divided into a train set of 125,972 records and a test set of 22,543 records. The dimension of the train set is (125,972, 43), and the dimension of the test set is (22,543, 43).

B. KDDCUP.DATA_10% DATASET
The kddcup.data_10% dataset contains (494,021, 43) records; the difference in the number of features between the NSL-KDD dataset and the kddcup.data_10% dataset is because the NSL-KDD dataset includes an additional feature called 'label'. The 'label' feature indicates whether the record represents normal or anomalous network traffic. The kddcup.data_10% dataset is a good resource for researchers interested in network intrusion detection; it is well balanced and contains various features that can be used to train classifiers. The NSL-KDD and kddcup.data_10% datasets are considered benchmark datasets in the realm of Intrusion Detection Systems, as researchers and practitioners widely use them to evaluate the performance of new intrusion detection techniques.

V. METHODOLOGY
This section of the article presents the overall methodology followed for data cleaning, feature selection using metaheuristic algorithms, and classification using ML classifiers. 2) Column Labels: Column labels are assigned to the train and test DataFrames using the labels list. These labels represent the different attributes or features of the data; their definitions are available in the documentation or publications associated with the NSL-KDD [22] and KDD Cup 1999 datasets [23], [24].
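The loading and labeling steps can be sketched as follows; the file names and the placeholder column list are assumptions for illustration, since the actual 43-entry label list comes from the NSL-KDD and KDD Cup 1999 documentation cited above.

```python
import pandas as pd

# Placeholder column list standing in for the documented 41 features plus the class columns.
labels = [f"feature_{i}" for i in range(41)] + ["attack_type", "level"]

train = pd.read_csv("KDDTrain+.txt", header=None, names=labels)   # file names are assumptions
test = pd.read_csv("KDDTest+.txt", header=None, names=labels)
print(train.shape, test.shape)
```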

C. FEATURE SELECTION USING METAHEURISTIC ALGORITHMS AND CLASSIFICATION USING ML
A suitable ML classifier, such as Random Forest, Support Vector Machine (SVM), or a Neural Network, is selected based on the specific requirements. In this research study, the focus was on four primary machine learning classifiers: Gaussian Naive Bayes (GNB), Decision Tree (DT), Logistic Regression (LR), and Random Forest (RF). The chosen ML classifier is trained on the training dataset (Xtrain, Ytrain) within the fitness function. Figure 4 gives one of the fitness functions used for the GA algorithm. Here, the features chosen by the GA, the X and y variables, and the ML classifier are passed as arguments to the fitness function. The classifier fits the X features to the y predictions. Finally, the accuracy score is calculated by comparing the actual data with the predictions made by the classifier.
In this manner, various fitness functions based on the metaheuristic algorithm and the ML classifier are generated and tested. The feature set is generated randomly following the working principles of the metaheuristic algorithm. The overall methodology is presented in Figure 5. The classifier's performance on the validation or test dataset (Xtest, Ytest) is evaluated to obtain the fitness score; the study's objective is to maximize this fitness score. Once the maximum number of iterations or generations is reached, the feature set with the highest fitness score (global maximum) is selected as the optimal solution. After training with the selected features, the model uses one of four primary classifiers, Gaussian Naive Bayes, Decision Tree, Logistic Regression, or Random Forest, to classify the data as malicious or non-malicious.
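Putting the pieces together, a hypothetical end-to-end run (reusing the fitness and pso_feature_selection sketches above, and assuming X_train, y_train, X_test, y_test are NumPy arrays with binary 0/1 labels from the split described earlier) could look like this:

```python
from functools import partial
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

# Bind the data to the fitness function so the metaheuristic only passes candidate solutions.
fitness_fn = partial(fitness, X_train=X_train, y_train=y_train,
                     X_test=X_test, y_test=y_test)
mask = pso_feature_selection(fitness_fn, n_features=X_train.shape[1])

# Final classification on the reduced feature set (an RF-PSO_* style combination).
final_clf = RandomForestClassifier(random_state=0)
final_clf.fit(X_train[:, mask], y_train)
preds = final_clf.predict(X_test[:, mask])
print("test score:", accuracy_score(y_test, preds), "F1:", f1_score(y_test, preds))
```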

VI. EXPERIMENTAL SET-UP, RESULTS, AND ANALYSIS
The experimentation has been conducted using Python with the Scikit-learn library on a Windows 11 system. The test computer has an Intel(R) Xeon(R) E-2124 CPU @ 3.30 GHz, 32 GB of installed RAM (31.9 GB usable), and a 64-bit operating system on an x64-based processor.
Due to the inherent stochastic nature of the developed algorithms, they were executed multiple times to obtain a consensus by aggregating the results. Stochastic algorithms are valuable in solving complex problems that deterministic techniques may struggle with. These algorithms leverage randomization to explore a vast search space and discover high-accuracy solutions that deterministic methods might overlook. By incorporating probabilistic methods, stochastic algorithms identify accurate solutions and potentially find optimal solutions [25]. This study bridges the gap in IDS optimization by delivering a comprehensive set of algorithm combinations. These algorithmic combinations can address diverse use case requirements and achieve desired performance outcomes. The primary objective is to thoroughly analyze these algorithms and assess their performance in different scenarios. Consequently, this research offers a selection of well-suited algorithms for various use cases and practical requirements.
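A small helper of the following form (the names run_pipeline and n_runs are illustrative placeholders, not the study's code) captures this repeat-and-aggregate step:

```python
import numpy as np

def aggregate_runs(run_pipeline, n_runs=10):
    """Run a stochastic FS + classification pipeline several times and aggregate the scores."""
    scores = [run_pipeline(seed=s) for s in range(n_runs)]   # one test score per run
    return float(np.mean(scores)), float(np.std(scores))
```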
For instance, organizations with substantial computational resources may prioritize algorithms that demonstrate high performance, regardless of their time and space complexity.On the other hand, organizations with limited computational capabilities and resources may opt for algorithms that exhibit lower asymptotic complexities during the modeling phase while still achieving satisfactory performance levels.Our approach enables the selection of algorithms based on specific computational constraints and operational requirements, facilitating the practical implementation of IDS solutions.

A. ALGORITHM NOMENCLATURE
The algorithms developed in this study, combining machine learning classifiers and metaheuristic algorithms, are assigned specific names for ease of identification. The nomenclature used for these algorithms can be found in Table 3.
Each algorithm name consists of three distinct parts. The first part, before the hyphen, names the machine learning classifier that classifies network flows as malicious or benign. The remainder, after the hyphen, denotes the employed feature selection technique: the part before the underscore names the metaheuristic algorithm, and the part after the underscore names the machine learning classifier incorporated in the metaheuristic's fitness function. For instance, an algorithm named DT-GA_RF indicates that the final classification after feature selection is performed by a Decision Tree (DT), and that feature selection is conducted using a Genetic Algorithm (GA) with Random Forest (RF) employed within the Genetic Algorithm's fitness function.
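For example, a small helper (purely illustrative, not part of the study's code) can split such a name into its three parts:

```python
def parse_algorithm_name(name: str):
    """Split a name such as 'DT-GA_RF' into (final classifier, metaheuristic, fitness classifier)."""
    final_clf, feature_selection = name.split("-")
    metaheuristic, fitness_clf = feature_selection.split("_")
    return final_clf, metaheuristic, fitness_clf

print(parse_algorithm_name("DT-GA_RF"))   # ('DT', 'GA', 'RF')
```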

B. RESULTS AND DISCUSSION
All 48 developed machine learning-assisted metaheuristic algorithms were meticulously evaluated against each other, and the results are summarized below. Table 4 presents the top 10 algorithms with the highest test scores, arranged in descending order.
The confusion matrix scores used for computing the performance of the proposed algorithms are given in equations (5), (6), (7), and (8):

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (5)

Precision = TP / (TP + FP)   (6)

Recall = TP / (TP + FN)   (7)

F1 Score = 2 · (Precision · Recall) / (Precision + Recall)   (8)

where TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative.
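As an illustration of how equations (5)-(8) can be obtained in scikit-learn (assuming y_test and preds are the true and predicted binary labels for one algorithm), a minimal sketch is:

```python
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()   # TN, FP, FN, TP counts
accuracy = (tp + tn) / (tp + tn + fp + fn)                  # equation (5)
precision = tp / (tp + fp)                                  # equation (6)
recall = tp / (tp + fn)                                     # equation (7)
f1 = 2 * precision * recall / (precision + recall)          # equation (8)
```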
An important observation derived from the results is the exceptional performance of the Random Forest (RF) classifier, which is renowned for its ability to deliver robust performance even in the presence of noisy data. Notably, all the top 10 algorithms in the table employ the Random Forest classifier as their final classifier, underscoring its effectiveness in intrusion detection. The algorithm that achieved the highest test score, with an accuracy of 99.5787%, is one that incorporates feature selection through Particle Swarm Optimization (PSO) with Random Forest in the fitness function. Following closely is the algorithm that uses Random Forest as the classifier, paired with Particle Swarm Optimization with Decision Tree (PSO_DT) as the feature selection technique. The third highest-scoring algorithm employs a Genetic Algorithm (GA) with Gaussian Naive Bayes (GNB) as its feature selection technique. These findings highlight the prowess of the Random Forest classifier and emphasize the successful utilization of various metaheuristic algorithms combined with feature selection techniques for achieving high accuracy in intrusion detection tasks.
The top 10 algorithms based on test scores may differ from the top 10 algorithms based on the F1 score. Test scores reflect the accuracy and generalization ability of the algorithm on new, unseen data. In contrast, F1 scores combine precision and recall to provide a balanced performance measure, particularly on imbalanced datasets. As a result, algorithms with high accuracy may not necessarily have the highest F1 scores, and vice versa. The ranking of algorithms will depend on the IDS dataset, problem domain, and specific evaluation criteria employed.
A high recall score is crucial in Intrusion Detection Systems (IDS) for effectively identifying and detecting malicious activities. Recall, also known as sensitivity or true positive rate, measures the proportion of actual positive instances (intrusions) that the IDS correctly identifies [26]. Table 5 presents the top 10 recall scores in descending order. Among these algorithms, RF-PSO_RF achieved the highest recall score, indicating its ability to detect a large proportion of actual intrusions accurately. This algorithm is followed by RF-PSO_DT and RF-GA_GNB, which also demonstrate strong performance in terms of recall. A high recall score is desirable in IDS because it helps minimize the risk of false negatives, where actual intrusions go undetected. By capturing a significant number of true positive instances, IDS algorithms with high recall scores enhance the overall security of a system by promptly identifying potential threats and enabling timely response measures. Therefore, algorithms with high recall scores, such as RF-PSO_RF, RF-PSO_DT, and RF-GA_GNB, play a vital role in ensuring the effectiveness and reliability of an IDS in detecting and mitigating security breaches.

In Figure 6, the algorithms are compared against their test scores and corresponding F1 scores. When choosing an Intrusion Detection System (IDS) algorithm, it is crucial to consider both the test score and the F1 score, as they provide complementary information about the algorithm's performance. An algorithm with a high test score but a low F1 score may produce many false negatives, meaning it fails to detect actual threats. On the other hand, an algorithm with a high F1 score but a low test score may generate many false positives, triggering alerts for benign traffic. Hence, selecting an IDS algorithm that balances a high test score and a high F1 score is vital; this balance helps minimize both false negatives and false positives, ensuring accurate detection rates. The stacked plot in Figure 7 visually represents the algorithms' performance across multiple metrics and serves as a tool for comparing different IDS algorithms. The ''all-rounders'', or best-performing algorithms, can be identified by observing the bars with high stacks across all performance categories, including Test Score, F1-Score, Recall-Score, and Precision-Score. These algorithms demonstrate consistent and balanced performance across various aspects of IDS evaluation. Choosing algorithms that excel in all performance metrics helps ensure a comprehensive and robust IDS solution. By considering the overall performance rather than emphasizing a single metric, the selected algorithms are more likely to provide accurate detection, minimize false negatives and false positives, and maintain high levels of precision and recall.
By examining the stacked bar chart in Figure 7, we can identify several algorithms demonstrating promising performance across multiple performance metrics, including RF-GWO_DT, RF-GA_GNB, RF-GA_LR, DT-GWO_DT, and LR-GWO_LR. These algorithms exhibit high stacks in Test Score, F1-Score, Recall-Score, and Precision-Score, indicating strong performance across different metrics.
Feature selection is crucial in building an effective machine learning-based IDS. Figure 8 shows subplots organized in a grid, with each row representing a different optimization algorithm (Genetic Algorithm, Particle Swarm Optimization, Grey Wolf Optimization) and each column representing a different machine learning model (Gaussian Naive Bayes, Logistic Regression, Decision Tree, Random Forest). The purpose of organizing the subplots in this way is to compare the performance of different IDS algorithms within each optimization and model category and across different categories. Each plot in the grid corresponds to a different feature selection technique. This indicates that the data has been preprocessed using different feature selection methods, and the resulting features are used to train the IDS model. By visualizing the performance of the different algorithms in each plot, the efficacy of the selected feature set can be evaluated. Feature selection techniques are used to identify the most relevant features (or attributes) that provide maximum information gain for the model while reducing the dimensionality of the input data. It is important to note that the choice of feature selection technique can significantly impact the performance of an IDS model.
The study analyzed 48 unique algorithm combinations built on machine learning-assisted metaheuristic feature selection techniques, each of which incorporates a machine learning model within its fitness function. The algorithms were categorized according to the feature selection technique employed, resulting in 12 distinct techniques: GA_GNB, GA_LR, GA_DT, GA_RF, PSO_GNB, PSO_LR, PSO_DT, PSO_RF, GWO_GNB, GWO_LR, GWO_DT, and GWO_RF. Following the feature selection process, the study focused on four prominent machine learning models (GNB, RF, LR, and DT) for classification purposes. All results obtained under these classifiers were normalized, and each group's mean score was calculated. For instance, the feature selection technique GWO_LR was utilized by RF-GWO_LR, LR-GWO_LR, DT-GWO_LR, and GNB-GWO_LR; the results of these four algorithms were normalized, averaged, and grouped under GWO_LR. The same process was applied to the remaining 11 feature selection techniques. To evaluate the performance of the algorithms, four vector arrays were maintained to capture the mean test score, mean F1 score, mean precision score, and mean recall score, with each array containing 12 scores corresponding to the 12 feature selection techniques. To provide a clear understanding of the overall performance of the feature selection techniques, a 3D bar plot was generated. This visualization enables comparison of the 12 grouped techniques across the four performance metrics. Figure 9 depicts the 3D bar plot, which makes it possible to determine the superior machine learning-assisted metaheuristic technique based on overall performance. The findings from the 3D bar plot in Figure 9 are further examined and represented in a 2D version combining a bar plot and multiple line plots in Figure 10. This visualization aims to provide a more comprehensive understanding of the performance of the different feature selection techniques. Upon analyzing the combined plot, a key inference is drawn regarding the superiority of two specific feature selection techniques over the others: GWO_LR and GWO_GNB consistently demonstrate higher scores across all performance metrics, exhibiting strong performance in terms of the mean test score, mean F1 score, mean precision score, and mean recall score.
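The grouping and averaging described here can be expressed with pandas; the rows below are dummy values purely for illustration and are not results from the study:

```python
import pandas as pd

results = pd.DataFrame({
    "algorithm":  ["RF-GWO_LR", "LR-GWO_LR", "DT-GWO_LR", "GNB-GWO_LR"],   # sample rows only
    "test_score": [0.99, 0.95, 0.94, 0.92],                                 # dummy values
    "f1_score":   [0.98, 0.94, 0.93, 0.90],                                 # dummy values
})
metrics = ["test_score", "f1_score"]
# Min-max normalize each metric, then group by the feature selection technique (after the hyphen).
results[metrics] = (results[metrics] - results[metrics].min()) / (
    results[metrics].max() - results[metrics].min())
results["fs_technique"] = results["algorithm"].str.split("-").str[1]       # e.g. 'GWO_LR'
print(results.groupby("fs_technique")[metrics].mean())
```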
While some other algorithms display high scores on the mean test score metric, they exhibit relatively poorer performance on the remaining evaluation metrics. This observation highlights the importance of considering multiple performance metrics when assessing the overall effectiveness of a feature selection technique. The combined 2D plot allows for a more nuanced understanding of the comparative performance of the selection techniques. It enables researchers and practitioners to identify the algorithms that excel across multiple evaluation metrics, such as GWO_LR and GWO_GNB, and thereby make informed decisions regarding the most suitable technique for their specific requirements. Dimensionality reduction is significant when modeling an IDS using ML because it can significantly reduce the space and time complexity of the model [27]. High-dimensional data can be very computationally expensive to process and may lead to overfitting, where the model is too complex and learns the noise in the data instead of the underlying patterns. Feature selection is a technique used for dimensionality reduction, which aims to identify the most relevant features for the model while removing redundant or irrelevant features [22]. By reducing the number of features, feature selection can improve the model's accuracy, reduce overfitting, and speed up the training and prediction time.
Striking a balance between lowering the data size (Space complexity) and performance is crucial when modelling an IDS using machine learning.Reducing data size can be beneficial as it reduces the space complexity of the model and leads to faster and more efficient computations.
However, reducing the data size too much can lead to a loss of critical information and patterns, which can negatively impact the performance of the IDS. On the other hand, performance is vital as it determines the effectiveness of the IDS in detecting and preventing intrusions. A highly accurate model will detect attacks while minimizing false positives and negatives. However, achieving high performance often requires more data and complex algorithms, increasing space complexity and slowing computation. Therefore, a balance between space complexity and performance must be struck to ensure that the IDS is both efficient and effective. This balance can be achieved through feature selection, dimensionality reduction, and model optimization, techniques that aim to reduce the data size while retaining critical information and optimizing the model's performance.
Figure 11 provides valuable information for modelling an Intrusion Detection System (IDS) using machine learning by considering the number of features to select before training the model.Specifically, the graph shows each algorithm group's mean length of selected features and the mean test score.
The mean length of selected features is an essential metric for IDS feature selection. The number of features selected can impact the model's performance, as too many features can result in overfitting and too few features can result in underfitting. Therefore, having an idea of the mean length of selected features can help in selecting the ideal set of features for the model. The mean test score, on the other hand, provides information about the model's performance.
Achieving a balance between test scores and the number of selected features is crucial. An exceptional algorithm would select minimal features while achieving high scores. Figure 11 presents a bar plot of the algorithms, showcasing their corresponding lengths of selected features; a red line plot indicates the corresponding test scores. This approach can be extended to other performance metrics, such as F1-score, recall, and precision. Among the algorithms, GWO_LR and GWO_GNB stand out for their remarkable results. Despite significantly reducing the NSL-KDD dataset, GWO_LR maintains a mean test score above 94%, while GWO_GNB achieves a score of 92.5%. GWO_LR selects fewer than 20 features, whereas GWO_GNB chooses approximately 15 features from the original set of 41 features in the training dataset.
This substantial reduction in the training data has notable implications, including reduced space and time complexity. The scatter plot in Figure 12 shows the relationship between the length of the selected features and the test score for the various IDS models, where feature selection was performed using the different metaheuristic algorithms. The plot shows that some algorithms achieved high test scores with very few features selected, suggesting that these algorithms could select the most relevant features for the model, resulting in a more efficient and accurate model.
Moreover, the color coding of the scatter plot indicates that different metaheuristic algorithms have resulted in different lengths of features selected for each IDS model. This suggests that the choice of metaheuristic algorithm used for feature selection can significantly impact the resulting feature set and the overall performance of the IDS model; therefore, selecting an appropriate metaheuristic algorithm for feature selection is essential in developing an effective IDS model. By analyzing the scatter plot, we can observe that specific algorithm groups, such as RF-GWO_RF, RF-GWO_DT, DT-GWO_RF, and GNB-GWO_DT, exhibit both high test scores and a consistent feature selection length below 10.
The exact performance metrics of the aforementioned algorithms are provided in Table 6 for further evaluation and reference. The algorithms are grouped by distinct colors and symbols, making them easily identifiable. These algorithms demonstrate excellent performance with high test scores and consistently high F1, recall, and precision scores. Additionally, they maintain a relatively low number of selected features, which is advantageous in terms of the efficiency and interpretability of the IDS. Overall, these results indicate the potential effectiveness of these algorithms for intrusion detection tasks.
One of the key challenges in building an effective Intrusion Detection System (IDS) is accurately identifying the subset of features most relevant for detecting malicious activities while maintaining computational efficiency. The feature selection process is crucial in IDS modeling as it directly impacts performance. Including irrelevant or redundant features can lead to overfitting and reduced generalization capability, while selecting too few features may result in a lack of discriminatory power to detect malicious activities. Additionally, the computational cost of feature selection becomes a significant concern, particularly for large datasets, as time-consuming algorithms may not be practical for real-time IDS applications. Thus, finding a balance between the number of features and the computational cost of feature selection is essential when building an effective IDS.
The plot in Figure 13 provides valuable insights into the relationship between feature selection time and test score for the different feature selection algorithms in the context of IDS. It is a valuable tool for selecting an appropriate feature selection technique for a given dataset. The plot demonstrates a trade-off between feature selection time and test score, highlighting that different algorithms exhibit varying levels of computational complexity and effectiveness in selecting relevant features. Some algorithms may require more time for feature selection but yield higher test scores, indicating better performance, while others may have shorter selection times but result in lower test scores. Therefore, when designing an IDS, careful consideration of both feature selection time and test score is essential, as the computational cost significantly impacts the overall system performance.
Selecting an algorithm that balances these factors is crucial to optimizing IDS performance, reducing false alarms, and improving detection accuracy. By thoroughly evaluating the feature selection time and test scores of different algorithms, IDS developers can make informed decisions to enhance the effectiveness and efficiency of their systems, ultimately contributing to the improved security of computer networks and systems. The algorithms used in the study were categorized into 12 groups based on the feature selection techniques they employed. The average test score and feature selection time were computed for each group, and the results are presented in Figure 13. Finding the most optimized algorithm involves balancing the minimization of time complexity against the maximization of test scores. Some algorithms achieve high accuracy but suffer from excessive time complexity, while others perform reasonably well with acceptable accuracy.
Figure 13 illustrates the relationship between feature selection time, test score, and the algorithms employed. The blue bars represent the feature selection time, while the yellow line plot represents the mean test score. The objective is to achieve a low feature selection time while maintaining a high test score. Among the algorithms, GWO_GNB stands out with the lowest feature selection time of 34.18 seconds and a respectable test score of 93%. On the other hand, GWO_LR demonstrates an impressive test score of 98%, but it is accompanied by one of the longest feature selection times, clocking in at 5529.18 seconds.
Among the algorithms analyzed, PSO_LR demonstrates a decent test score of 87% with a relatively low feature selection time of 1427.73 seconds. On the other hand, PSO_GNB achieves a test score of 85% with an even lower feature selection time of 59.13 seconds. These instances highlight the variations in performance among the different algorithms.
Furthermore, our research explores all possible combinations of weights/tradeoffs for each criterion (Test Score, F1 Score, Recall, Precision, Length-Selected-Features, and Run-Time-FS), calculates the weighted score for each algorithm under the current combination of weights, and then ranks the algorithms based on their weighted scores, reporting the ranking for each combination of weights.
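A simplified version of this weighted ranking is sketched below; the candidate weight grid and the two dummy rows are assumptions for illustration, not the study's actual values or results:

```python
import itertools

rows = [   # dummy normalized metric values, for illustration only
    {"name": "RF-GWO_DT",   "test": 1.00, "f1": 0.98, "recall": 0.97, "precision": 0.99,
     "n_features": 0.20, "fs_time": 0.60},
    {"name": "GNB-PSO_GNB", "test": 0.70, "f1": 0.72, "recall": 0.75, "precision": 0.70,
     "n_features": 0.50, "fs_time": 0.05},
]
gains, costs = ["test", "f1", "recall", "precision"], ["n_features", "fs_time"]

def weighted_score(row, w_pos, w_neg):
    # Equation (9) style: reward the normalized quality metrics, penalize feature count and FS time.
    return (sum(w * row[c] for w, c in zip(w_pos, gains))
            - sum(w * row[c] for w, c in zip(w_neg, costs)))

for w_pos in itertools.product([0.1, 0.2, 0.3, 0.4], repeat=4):
    if abs(sum(w_pos) - 1.0) > 1e-9:
        continue                                   # keep only positive weights summing to 1
    for w_neg in itertools.product([0.1, 0.2, 0.3, 0.6], repeat=2):
        ranking = sorted(rows, key=lambda r: weighted_score(r, w_pos, w_neg), reverse=True)
        print(w_pos, w_neg, [r["name"] for r in ranking])
```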
The weights allotted to the different evaluation criteria to calculate each algorithm's overall score are depicted in Table 7. This approach is useful for comparing and selecting the best algorithm based on multiple criteria rather than just one. The weights determine the relative importance of each criterion in the overall score calculation. For example, if we believe that the test score and the length of selected features are the essential criteria, we can assign them higher weights than criteria such as recall and feature selection time.
During the computation of the final score, a weighted sum is calculated by multiplying each weight with its corresponding value and summing up the results.
This process allows different weighted factors to be combined into an overall score. The weighted score is calculated as in equation (9):

Weighted Score = w1 · (Normalized Test Score) + w2 · (Normalized F1 Score) + w3 · (Normalized Recall) + w4 · (Normalized Precision) − w5 · (Normalized-Length-Selected-Features) − w6 · (Normalized-Run-Time-FS)   (9)

In equation (9), each weight (obtained from the weight combinations) is multiplied by its respective normalized value (e.g., normalized test score, normalized F1 harmonic score) taken from the results data frame (df). The subtraction assigns a higher importance, or preference, to algorithms with a lower value for 'Normalized-Length-Selected-Features' or 'Normalized-Run-Time-FS'; by subtracting these products, we effectively penalize algorithms with a higher value for these features.
Algorithms with smaller values for 'Normalized-Length-Selected-Features' or 'Normalized-Run-Time-FS' will have a higher weighted score than those with larger values, promoting the selection of algorithms that balance performance and efficiency. This subtraction term helps create a comprehensive evaluation metric that considers the positive aspects of the various criteria along with the potential drawbacks associated with a longer feature selection time or a larger number of selected features. Three combinations of weights are presented below.

1) COMBINATION 1
This combination, shown in the first row of Table 7, prioritizes Test Score and Length-Selected-Features, assigning them the highest weights. Test Score measures a model's accuracy in predicting unseen data, so unsurprisingly, it has been given high priority. Meanwhile, Length-Selected-Features refers to the number of features selected by the feature selection algorithm, and placing a high weight on it suggests a desire for a high-performing and efficient model.
Additionally, Recall and Precision are given lower weights in this combination. Recall measures a model's ability to correctly identify positive instances (i.e., true positives), while Precision measures the proportion of true positives among all positive predictions. Giving these lower weights suggests that the user may value a balanced model that does not prioritize sensitivity or specificity over the other. Finally, Run-Time-FS has the lowest weight of 0.1, which suggests that model efficiency is still important but is not the highest priority. This weight value indicates that the user wants to balance model performance with the resources required to train and run the model.

2) COMBINATION 2
This combination of ([0.4, 0.4, 0.1, 0.1] and [−0.3, −0.2]), shown in the second row of Table 7, places equal weight on Test Score and F1 Score, which indicates a preference for a model with good overall performance. F1 Score measures a model's balance between Precision and Recall, which suggests that the user values a model that is good at identifying true positives while avoiding false positives. In this combination, Length-Selected-Features is given a lower weight, which may indicate a desire to keep the feature space relatively small to improve efficiency and avoid overfitting. Additionally, Run-Time-FS carries a weight of 0.2, higher than in Combination 1. This suggests that while model efficiency is still essential, achieving good overall performance is a higher priority.

3) COMBINATION 3
This combination of ([0.3, 0.2, 0.2, 0.2] and [−0.6, −0.1]), shown in the third row of Table 7, places the highest weight on Length-Selected-Features, indicating a preference for a simple and efficient model. This suggests that the user values a model that is easy to understand and use, even if it sacrifices some performance. Test Score and F1 Score are given lower weights, which indicates a willingness to sacrifice a small amount of accuracy in favour of simplicity. Recall and Precision are given equal weight in this combination, suggesting a desire for a balanced model that performs well in both sensitivity and specificity. Finally, Run-Time-FS has the lowest weight of 0.1, indicating that model efficiency is still important but not the highest priority. These three combinations suggest that the user is considering trade-offs between model performance, efficiency, and simplicity and is trying to find the best balance for their specific use case. The plot in Figure 14 shows the results of applying the different weight combinations to the performance metrics of the different algorithms. Each line represents a combination of weights, and the x-axis shows the algorithms ranked in order of their weighted score. The y-axis shows the weighted score, a composite score calculated from the specified weights for each performance metric.
The significance of this plot is that it provides a way to evaluate and compare different algorithms based on multiple criteria simultaneously. By assigning weights to different performance metrics, we can prioritize certain aspects of the model's performance over others, and the weighted score reflects this overall evaluation. This plot can help us decide which algorithm to choose based on our priorities and preferences.
By examining the plot, we can assess the performance of the algorithms for the three use cases indicated by the combinations. RF-GWO_DT consistently achieves the highest scores across all three combinations. For Combination 1, RF-GWO_DT obtains a score of 97%, while for Combination 2 and Combination 3, it achieves 96% and 85%, respectively. This information allows us to compare the performance of different algorithms under different weight configurations. RF-GWO_DT stands out as a high-performing algorithm across all three use cases, demonstrating its consistency and effectiveness in various scenarios.

VII. CONCLUSION AND FUTURE WORK
This research study explored various combinations of machine learning-assisted metaheuristics for modeling an intrusion detection system (IDS). By leveraging different algorithms and their combinations, the study aimed to provide organizations with various options to meet their specific requirements.
Large organizations with ample resources and computational capabilities have the flexibility to choose the highest-performing algorithm regardless of its time and space complexity. These organizations can invest in high-end computation facilities and leverage complex algorithms that offer superior performance on the desired metrics. On the other hand, the study recognizes that not all organizations have the same resources and capital, particularly those just starting up; such organizations may have limited computational resources and funding. Therefore, the study recommends that they consider algorithms with lower asymptotic complexity, which are computationally efficient while still achieving satisfactory results on the desired metrics. The research study provides a diverse set of algorithms, offering multiple options that cater to the specific requirements of different organizations.
The algorithm that achieved the highest test score, with an accuracy of 99.5787%, is one that incorporates feature selection through Particle Swarm Optimization (PSO) with Random Forest in the fitness function. RF-PSO_RF achieved the highest recall score, indicating its ability to detect a large proportion of actual intrusions accurately. The algorithms RF-GWO_DT, RF-GA_GNB, RF-GA_LR, DT-GWO_DT, and LR-GWO_LR exhibit high stacks in Test Score, F1-Score, Recall-Score, and Precision-Score, indicating strong performance across different metrics. The algorithms RF-GWO_RF, RF-GWO_DT, DT-GWO_RF, and GNB-GWO_DT exhibit high test scores and a consistent feature selection length below 10. Regarding time complexity, GWO_GNB stands out with the lowest feature selection time of 34.18 seconds and a respectable test score of 93%. On the other hand, GWO_LR demonstrates an impressive test score of 98%, but it is accompanied by one of the longest feature selection times, clocking in at 5529.18 seconds.
Furthermore, our research explores all possible combinations of weights/tradeoffs for each criterion (Test Score, F1 Score, Recall, Precision, Length-Selected-Features, and Run-Time-FS) and then calculates the weighted score for each algorithm under the current combination of weights. By examining the plot, we can assess the performance of the algorithms for the three use cases indicated by the combinations. RF-GWO_DT consistently achieves the highest scores across all three combinations.
For Combination 1, RF-GWO_DT obtains a score of 97%, while for Combination 2 and Combination 3, it achieves 96% and 85%, respectively. This information allows us to compare the performance of different algorithms under different weight configurations. RF-GWO_DT stands out as a high-performing algorithm across all three use cases.
While the current research has addressed many optimization areas, several potential avenues for future work could further enhance IDS systems. These areas of future scope aim to explore additional dimensions and challenges to continue refining and expanding the capabilities of IDS systems. The following points highlight some of these areas:
• Efficient Search-Related Metaheuristic Algorithms: One avenue is the development of more efficient search-related metaheuristic algorithms. These algorithms would focus on finding optimal hyperparameters in machine learning models. By improving the search process, the accuracy and computation time of the models could be significantly enhanced, contributing to more optimized IDS performance.
• Global Maxima for F1 Score in the Fitness Function: Another area of exploration is redefining the fitness function used in the metaheuristic algorithms. Instead of solely optimizing for accuracy, the fitness function could be modified to optimize for the global maximum of the F1 score or other case-specific objectives. This would allow the IDS to prioritize detection and classification performance beyond simple accuracy measurements, resulting in more robust and reliable intrusion detection.
• Ensemble Learning Techniques: Investigating ensemble learning techniques could be another area of future work. Ensemble methods, such as bagging, boosting, or stacking, can combine multiple classifiers to improve overall prediction accuracy and robustness. Exploring the effectiveness of ensemble methods in the context of the IDS being studied could lead to even better detection and classification results.
Further research can aid in the development of intrusion detection systems by investigating these different aspects, including improved ensemble learning, real-time detection, scalability, explainability, and the detection of adversarial attacks, among other things. These explorations can improve the IDS's abilities and usefulness in effectively identifying and preventing intrusions in demanding situations.

FIGURE 2. Flowchart for the modified genetic algorithm for feature selection.

FIGURE 3. Flowchart for the modified PSO algorithm for feature selection.
A. DATA CLEANING AND PRE-PROCESSING
1) Loading the Data: The training and test data files are read. The loaded data is stored in the train and test DataFrames.

FIGURE 4. Sample fitness function of the GA algorithm.

FIGURE 5. Machine learning-assisted metaheuristic technique for feature selection and classification.

FIGURE 6. Algorithms compared against their test scores and F1 scores.

FIGURE 8. Feature selection techniques and test scores.

FIGURE 11. Feature selection techniques vs. the mean length of selected features and test score.

FIGURE 12. Test score vs. the length of selected features.

FIGURE 13. Feature selection time and test score vs. the algorithms.

FIGURE 14. Weighted score of algorithms across the three combinations.

TABLE 1. Summary of literature review.
Records with the value ''normal'' in the ''attack_type'' column are marked as ''False'' in the ''attack_check'' column, indicating normal network activities. This study identifies and classifies attacks irrespective of specific attack types such as DoS, Probe, U2R, or R2L; the objective is to distinguish between normal network activities and any form of attack without delving into the specific attack categories. By adopting this approach and considering only the binary classification of attacks versus normal activities, the study simplifies the task of detecting and identifying any attack without the need to differentiate between the various attack types. 5) Encoding Categorical Variables: Label encoding of the categorical variables in the DataFrame is performed using LabelEncoder from the preprocessing module. This ensures that categorical variables are represented as numerical values for model training.
B. FEATURE AND TARGET SPLIT
1) Feature split: The features (X) and the target variable (Y) are separated from the data frame. 2) Splitting method: The feature and target data are split into training and testing sets using the train_test_split method from the Sklearn model selection module. The data is split into 'X_train', 'X_test', 'y_train', and 'y_test', with a test size of 0.30 (30% of the data is used for testing). Table 1 presents the details of the dataset used.
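A sketch of this pre-processing, assuming the cleaned data sits in a DataFrame df with the 'attack_type' column described above, might look like the following:

```python
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# Binary target: True for any attack, False for normal traffic.
df["attack_check"] = df["attack_type"] != "normal"

# Label-encode the remaining categorical (object-typed) columns.
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])

X = df.drop(columns=["attack_type", "attack_check"]).values
y = df["attack_check"].astype(int).values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
```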


TABLE 6. Performance metrics for IDS algorithms.

TABLE 7. The three weight combinations.

TABLE 8. High performers in each combination.