An Advanced Intrusion Detection System for IIoT Based on GA and Tree Based Algorithms

The evolution of the Internet and cloud-based technologies has empowered several organizations with the capacity to implement large-scale Internet of Things (IoT)-based ecosystems, such as Industrial IoT (IIoT). The IoT and, by extension, the IIoT, are vulnerable to new types of threats and intrusions because of the nature of their networks. It is therefore crucial to develop Intrusion Detection Systems (IDSs) that can provide the security, privacy, and integrity of IIoT networks. In this research, we propose an IDS for IIoT that was implemented using the Genetic Algorithm (GA) for feature selection, with the Random Forest (RF) model employed in the GA fitness function. The models used for the intrusion detection processes include classifiers such as the RF, Linear Regression (LR), Naïve Bayes (NB), Decision Tree (DT), Extra-Trees (ET), and Extreme Gradient Boosting (XGB). The GA-RF generated 10 feature vectors for the binary classification scheme and 7 feature vectors for the multiclass classification procedure. The UNSW-NB15 dataset is used to assess the effectiveness and the robustness of our proposed approach. The experimental outcomes demonstrated that for the binary modeling process, the GA-RF achieved a test accuracy (TAC) of 87.61% and an Area Under the Curve (AUC) of 0.98, using a feature vector that contained 16 features. These results were superior to those of existing IDS frameworks.


I. INTRODUCTION
In recent years, the Internet of Things (IoT) paradigm has seen massive adoption by different industries, including the medical sector, vehicle manufacturers, home appliance manufacturers, etc. The acceptance of IoT technology has significantly changed the way we live [1]. The specific use of IoT in modern industry gave birth to the Industrial IoT (IIoT) concept. The IIoT (also written I-IoT) refers to the use of regular IoT in different industrial ventures and organizations. The IIoT contains countless actuators, sensors, control systems, communication and integration interfaces, advanced security systems, vehicular networks, home appliance networks, etc. All the nodes within the IIoT can connect to the Internet. Using IIoT in modern industries has greatly enhanced the capabilities of various sectors such as manufacturing plants, asset management systems, advanced logistics systems, etc. Moreover, the IIoT allows several applications, devices, and services to connect the physical space to a virtual one [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Eyhab Al-Masri.
There are several ways in which IIoT nodes connect to the Internet, including communication protocols such as the Transmission Control Protocol and the Internet Protocol (TCP/IP) using Message Queue Telemetry Transport (MQTT), Modbus TCP, Cellular, Long-Range Radio Wide Area Network (LoRaWAN), etc. [3], [4]. Moreover, most IIoT nodes can collect, process, and transmit data. These abilities make them susceptible to privacy and security threats that have the potential to jeopardize the IIoT systems and the applications to which they belong [5]. One of the key attributes of IIoT nodes is that they are always active while performing the collection, processing, and transmission of data. Fig. 1 depicts all the layers that are present in the IIoT, namely, the perceptual layer, the network layer, the application layer, and the Cloud. These layers are based on the flow of data. Moreover, each layer is prone to various types of attacks and intrusions that could compromise the systems within the IIoT. Some common attacks and intrusions on the IIoT ecosystem include access control attacks, data corruption breaches, spoofing attacks, Denial of Service (DoS) attacks, Distributed DoS, Operating System (OS) attacks, jamming attacks, etc. To counter these malicious attacks and to guarantee that the active nature of IIoT nodes and the security of IIoT networks are maintained, many organizations are implementing Intrusion Detection Systems (IDSs). Moreover, these IDSs can be configured at any layer in Fig. 1 [5]. An IDS plays a critical role in the IIoT by guaranteeing that the integrity, security, and privacy of data transmitted through its network are maintained. An IDS can prevent, detect, react to, and report any attacks or malicious activities that have the potential to cripple an IIoT network [6].
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Traditional IDSs are broadly categorized as follows: signature-based, anomaly-based, and hybrid-based. Signature-based IDSs are designed using existing (known) attack signatures that can be found in the IDS database. Anomaly-based IDSs are implemented using abnormal patterns within a network. Hybrid-based IDSs combine signature-based and anomaly-based IDSs. Some drawbacks of traditional IDSs include a high false-positive rate and a low detection accuracy. Additionally, they cannot detect novel types of intrusions and are incapable of preventing events such as zero-day attacks. To improve on the performance of traditional IDSs, researchers have explored the use of Artificial Intelligence (AI) and, more particularly, the application of Machine Learning (ML)-based techniques for IDS [7], [8].
ML is a branch of Artificial Intelligence (AI) that empowers various systems with the ability and the capacity to learn from experience and to ameliorate their decision-making process without any explicit programming [9]. At the top level, ML approaches are categorized as supervised and unsupervised. At a granular level, ML algorithms are classified as follows: supervised, unsupervised, semi-supervised, and reinforcement. Supervised ML methods improve their decision-making process by learning from a labeled dataset (a dataset with data points that have a label) to perform future predictions. In contrast, unsupervised ML approaches are used when the learning task involves unlabelled data. Semi-supervised ML algorithms use both labeled and unlabeled data during the learning process. Reinforcement ML methods compute rewards or errors based on their interaction within a given environment [10].
In this research, we propose an IDS for IIoT that uses Tree-based supervised ML algorithms. ML-based IDSs are generally trained using the latest intrusion detection datasets. Nonetheless, the majority of the modern datasets are large, both on the feature space dimension as well as the number of network traces. A high number of features in a dataset has the potential to negatively impact the training process of ML algorithms. Often the performance of ML methods is reduced as the number of features increases. In other words, it is harder to perform the learning process as the number of attributes increases in a dataset [11]. Thus, it is crucial to perform a feature selection or extraction process to guarantee that the size of the attribute vector is reduced to an optimal number of required features [12].
There are three types of feature selection (FS) methods: wrapper-based FS, filter-based FS, and hybrid-based FS. In the filter-based FS method, the selection process relies on the nature of the data and uses a variety of statistical methods to extract the optimal feature vector. The filter-based FS method is computationally cheap and efficient. In contrast, the wrapper-based FS approach employs a predictor in the selection process. This occurs by iteratively computing the predictor's performance over several subsets of features until the candidate optimal feature vector is found. The wrapper-based FS method is computationally expensive, but it is precise in comparison to other FS methods. The hybrid-based FS technique, sometimes called embedded-based FS, combines the filter-based and the wrapper-based FS methods [13]-[15]. In this research, we propose a wrapper-based FS method based on the Genetic Algorithm (GA) [16] that uses the Random Forest (RF) ML algorithm [17] in its fitness function to generate optimal candidate feature vectors. Furthermore, to assess the performance of our proposed method, we use the UNSW-NB15 intrusion detection dataset. This dataset is widely adopted by the research community [18], [19]. The network traces present in the dataset were generated in a laboratory environment. However, they mimic real-world network traffic patterns, such as the ones generated by an IIoT network system [20]. Additionally, the UNSW-NB15 is a more complex dataset in comparison to the NSL-KDD or KDD Cup 99 datasets [20] and it includes a higher variety of network traffic patterns. Moreover, the UNSW-NB15 is a general-purpose dataset that paved the way to datasets such as the TON_IoT dataset [21].
The major goals and contributions of this paper are as follows:
• Firstly, we propose a Genetic Algorithm (GA)-based feature selection algorithm. The GA fitness function uses the Random Forest (RF) to generate the fitness scores.
• Secondly, for each solution (attribute vector), we implement Tree-based algorithms such as the RF, the Decision Tree (DT), and the Extra-Trees (ET) methods. Moreover, the generated attribute vectors can be applied by other researchers using their own classifiers.
• Lastly, we compare our proposed method with existing systems. The results demonstrate a noteworthy improvement in performance.
The remainder of the paper is structured as follows. Section II presents an account of related work. Section III introduces the UNSW-NB15 dataset. Section IV presents the proposed IDS methodology. Section V outlines the experiments and provides discussions about the results. Section VI concludes this paper and provides future directions.

II. RELATED WORK
This section provides an account of related research works that were conducted in the domain of IDS using ML techniques. Moreover, this section serves as a survey of various IDS frameworks and solutions that were previously implemented for intrusion detection in IoT-based systems.
Liu et al. [22] implemented an IDS for IoT using a Particle Swarm Optimization (PSO)-based technique for feature selection and the Support Vector Machine (SVM) ML algorithm for classification. The PSO method used in this research is based on the Light Gradient Boosting Machine (LightGBM). The authors used the UNSW-NB15 dataset to validate their model and they considered the accuracy and the False Alarm Rate (FAR) as the performance metrics. The experimental results demonstrated that the PSO-LightGBM achieved an overall accuracy of 86.68% and a high FAR of 10.62%. This research was based on the binary classification scheme only; the authors could have also implemented the multiclass classification procedure to assess the full potential of their method.
Zhou et al. [23] implemented a Variational LSTM (VLSTM) IDS for Industrial Big Data systems. The VLSTM was implemented in conjunction with a feature selection and retention technique based on the reconstructed rendering of features. The authors used an Auto-Encoder Neural Network (AENN) to retrieve the low-dimensional attribute characteristics from high-dimensional datasets. To study their model, the researchers used the UNSW-NB15 dataset. During the evaluation phase, the following performance metrics were employed: the False Alarm Rate (FAR), the Area Under the Curve (AUC), the precision, the recall, and the F1-Score. The experimental results demonstrated that the VLSTM achieved an AUC of 0.895, a precision of 86%, a recall of 97.8%, and an F1-Score of 90.7%. Although these results were superior to some of the existing methods, the authors conceded that further experiments needed to be done to deal with the highly imbalanced nature of the UNSW-NB15.
In [24], the authors proposed an ML-based IDS using an adaptive principal component (APAC) for the feature selection process and an incremental extreme learning machine (IELM) algorithm for classification. In this research, the APAC is used to adaptively generate candidate attributes that are then fed to the IELM for the classification procedure. The authors considered the NSL-KDD and the UNSW-NB15 datasets to gauge the effectiveness of the presented framework. Moreover, the multiclass classification scheme was used for both datasets. The main performance metric that was utilized in this work was the accuracy achieved by a model on test data. In the case of the NSL-KDD dataset, the APAC-IELM achieved an accuracy of 81.22%. For the UNSW-NB15, the APAC-IELM obtained an accuracy of 70.51%. Although the authors claimed that the obtained results were superior to those obtained by the existing systems, they conceded that more research needed to be undertaken to adapt the APAC-IELM to industrial control systems (ICS).
In [25], the authors proposed a deep neural network (DNN)-based IDS. In this research, the aim was to develop a flexible and robust IDS that could easily detect novel forms of attacks. To assess the efficacy of the presented method, the following datasets were considered: KDD-Cup99, UNSW-NB15, NSL-KDD, Kyoto, WSN-DS, and CICIDS 2017. The experimental processes were executed over 1000 epochs for each dataset. Focusing on the UNSW-NB15, the experiments demonstrated that the DNN obtained an accuracy of 76.1%, a precision of 95.1%, a recall of 96.3%, and an F1-Score of 79.7% for the binary modeling process. In contrast, the DNN obtained an accuracy of 65.1%, an F1-Score of 75.6%, a precision of 59.7%, and a recall of 65.1% for the multiclass modeling procedure.
Hanif et al. [26] presented an IDS for IoT networks using artificial neural networks (ANN). This system was implemented to overcome the issue of security that is a major concern in IoT networks. Given the fact that IoT devices often lack the capacity to perform high-level computation for security, the authors decided to explore the possibility of using an ML-based IDS system as the first line of defense. To assess the effectiveness of the proposed method, the authors utilized the UNSW-NB15. The experimental outcomes claimed that the ANN-IDS obtained a precision score of 84.00% for the binary classification process. However, the researchers did not provide much clarity on how the hyper-parameters of the ANN were tuned to arrive at their conclusion. Moreover, the authors did not consider any feature selection method.
In [20], the authors conducted a complexity comparison analysis between the UNSW-NB15 and the KDD99 datasets. To achieve the comparison, the authors used various methods, including the expectation-maximization (EM) clustering algorithm and the ANN methods. In this work, the models were assessed using the FAR and the accuracy. In the instance of the KDD99, the EM clustering achieved an accuracy of 78.06% and a FAR of 23.79%. In contrast, for the UNSW-NB15, the EM clustering obtained a FAR of 23.79% and an accuracy of 78.47%. Furthermore, the ANN technique attained an accuracy of 81.34% and a FAR of 21.13% when tested on the UNSW-NB15. This research concluded that the UNSW-NB15 dataset is more complex than the KDD99 dataset.
Ketzaki [27] proposed a light-weight IDS using ANN. This system is destined to secure modern communication systems (5G networks, IIoT networks, etc.). The ANN-IDS presented in this research was designed in two stages. The first stage is the feature extraction procedure using statistical analysis. The second step is the classification process. The authors considered the binary classification scheme using the UNSW-NB15 intrusion detection dataset. The performance metric used to evaluate the ANN models is the accuracy that was obtained on the test data. The results demonstrated that the best model attained an accuracy score of 83.9%. In their future endeavor, the authors aimed to improve the effectiveness of the proposed method.
In [28], the author presented an IDS framework using the J48 tree-based classifier and the SVM algorithm. Several methods were used to conduct the feature selection process, including the GA, the firefly optimization (FFA), and the grey wolf optimizer (GWO). The researchers used the UNSW-NB15 dataset to gauge the effectiveness of the models implemented in the experiments. The results showed that the accuracy scores obtained by the GA-J48, GWO-J48, and the FFA-J48 are 86.874%, 85.676%, and 86.037%, respectively. Moreover, the accuracy scores achieved by the GA-SVM, GWO-SVM, and FFA-SVM are 86.387%, 84.485%, and 85.429%, respectively. Although these are impressive results using the J48 and the SVM methods, the authors recommended that future work be conducted using other approaches such as deep learning methods.
In [29], the researchers implemented a novel feature selection method named Tabu Search-Random Forest (TS-RF). TS-RF is a wrapper-based feature extraction technique in which the TS algorithm conducts the attribute search and the RF approach is used as the learning method. To verify the performance of their model, the authors considered the UNSW-NB15 dataset. The main performance metrics were the accuracy and the False Positive Rate (FPR). The results demonstrated that the TS-RF in conjunction with the RF classifier obtained an accuracy of 83.12% and an FPR of 3.7%. Although the obtained results are promising, the authors conceded that they did not consider the class imbalance problem found in the UNSW-NB15 dataset.
In [30], a Two-Stage (TS) model for IDS was proposed. This methodology used the first stage to detect minority classes of intrusions and the second stage to detect majority classes of attacks. The ML classification method used in this work is the RF method. The authors used the Information Gain (IG) for feature extraction. The IG-TS IDS was evaluated using the UNSW-NB15 dataset. The performance metrics considered in this research are accuracy and FAR. In their experiments, the authors used the binary classification scheme as their main configuration. The experimental results showed that the IG-TS obtained a FAR of 15.64% and an accuracy of 85.78%. In future works, the authors aimed to change the classifier that was utilized in the two stages.
In [31], the authors proposed an ML-based IDS using the GA algorithm and the Logistic Regression (LR) method for attribute selection. The binary classification process was conducted using a Tree-based classifier, namely the C4.5 method. The UNSW-NB15 was used to assess the efficacy of the presented method. The authors considered a number of performance metrics to evaluate the proposed approach; however, the accuracy that was obtained on test data was the main metric. The experimental results showed that the GA-LR-DT attained an accuracy of 81.42%. This research did not demonstrate the effectiveness of the GA-LR-DT for the multiclass classification scheme.
Kasongo and Sun [32] proposed an IDS using an XGBoost (extreme gradient boosting) based feature extraction method in conjunction with several ML methods. The XGBoost, which is an ensemble-tree based algorithm, is used in this research to decrease the number of attributes in the UNSW-NB15. One of the classifiers used in this work is the LR method. The experimental results demonstrated that the XGBoost-LR achieved an accuracy of 75.51% and 72.53% for the binary and multiclass classification schemes, respectively. To overcome the class imbalance problems in the UNSW-NB15 dataset, the authors suggested using oversampling techniques.
In [33], the authors implemented an SVM-based NIDS using the UNSW-NB15 dataset. This system was designed to accommodate the unique nature of IoT networks. The authors considered the accuracy, the detection rate, and the false positive rate as the main performance metrics. The experiments were conducted for both the binary and multiclass classification schemes. The result showed that the SVM-NIDS attained an AC of 85.99% for the binary modeling task. In the instance of the multiple classes setting, the SVM-NIDS obtained an accuracy of 75.77%.
Kumar et al. [34] applied the UNSW-NB15 as an offline data source to design an ML-based IDS that would also be used to perform online intrusion detection. The authors used the Information Gain (IG) methodology for the feature selection procedure. The IG method selected 13 attributes. For the classification process, the researchers used an integrated approach that included the following Tree-based classifiers: C5, CHAID, CART, and QUEST. The outcome of the experiments demonstrated that the proposed system obtained an accuracy of 84.83% for the binary classification procedure. However, one of the drawbacks of the IDS presented here is its inability to detect unknown attacks. Solving this issue was one of the recommendations made by the authors.
In [35], the researchers presented an IDS using deep learning methods such as the Long Short-Term Memory (LSTM) RNN. To assess the effectiveness of the proposed approach, the authors used the UNSW-NB15 dataset. Moreover, the authors used the accuracy that was obtained during the classification task as the main performance metric. The experimental processes showed that the LSTM method obtained an accuracy of 85.42% for the binary modeling process. Although the authors claimed that these results were superior to existing ones, they did not consider implementing a feature selection algorithm.
Elijah et al. [36] proposed an ensemble and deep learning-based method for network intrusion detection. The LSTM algorithm was used to implement the deep learning model. The optimization algorithm applied to the LSTM is Stochastic Gradient Descent (SGD). The activation function applied in the LSTM layers is the Rectified Linear Unit (ReLU) in the instance of the binary classification task. For the multiclass classification scheme, the authors used the Softmax function. The UNSW-NB15 dataset was used to evaluate the performance of the proposed approach. The experimental results show that the LSTM IDS achieved an accuracy of 80.72% for the two-way classification procedure. In contrast, the LSTM IDS obtained an accuracy of 72.26% for the multiclass classification tasks.
In [37], the authors proposed a deep learning-based IDS using deep neural networks. This model was built using a combination of residual blocks (ResBlk). The ResBlks contain convolutional neural networks (CNNs) and recurrent neural networks (RNN). Moreover, the authors utilized the NSL-KDD and the UNSW-NB15 dataset to assess the performance of the proposed approach. The accuracy was one of the main performance metrics that was used to evaluate the outcome of the experiments. The results showed that the DL method achieved an accuracy of 99.21% and 86.64% in the instance of NSL-KDD and UNSW-NB15 datasets, respectively. Although these results are promising, the authors conceded that more experiments need to be conducted to improve the current performance numbers.
Assiri [38] proposed a GA-RF-based method for anomaly classification. In this work, the authors used the GA for attributes and parameters selection and the RF method for classification. Moreover, the researchers considered the binary classification scheme. The UNSW-NB15 was one of the datasets used to assess the performance of their model. The accuracy, recall, and precision were the main performance metrics that were utilized to evaluate the GA-RF presented here. The experimental results demonstrated that the GA-RF achieved a classification accuracy of 86.70%, a recall of 87.00%, and a precision of 87%.
In [39], the authors implemented an advanced IDS. This system was designed using a multi-objective feature selection method based on a special variation of the GA in conjunction with the Logistic regression (LR) algorithm. The RF method was one of the ML methods that were used to assess the performance of the proposed methodology. The UNSW-NB15 was amongst the datasets that were employed to evaluate the models. The accuracy was the main performance metric that was considered to gauge the effectiveness of the GA-LR-RF. The experimental outcomes demonstrated that the GA-LR-RF achieved an accuracy of 64.23% for the multiclass classification task.

III. THE UNSW-NB15 DATASET
The UNSW-NB15 [19] is an advanced dataset used for IDS research and it is widely used in the literature. The raw packets (network traces) contained in the UNSW-NB15 dataset were generated by the IXIA PerfectStorm tool in a laboratory set-up of the Cyber Range Laboratory of the Australian Center for Cybersecurity (ACCS). The UNSW-NB15 contains 42 attributes listed in Table 1. As shown in Table 1, 3 features are categorical in nature and 39 attributes are numerical (binary, float, and integer). The UNSW-NB15 is composed of two datasets: the UNSW-NB15-train and the UNSW-NB15-test. In this paper, UNSW-NB15-train is further divided into two datasets. The first one is the UNSW-NB15-75, which makes up 75% of the full UNSW-NB15-train. The second one is the UNSW-NB15-25, which accounts for the remaining 25% of the UNSW-NB15-train subset. In this study, UNSW-NB15-75 is used during the training phase of the models and the UNSW-NB15-25 is used during the validation phase of the models. It is crucial to perform a validation process to verify the results that were obtained during the training phase. Moreover, the validation results should be similar to those of the training procedure. The entire UNSW-NB15-test dataset is used during the testing phase of the models presented in this research.
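The 75%/25% partition of the training set can be sketched with Scikit-Learn's `train_test_split`. This is a minimal illustration under stated assumptions: the data frame below is synthetic, the column names are hypothetical, and the use of stratification and a fixed seed is our assumption, since the text does not report how the split was drawn.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for UNSW-NB15-train: 42 numeric attributes and a binary label.
rng = np.random.default_rng(0)
train_full = pd.DataFrame(rng.random((1000, 42)), columns=[f"f{i}" for i in range(42)])
train_full["label"] = rng.integers(0, 2, size=1000)

X = train_full.drop(columns="label")
y = train_full["label"]

# 75% for training (UNSW-NB15-75) and 25% for validation (UNSW-NB15-25).
# stratify and random_state are assumptions, not settings reported in the paper.
X_75, X_25, y_75, y_25 = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
```

Stratifying keeps the class ratio of the two subsets close to that of the full training set, which matters for an imbalanced dataset.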

IV. THE PROPOSED IIoT IDS METHODOLOGY
The architecture of the proposed framework is depicted in Fig. 2, whereby there are three main phases, namely, the pre-processing phase, the feature selection phase, and the modeling and evaluation phase. In the pre-processing phase, we load the datasets (training set, validation set, and testing set). Each dataset is cleaned and normalized. In the feature selection phase, the cleaned training dataset is used to compute the candidate feature vectors using the GA method in conjunction with the RF algorithm. In the modeling and evaluation phase, the models (RF, Extra-Trees, DT, LR, XGB) are trained using the cleaned training dataset with a particular attribute vector generated by the previous phase. Once the models have been trained, they are evaluated using the cleaned validation set and they are tested using the cleaned testing set. The building blocks of the proposed framework are explained in more detail in the next subsections.

A. PRE-PROCESSING PHASE
The most important aspects of the pre-processing phase are the data cleaning and normalization steps. Data cleaning is crucial because it improves the quality of the data used to build the models. The steps taken to clean the data include: removing duplicates, replacing missing data, fixing structural errors, and removing unwanted (potentially noisy) observations. Once the data have been cleaned, they require normalization. In this research, we apply the Min-Max scaling [40], which is defined as follows:
$$x_{scaled} = \frac{x - \min(x)}{\max(x) - \min(x)}$$
where $x$ represents a given feature in the feature space, $X$. This scaling acts as a safeguard by squeezing the values of each feature into a fixed range, here $[0, 1]$.
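As an illustration, Min-Max scaling maps each feature to the range [0, 1] by subtracting the feature minimum and dividing by the feature range. The sketch below uses Scikit-Learn's `MinMaxScaler` on toy values. Fitting the scaler on the training data only and reusing it on the other sets is our assumption; the text does not say how the scaler was fitted.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data: two features with different ranges (values are illustrative only).
X_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 40.0]])
X_test = np.array([[2.5, 25.0]])

# Learn per-feature min and max on the training data, then reuse them elsewhere.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# The same transformation written out by hand: (x - min) / (max - min) per column.
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)
manual = (X_train - col_min) / (col_max - col_min)
```

Because the test set is scaled with the training minimum and maximum, test values can fall slightly outside [0, 1] when they exceed the training range.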

B. RANDOM FOREST
The building blocks of the Random Forest (RF) algorithm are Decision Trees (DTs). A DT is a supervised ML method that is applied in tasks such as regression and classification. In simple terms, a DT algorithm uses a tree-like structure to compute the predictions. Each DT contains three types of nodes: namely, the root node, the internal nodes, and the category nodes. For a given input vector, the DT computes its prediction from the root node, traversing many internal nodes, to the category nodes [41], [42].
In this research, we use an RF classifier in the fitness function of the GA described in Section IV-D. The RF algorithm was devised by L. Breiman [43] and it is one of the most widely used ML algorithms today. The RF algorithm is an ensemble of Decision Tree (DT) classifiers whereby each individual DT is built using an attribute vector that is randomly selected from the input vector. Each DT then casts a vote for the most popular label given the selected input attribute vector. The label (class) with the highest number of votes wins the poll [44], [45]. The RF method can be formulated as follows:
Let $P = \{(X_1, y_1), \ldots, (X_k, y_k)\}$ be a training subset of input vectors and labels that are randomly drawn from a given probability distribution (dataset), $(X_n, y_n) \sim (X, Y)$.
The aim is to compute a model (classifier) that predicts the label $y$ given an input $X$ from $P$.
Let $F$ be a group of possibly weak classifiers defined as follows: $F = \{f_1(X), \ldots, f_N(X)\}$, where $N$ is the total number of models. Each model, $f_n(X)$, in $F$ is a Decision Tree (DT). Therefore, $F$ is called the Random Forest.
Each model $f_n(X)$ has some parameters defined as $B_n = (\beta_{n1}, \beta_{n2}, \ldots, \beta_{np})$. The notation of each tree in the forest becomes: $f_n(X) = f(X \mid B_n)$.
The attributes that appear in the nodes of the $n$-th DT are randomly selected based on $B_n$. The final result of the forest, $f(X)$ (a combination of all the classifiers), is computed by majority voting. The label with the most votes is the output of the RF.
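The majority-voting formulation can be sketched by hand with a small ensemble of decision trees. This is an illustrative toy, not the configuration used in the experiments. The synthetic data, the number of trees `N = 25`, the bootstrap resampling, and `max_features="sqrt"` are all assumptions. Note also that Scikit-Learn's own `RandomForestClassifier` combines trees by averaging their predicted class probabilities rather than by counting hard votes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary problem standing in for a labelled intrusion dataset.
X, y = make_classification(n_samples=400, n_features=20, n_informative=6, random_state=0)
X_train, y_train, X_val, y_val = X[:300], y[:300], X[300:], y[300:]

rng = np.random.default_rng(0)
N = 25  # number of trees f_1, ..., f_N (odd, so binary votes cannot tie)
forest = []
for n in range(N):
    # Each tree sees a bootstrap sample of the training data ...
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # ... and considers a random subset of attributes at every split.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=n)
    tree.fit(X_train[idx], y_train[idx])
    forest.append(tree)

# Collect one vote per tree and per validation sample: shape (N, n_val).
votes = np.stack([tree.predict(X_val) for tree in forest]).astype(int)

# Majority voting: the label with the most votes is the output of the forest.
y_pred = np.array([np.bincount(col, minlength=2).argmax() for col in votes.T])
accuracy = (y_pred == y_val).mean()
```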

C. EXTRA-TREES
The Extra-Trees (ET) method is a tree-based algorithm (a meta-estimator) that is related to the RF algorithm because it also uses an ensemble of DTs to conduct the classification or the regression processes. However, unlike the RF algorithm, the ET approach randomly selects the cut points of the nodes. Therefore, the ET method adds another layer of randomization while maintaining its optimization capability [46].
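A side-by-side sketch of the two ensembles in Scikit-Learn is given below. The synthetic data and the hyper-parameters are assumptions chosen for illustration, and the resulting scores say nothing about the UNSW-NB15 results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data; hyper-parameters below are illustrative assumptions.
X, y = make_classification(n_samples=500, n_features=20, n_informative=6, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)

# RF searches for the best threshold among a random subset of features at each node.
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
# ET draws the thresholds (cut points) at random and keeps the best random split.
et = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

scores = {"RF": rf.score(X_va, y_va), "ET": et.score(X_va, y_va)}
```

In Scikit-Learn the two estimators also differ in their defaults: `RandomForestClassifier` trains each tree on a bootstrap sample, whereas `ExtraTreesClassifier` uses the whole training set unless `bootstrap=True` is passed.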

D. FEATURE SELECTION PHASE USING GENETIC ALGORITHM
The Genetic Algorithm (GA) is an Evolutionary Algorithm (EA) that has gained popularity by solving various optimization problems with a low computational cost [47]. EAs are methods that are inspired by biological principles and are used for optimization or learning tasks. EAs have the following main traits [48]:
• Population: EA methods maintain a group of candidate solutions called the population.
• Fitness: An individual is a solution within a population. Each individual possesses its code (gene representation) and its fitness score.
• Variation: The individuals go through changes (mutations) similar to biological gene variation. This is how an EA performs the search in the solution space.
The main steps in the GA algorithm are as follows [49]:

1) Initialize the Population
2) Compute the fitness function
3) Perform the Selection
4) Perform the Crossover
5) Conduct the Mutation
In this research, the fitness function was implemented using the Random Forest algorithm presented in Algorithm 1.
Algorithm 2 depicts the steps (pseudo code) that were used to implement the GA algorithm on the UNSW-NB15 dataset. Moreover, Figure 3 simplifies this algorithm by outlining the major steps in a flowchart format.
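The GA loop with an RF-based fitness function can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The synthetic dataset, the function names `fitness` and `run_ga`, the binary mask encoding, the truncation selection, the single-point crossover, the bit-flip mutation, and every hyper-parameter value are our own illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for the cleaned training data (12 candidate features).
X, y = make_classification(n_samples=300, n_features=12, n_informative=4, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

def fitness(mask):
    """Validation accuracy of an RF trained on the features selected by mask."""
    if not mask.any():  # an empty feature subset cannot be scored
        return 0.0
    rf = RandomForestClassifier(n_estimators=20, random_state=0)
    rf.fit(X_train[:, mask], y_train)
    return accuracy_score(y_val, rf.predict(X_val[:, mask]))

def run_ga(n_features, pop_size=8, max_iter=5, mutation_rate=0.1):
    # 1) Initialize the population: each individual is a boolean feature mask.
    population = rng.random((pop_size, n_features)) < 0.5
    # 2) Compute the fitness of every individual.
    scores = np.array([fitness(ind) for ind in population])
    history = [(population[scores.argmax()].copy(), scores.max())]
    for _ in range(max_iter):
        # 3) Selection: keep the fitter half as parents (truncation selection).
        parents = population[np.argsort(scores)[::-1][: pop_size // 2]]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.choice(len(parents), size=2, replace=False)]
            # 4) Crossover: single cut point between two parents.
            cut = rng.integers(1, n_features)
            child = np.concatenate([a[:cut], b[cut:]])
            # 5) Mutation: flip each bit with a small probability.
            flip = rng.random(n_features) < mutation_rate
            children.append(np.where(flip, ~child, child))
        population = np.vstack([parents, np.array(children)])
        scores = np.array([fitness(ind) for ind in population])
        history.append((population[scores.argmax()].copy(), scores.max()))
    # Return the best (mask, score) pair seen, plus the per-generation record.
    return max(history, key=lambda item: item[1]), history

(best_mask, best_score), history = run_ga(X.shape[1])
selected_features = np.flatnonzero(best_mask)  # indices of the chosen attributes
```

Keeping the parents in the next generation (elitism) means the best fitness never decreases from one generation to the next. Each fitness evaluation trains a full RF, which is why wrapper-based selection is computationally expensive.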

E. MODELLING AND EVALUATION PHASE
1) PERFORMANCE METRICS
In this study, we used the following metrics to measure the performance of our proposed method: the accuracy (AC), the precision (PR), the recall (RC) and F1-Score (F1S) [50].

Algorithm 1 RF Algorithm in the GA Fitness Function
Input: X, y; the input dataframe and output series
Output: AC; the accuracy obtained by the RF model
1. Split X and y into X_train, X_val, y_train, y_val
2. Instantiate rf, the model
3. Fit rf using X_train and y_train
4. Evaluate rf using X_val
5. Compute predictions y_predictions
6. Compute AC using y_predictions and y_val

Algorithm 2 GA Algorithm Applied on the UNSW-NB15
Require: D, the UNSW-NB15 data-frame
Require: F, an array that contains the feature names
Require: T, the target value
Require: L, an empty list to store the feature subset
Require: mi, maximum iteration
START
1. Initialize the population P, using F
2. Implement the fitness function using RF
3. Compute the fitness using D, F, T and P
4. Compute optimal fitness value, v
5. Update L
for i in range(mi)
    6. Implement crossover
    7. Run mutations
    8. Compute the fitness
    9. Compute optimal fitness value, v
    10. Update L
end for
11. Convergence reached; output L and v
STOP
The F1S represents the harmonic mean of the PR and RC. These metrics were chosen because we are faced with a classification problem. Moreover, in this research, we implement binary and multiclass classification processes. The AC, the RC, the PR, and the F1S are computed as follows:

$$AC = \frac{TP + TN}{TP + TN + FP + FN}$$

$$PR = \frac{TP}{TP + FP}$$

$$RC = \frac{TP}{TP + FN}$$

$$F1S = \frac{2 \cdot PR \cdot RC}{PR + RC}$$

where each component in the above equations is defined as follows:
• True Positive (TP): intrusions that are correctly labelled as attacks.
• True Negative (TN): normal network traces that are correctly labelled as legitimate.
• False Positive (FP): normal network traces that are labelled as intrusions.
• False Negative (FN): network intrusions that are wrongly labelled as non-intrusive (normal).

Additionally, to verify the efficacy of our proposed method, we also plotted the receiver operating characteristic (ROC) curves for the models. The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) of a given model. The area under the ROC curve is defined as the Area Under the Curve (AUC). The value of the AUC is always between 0 and 1. An efficient model has an AUC value close to 1 [51].
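The metrics of this section can be illustrated with a short Python sketch. The function names and the example counts are hypothetical, and the AUC is computed here through its rank interpretation (the probability that a randomly chosen attack receives a higher score than a randomly chosen normal trace), which is an illustrative choice rather than the procedure used in this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return AC, PR, RC and F1S from the four confusion-matrix counts."""
    ac = (tp + tn) / (tp + tn + fp + fn)
    pr = tp / (tp + fp) if (tp + fp) else 0.0
    rc = tp / (tp + fn) if (tp + fn) else 0.0
    return {"AC": ac, "PR": pr, "RC": rc, "F1S": f1_score(pr, rc)}


def f1_score(pr, rc):
    """Harmonic mean of precision and recall."""
    return 2 * pr * rc / (pr + rc) if (pr + rc) else 0.0


def roc_auc(y_true, y_score):
    """AUC as the fraction of (attack, normal) pairs ranked correctly.

    Labels are 1 for attack and 0 for normal; ties count as half.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts, used only to exercise the formulas.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
print(metrics)

# Consistency check with values reported later in this paper:
# PR = 82.51% and RC = 98.34% give an F1S of about 89.73%.
print(round(f1_score(0.8251, 0.9834), 4))

# Small hypothetical score vector for the AUC.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```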

V. EXPERIMENTS AND DISCUSSIONS
A. EXPERIMENTAL CONFIGURATION
In this research, the experiments were conducted on a laptop with the following specifications: DELL 153000 series, Windows 10 OS, Intel Core i7-8568U CPU, 1.8 GHz - 1.99 GHz. The ML framework used to implement the simulations is Scikit-Learn (a Python-based framework) [52].

B. EXPERIMENTAL RESULTS AND DISCUSSIONS
In this research, the experiments were conducted in two phases (phase 1 and phase 2). In phase 1, we implemented the GA on the UNSW-NB15 dataset. This process generated two sets of feature vectors, V_b and V_m, where V_b denotes the group of possible solutions generated by the GA for the binary classification scheme and V_m denotes the group of possible solutions generated by the GA for the multiclass classification process. Table 3 and Table 4 provide the details of the vectors in V_b and V_m. These tables have three columns: the first shows the vector name, the second specifies the number of features present in the feature vector, and the third lists the features (attributes) that were selected by the GA.
In the second phase of our experiments, we implemented two classification processes. We first conducted the binary classification process, in which the target feature was binary (Normal or Attack). In this step, we considered all the feature vectors in V_b. We used the Logistic Regression (LR) [53] as our baseline model and we implemented the following Tree-based methods: DT, RF, ET, and XGB. The baseline model was used as our point of departure and the aim was to beat its performance using the other classifiers. The results of the experiments are presented in Tables 5 to 14. The highest test accuracy (TAC), 87.61%, was achieved by the RF method using f_3, as shown in Table 7. This model also obtained a validation accuracy (VAC) of 95.87%, a recall (RC) of 98.34%, a precision (PR) of 82.51%, and an F1-score (F1S) of 89.73%. In addition, for each of the classifiers that were evaluated using f_3, we computed the ROC curves. The results are depicted in Figure 3, in which the RF achieved an AUC = 0.98. This value demonstrates that the quality of classification yielded by the RF is high. Although the TAC obtained by the XGB method (Table 7) was lower than that of the RF approach, it also yielded an AUC = 0.98, which shows that the classification quality of the XGB classifier is high. Both the RF and the ET surpassed the AUC = 0.895 of the VLSTM presented in [23].
In the second step of phase 2, we implemented the multiclass classification process, in which all the labels (10 classes) present in the UNSW-NB15 were considered. In this step, we utilized all the attribute vectors in V_m. The Naïve Bayes (NB) classifier [54] was used as the baseline model and we further implemented the following Tree-based algorithms: DT, RF, ET, and XGB. As in the previous step, the baseline model was utilized as our starting point and the goal was to surpass its performance using the other models. The outcomes are shown in Tables 15 to 21. As depicted in Table 19, the experimental results demonstrated that the best model was the ET using g_5, which achieved a TAC of 77.64%. We also examined the performance of the ET for each class present in the UNSW-NB15. As depicted in Figure 4, the ET performed optimally in detecting the following classes: Normal, Generic, Exploits, DoS, Reconnaissance, and Shellcode. However, the ET underperformed for some minority classes such as Worms, Backdoor, and Analysis.
Furthermore, we conducted a comparative analysis in Table 22. This analysis showed that the results yielded by the methodologies presented in this paper are superior to existing frameworks. For instance, in the case of binary classification, the TAC obtained by the GA-RF-f_3 (proposed in this work) was 11.51% higher than the work presented in [25], 12.1% higher than the method in [32], and 3.71% higher than the TAC obtained in [26]. In the case of the multiclass classification process, the GA-ET-g_5 obtained a TAC that is 5.11% higher than the TAC obtained in [32] and 1.87% higher than the TAC obtained in [33].

The methods proposed in this research were also superior to the DL-based algorithms that were reviewed in the literature. For instance, the GA-RF achieved a TAC that is 2.19% higher than the TAC obtained by the LSTM method in [35] and 6.89% higher than the TAC obtained by the LSTM approach in [36]. The GA-RF also achieved a higher TAC than the CNN-RNN presented in [37]. Finally, the GA-RF presented in this paper outperformed the GA-RF in [38]: for the two-way classification task, it achieved a TAC that is 0.9% higher, and for the multiclass classification procedure, it obtained an accuracy that is 13.34% higher.

Moreover, a performance analysis of prediction time was conducted between the different models using the feature vectors that yielded the highest TAC. For the binary classification, this vector is f_3. The graph in Figure 6 shows that the DT model is the most efficient method in terms of prediction time (18.3 milliseconds) when using f_3. For the multiclass classification process, the vector that achieved the highest TAC is g_5. The plot in Figure 7 demonstrates that the NB method (7.96 milliseconds) was the most efficient one in terms of prediction time when utilizing g_5. However, the NB did not obtain a satisfactory TAC.
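A prediction-time comparison of this kind can be sketched in Python as follows. The measurement procedure is not described in this paper, so the helper below, its name, the number of repeats, and the threshold rule standing in for a trained classifier are all assumptions made for illustration.

```python
import time


def mean_prediction_time_ms(predict, X, repeats=5):
    """Average wall-clock time, in milliseconds, of predict(X) over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(X)
        timings.append((time.perf_counter() - start) * 1000.0)
    return sum(timings) / len(timings)


def threshold_predict(rows):
    # Hypothetical stand-in for a fitted model's predict method:
    # flag a row as an attack (1) when its first feature exceeds 0.5.
    return [1 if row[0] > 0.5 else 0 for row in rows]


# Hypothetical test set: 10,000 rows with 16 features each.
rows = [[(i % 100) / 100.0] * 16 for i in range(10_000)]
elapsed = mean_prediction_time_ms(threshold_predict, rows)
print(f"mean prediction time: {elapsed:.2f} ms")
```

Averaging over several runs reduces the influence of transient system load. With a fitted Scikit-Learn estimator, its `predict` method could be passed in place of `threshold_predict`.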

VI. CONCLUSION
In this research, an advanced IDS for IIoT was proposed and evaluated using the UNSW-NB15 dataset. This IDS was designed in multiple stages. The first stage involved implementing the GA in conjunction with the RF model to select the most important features to be used by the classifiers. This stage generated two sets of feature vectors. The first feature set, V_b, included 10 feature vectors destined for the binary classification procedure. The second feature set, V_m, contained 7 feature vectors that were used for the multiclass modeling process. For the binary classification experiments, the LR algorithm was applied as the baseline model and the following Tree-based models were implemented: DT, RF, ET, and XGB. For the multiclass modeling process, the NB was used as the baseline model alongside the same Tree-based algorithms that were implemented for the binary intrusion detection procedure.

The results demonstrated that for the binary classification process, the GA-RF achieved a TAC of 87.61% and an AUC of 0.98 using f_3, which contained 16 features. For the multiclass classification, the GA-ET achieved a TAC of 77.64% using g_5, which contained 17 attributes. The results achieved by the methods proposed in this study were superior to those achieved by the existing methodologies.

In future work, we intend to pair the GA with models such as the SVM or ANN. We also aim to increase the performance of our proposed approach on the minority classes of the UNSW-NB15. Furthermore, we intend to implement the proposed methodology on the TON_IoT dataset, which contains traffic patterns that have been mainly generated by IIoT devices. Additionally, we intend to conduct a performance analysis of the method proposed in this paper across multiple datasets, including the NSL-KDD and the AWID.