An Ensemble Deep Learning-Based Cyber-Attack Detection in Industrial Control System

The integration of communication networks and the Internet of Things (IoT) in Industrial Control Systems (ICSs) increases their vulnerability to cyber-attacks, which can have devastating outcomes. Traditional Intrusion Detection Systems (IDSs), which are mainly developed to support information technology systems, rely heavily on predefined models and are trained mostly on specific cyber-attacks. Besides, most IDSs do not consider the imbalanced nature of ICS datasets, and therefore suffer from low accuracy and high false-positive rates when deployed. In this paper, we propose a deep learning model to construct new balanced representations of the imbalanced datasets. The new representations are fed into an ensemble deep learning attack detection model specifically designed for an ICS environment. The proposed attack detection model leverages Deep Neural Network (DNN) and Decision Tree (DT) classifiers to detect cyber-attacks from the new representations. The performance of the proposed model is evaluated based on 10-fold cross-validation on two real ICS datasets. The results show that the proposed method outperforms conventional classifiers, including Random Forest (RF), DNN, and AdaBoost, as well as recent existing models in the literature. The proposed approach is a generalized technique, which can be implemented in existing ICS infrastructures with minimum effort.


I. INTRODUCTION
Critical infrastructures are highly complex systems that utilize cyber and physical components in their daily operations. The backbone of these facilities consists of an Industrial Control System (ICS), which plays an important role in the monitoring and control of critical infrastructures such as smart power grids, oil and gas, aerospace, and transportation [1] [2]. Therefore, the safety and security of ICSs are paramount for national security.
The inclusion of the Internet of Things (IoT) in ICSs opens up opportunities for cybercriminals to leverage the system vulnerabilities towards launching cyber-attacks [3] [4]. Awareness of the cyber-security vulnerability in ICSs has been growing since Stuxnet, the first cyber-attack that specifically targeted these technologies, was revealed in 2010. Stuxnet intended to sabotage the system's operation without disturbing Information Technology (IT) systems [5]. In 2015, another cyber-attack by the name of Black-Energy was used to target Ukraine's power grids, causing a massive power outage that affected about 230,000 people [6]. In February 2020, three U.S. gas pipeline firms announced another cyber-attack alleging a shutdown of electronic communication systems for multiple days [7]. While some of these attacks may result in information leakage, others can damage the physical system or misrepresent the system state to the monitoring engineer. These examples emphasize the growing cyber threat on Operational Technology (OT), which runs much of the enabling computer technologies that ICSs in critical infrastructure (i.e., power, gas, and water) now rely on [2] [8].
While the security concerns of critical infrastructure facilities are already considered in the IT community, limited efforts have been made to develop security solutions that are specific to ICSs and OT environments [9]. Due to the differences between the nature and characteristics of IT and OT systems, these attacks mostly remain invisible to traditional IT security measures such as Intrusion Detection Systems (IDSs) and anti-virus programs. Also, the communication protocols used by ICSs (e.g., Modbus or DNP3 [10] and IEC standards [11]) are not adequately secured by traditional IDSs. Therefore, strong security mechanisms must be designed explicitly for OT environments and ICSs to defend against such attacks and to protect critical infrastructure facilities.
A. Al-Abassi, H. Karimipour, A. Dehghantanha, Reza M. Parizi "An Ensemble Deep Learning-based Cyber-Attack Detection in Industrial Control System", IEEE Access, pp.1-10, April 2020
Different frameworks for IDSs have been used in the literature, such as model-based [12] and learning-based approaches [13] [14]. Most of these techniques utilize the available data to develop a model that exhibits the normal behavior of the system, then identify all different behaviors as abnormal. Since these methods are only trained on specific types of attacks, they are not able to detect unseen or new attack types [15] [16]. Besides, current IDSs are customized for specific systems/protocols and therefore lack adequate generalization [18].
Most importantly, the existing literature does not consider the imbalanced nature of ICS datasets, which results in low detection rates or high false-positive rates in real scenarios [17]. A dataset is imbalanced if the instances of some classes are far fewer than those of other classes. The fundamental principle of classification is finding the boundary between different classes. If some classes are rarely presented, they may not be able to provide enough information to determine the boundary. Therefore, they may be treated as outliers, resulting in wrong classifications.
Confronting these concerns, in this paper, we propose a generalized ensemble deep learning method for cyber-attack detection in ICS, which is evaluated on different real ICS datasets. The proposed deep learning model consists of multiple unsupervised Stacked Autoencoders (SAE) that learn new representations from imbalanced datasets. Then, new representations from each SAE are passed to a Deep Neural Network (DNN) via super vector and concatenated using a fusion activation vector. Finally, a Decision Tree (DT) is used, as a binary classifier, to detect attacks from the newly merged representations. Experiments show that the proposed model outperforms existing approaches with an acceptable performance even though fewer malicious instances are used.
The main contributions of the proposed method can be listed as follows:
• Developing a deep representation learning model to construct new balanced representations. The new representations increased attack detection accuracy and robustness (f-score) in an imbalanced environment.
• Increasing the detection accuracy and reducing the false positive rate by developing an ensemble deep learning algorithm based on DNN and DT classifiers to detect cyber-attacks from the new representations.
• Developing a generalized model that can be used in different critical infrastructure facilities with minimum changes in the existing system. The proposed framework utilizes representation learning and ensemble methods that can be trained to detect cyber-attacks in ICSs regardless of the data imbalance ratio.
The rest of this paper is structured as follows. Section II gives a literature review of recent studies in the field of ICS security. Section III presents a brief overview of the general ICS structure, system model, and different attack models considered in this work. The proposed method is described in Section IV. Section V includes results and case studies, followed by the concluding remarks in Section VI.

II. Related Work
Traditionally, ICSs operated in an isolated environment with a focus on safety, where each system is safeguarded to stop the process if something goes wrong. However, the introduction of Internet protocols, IoT devices, and wireless technologies within ICSs has resulted in significantly less isolation from the outside world. Consequently, safety mechanisms, which were not designed to deal with malicious attacks, face more vulnerabilities than ever before.
The majority of existing techniques for cyber-attack detection in ICSs are based on traditional IDSs, which are mainly designed for IT security analysis [5] [17]. IDSs can be categorized as signature-based and learning-based techniques. Signature-based approaches use databases and fixed signatures to detect known attacks, rendering them inefficient in detecting unknown or new attacks [19]. On the other hand, learning-based systems aim to identify process trends or behaviors, which increases their efficiency in managing unexpected intrusions [20]. [21] used a common-path mining method for anomaly detection in smart cyber-physical grids. An attack detection technique based on the Pearson correlation between two sensor parameters was used in [22]. The authors of [23] utilized an IDS that applies a Gaussian process to the attack strategy for anomaly detection. While these approaches are effective in detecting unusual activities, they are not reliable due to frequent upgrades in the network, resulting in different IDS topologies.
In contrast, learning-based IDSs are designed based on a moving target to continually evolve and learn new vulnerabilities [24] [25]. These methods try to generate the normal behavior of the system using existing datasets, then identify irregular patterns as abnormalities. The authors of [26] proposed an anomaly detection technique based on reinforcement learning and convolutional autoencoders for ICS. Alternatively, [27] addresses the detection of Denial of Service (DoS) attacks using Support Vector Machine (SVM) and RF. [28] suggested an unsupervised technique for the effective detection of privacy attacks based on observations of eavesdropping attacks. [29] uses a variety of DNN methods, including different variants of convolutional and recurrent networks, for cyber-attack detection in water treatment facilities. An ICS anomaly detection method using Long Short-term Memory (LSTM) networks is proposed in [30]. The authors of [31] proposed an attack detection technique based on a Hierarchical Neural Network. Similarly, [32] proposed a deep learning-based IDS utilizing Recurrent Neural Networks (RNNs).
In another study [33], the authors applied a stacked Nonsymmetric Deep Autoencoder (NDAE) to develop their IDS. [34] proposed an unauthorized intrusion detection technique and conducted backdoor attacks on a SCADA Industrial Internet of Things (IIoT) testbed. [35] proposed a graphical model-based approach for detecting abnormal behavior in an ICS using Bayesian networks to map the relationship between sensors and actuators. [36] implemented a toolchain with multiple state-of-the-art Anomaly Detection (AD) techniques used for detecting attacks that appear as anomalies. Their findings suggest that detection rates can change dramatically when considering different detection modes, thereby necessitating a reliable and real-time AD technique to maintain resilience in critical infrastructures. [37] proposes a genetic algorithm (GA) to find the best NN architecture for a given dataset, using the NAB metric to determine the consistency and quality of different architectures. [38] evaluates the application of unsupervised machine learning algorithms, including DNN and SVM, to detect anomalies in the Cyber-Physical System (CPS) using data from a Secure Water Treatment (SWaT) testbed. Results indicate that the DNN classifier produces fewer false positives than the one-class SVM, while the SVM can detect more anomalies.
Although the above-mentioned works addressed some of the issues related to cyber-attack detection in ICSs, most of them are heavily reliant on feature engineering. These methods are quite complicated and require sophisticated learning techniques, which can potentially increase their computational burden. Furthermore, the majority of currently proposed techniques are evaluated using balanced datasets, which lack the standard representation of imbalanced data in the ICS environment. Thus, it is hard to deploy such algorithms, as they cannot extract various discriminative information from real-world imbalanced datasets. As such, in this paper, we propose a deep learning-based attack detection technique, which extracts a new representation from raw imbalanced datasets, for reliable and accurate attack detection with a low false-positive rate in highly imbalanced datasets from ICS environments.

A. Industrial Control Systems
A typical ICS network in a SCADA system architecture, as shown in Figure 1, consists mainly of a remote station, primary center, and regional center. These systems can interact with each other via wide/local area networks or Radio Telemetry. The primary center gathers data from field sensors, identifies new setpoints to track the operations of the network, and detects any existing irregularities. Then, instructions are sent to the remote station to monitor telemetry from field devices [39]. The regional station manages the network communication and regional power consumption between the primary and remote stations.
An ICS can be modeled using non-linear and non-Gaussian processes, in which the state of the system at time $k$ is denoted by $x_k \in \mathbb{R}^n$, the sensor measurements are denoted by $y_k \in \mathbb{R}^m$, and the process and sensor noise are denoted by $\omega_k$ and $\upsilon_k$, respectively.
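A generic non-linear state-space form that is consistent with the symbols defined above can be sketched as follows. The function names $f$ (state transition) and $h$ (measurement map) are assumptions introduced here for illustration and are not taken from the original text.

```latex
\begin{aligned}
x_{k+1} &= f(x_k, \omega_k), \\
y_k &= h(x_k, \upsilon_k).
\end{aligned}
```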

B. Adversary Model
The main attack types addressed in this study are integrity attacks, such as False Data Injection (FDI), and availability attacks, such as DoS. In FDI attacks, an attacker injects false data into the system. The resulting observation is expressed in terms of the true sensor measurement, the measurement noise, and a sensor-selection vector that is applied through element-wise multiplication. The entry of the sensor-selection vector is equal to 1 for each node that is chosen as a malicious node. Typically, the intruder can exploit only a bounded subset of the sensors to inject false data into the system.
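The role of the sensor-selection vector can be illustrated with a short sketch. It assumes a simple additive injection model (observation = true measurement + selection entry × injected value + noise); this form, the function name `inject_false_data`, and the example values are hypothetical and are not taken from the paper.

```python
import random


def inject_false_data(y_true, selection, attack, noise_std=0.0, rng=None):
    """Return observed measurements under an ASSUMED additive FDI model.

    observed[i] = y_true[i] + selection[i] * attack[i] + noise
    selection[i] is 1 for a compromised sensor and 0 otherwise.
    """
    rng = rng or random.Random(0)
    if not (len(y_true) == len(selection) == len(attack)):
        raise ValueError("all vectors must have the same length")
    observed = []
    for y, s, a in zip(y_true, selection, attack):
        # Gaussian measurement noise; skipped when noise_std is zero.
        noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        observed.append(y + s * a + noise)
    return observed


y_true = [10.0, 20.0, 30.0, 40.0]
selection = [0, 1, 0, 1]        # sensors 1 and 3 are compromised
attack = [5.0, 5.0, 5.0, 5.0]   # hypothetical injected offsets
observed = inject_false_data(y_true, selection, attack)
print(observed)  # [10.0, 25.0, 30.0, 45.0]
```

Only the entries selected by the vector change, which is why an attacker who controls few sensors can still bias the state seen by the operator.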
FIGURE 1. ICS Standard Operation analysis model

On the other hand, DoS attacks involve measurement (packet) loss, which is commonly modeled with either a Bernoulli distribution [40] or a Markov model [41]. The attacker usually initiates DoS attacks by manipulating sensor readings and jamming communication channels, thereby flooding the network with packets [42]. The attack is described in terms of the measurement vector and the elements of the state transmission matrix, and the measurement data received by the state estimator under a DoS attack is expressed in matrix form in terms of these quantities.
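The Bernoulli packet-loss view of a DoS attack can be sketched as follows. This is a minimal illustration under the assumption that each measurement is dropped independently with a fixed probability; the function name `bernoulli_packet_loss` and the use of `None` for a lost packet are choices made here, not details from the paper.

```python
import random


def bernoulli_packet_loss(measurements, loss_prob, seed=0):
    """Drop each measurement independently with probability loss_prob.

    Returns (received, flags), where a lost measurement is None and
    flags[i] is 1 if measurement i arrived and 0 if it was dropped.
    """
    if not 0.0 <= loss_prob <= 1.0:
        raise ValueError("loss_prob must be in [0, 1]")
    rng = random.Random(seed)
    received, flags = [], []
    for y in measurements:
        # rng.random() is uniform on [0, 1), so P(arrived) = 1 - loss_prob.
        arrived = rng.random() >= loss_prob
        flags.append(1 if arrived else 0)
        received.append(y if arrived else None)
    return received, flags


readings = [1.2, 1.3, 1.1, 1.4, 1.2]
received, flags = bernoulli_packet_loss(readings, loss_prob=0.4)
print(received, flags)
```

A Markov model would instead make each arrival flag depend on the previous one, which captures bursts of consecutive losses.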

IV. Proposed Method
To overcome some of the issues associated with existing approaches, in this section, we propose a generalized deep learning model that works with raw imbalanced datasets. The model first uses multiple unsupervised SAEs to learn new representations from the imbalanced data. Then, the new representations from each SAE are passed to a DNN via super vector and concatenated using a fusion activation vector. Finally, a DT is used, as a binary classifier, to detect attacks from the newly merged representations. The schematic of the proposed model is presented in Figure 2.

A. The Proposed Ensemble Deep Representation Learning Model
Most existing approaches proposed in the literature neglect the fact that real ICS data are highly imbalanced (the number of attack samples is much smaller than the number of normal samples). This results in a low f-measure, which reflects the low performance of these models in an imbalanced environment like ICSs, thereby making them impractical for real-world use cases.
Once a model is directly trained with a highly imbalanced dataset, new malicious data are likely to be misclassified. To address this problem, we propose an ensemble deep representation-learning model based on SAE to enhance the overall performance of the model. This is done by extracting equally balanced sets and passing them to multiple Autoencoders (AEs) to generate new representations. The hidden-layer representation of an input sample $x_i$ in a sample set $X$ is computed from $W$ and $b$, which represent the weight matrix of neurons and the bias vector of all neurons between the input and hidden layers, respectively [43], through the function of the hidden layer. After the training process begins, the output of this function is used as the next input layer to construct a set of stacked multi-layer AEs. Although using an ensemble model slightly increased the computational cost, it was evident that utilizing multiple AEs leads to much better f-measure scores.
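One possible way to extract equally balanced sets is sketched below. It assumes that the normal samples are shuffled and cut into disjoint chunks, each paired with all attack samples; the paper does not specify the exact procedure, so the function `make_balanced_sets` and this chunking strategy are hypothetical.

```python
import random


def make_balanced_sets(normal, attack, n_sets=4, seed=0):
    """Build n_sets sets, each with 50% normal and 50% attack samples.

    ASSUMPTION: every set reuses all attack samples and receives a
    disjoint chunk of shuffled normal samples of the same size.
    Each element is a (sample, label) pair with label 1 for attack.
    """
    rng = random.Random(seed)
    shuffled = list(normal)
    rng.shuffle(shuffled)
    size = len(attack)
    if size == 0 or len(shuffled) < n_sets * size:
        raise ValueError("need at least n_sets * len(attack) normal samples")
    balanced_sets = []
    for j in range(n_sets):
        chunk = shuffled[j * size:(j + 1) * size]
        samples = [(x, 0) for x in chunk] + [(x, 1) for x in attack]
        rng.shuffle(samples)  # mix the two classes within the set
        balanced_sets.append(samples)
    return balanced_sets


normal = [[float(i)] for i in range(40)]    # toy normal samples
attack = [[100.0 + i] for i in range(5)]    # toy attack samples
balanced = make_balanced_sets(normal, attack)
print([len(s) for s in balanced])  # [10, 10, 10, 10]
```

Each resulting set would then be fed to its own autoencoder, so that no single model sees the original class imbalance.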
To enhance the performance of each AE, a dropout layer is added, which improves the generalization of our model by reducing the reliance of the output on a specific set of parameters. Also, the number of nodes and layers was selected through cross-validation of various networks with critical analysis of loss history and validation accuracy. Binary Cross-Entropy (BCE) is used as the cost function. It is computed over the attack and normal samples, is averaged over the total number of samples, and depends on the expected likelihood of an attack sample. BCE was used over Mean Squared Error (MSE) to prevent neuron weight changes in the hidden layer of the AE from getting smaller and smaller, thereby stalling out the system.
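The cost function can be illustrated with the standard form of binary cross-entropy, shown below as a minimal sketch. It assumes that label 1 marks an attack sample and that `probs` holds the predicted likelihood of an attack; the exact notation used in the paper may differ.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy over all samples.

    labels: 1 for attack, 0 for normal (assumed convention).
    probs:  predicted probability that each sample is an attack.
    """
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and equal length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


good = binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8])
bad = binary_cross_entropy([1, 0, 1], [0.2, 0.7, 0.3])
print(good < bad)  # True
```

Confident correct predictions give a loss near zero, while confident wrong predictions are penalized heavily, which keeps the gradient informative for the rare attack class.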

B. The Proposed Ensemble Deep Learning Attack Detection Model
Once the new representations are generated from the imbalanced dataset, they are fed to an ensemble of DNN classifiers to distinguish normal from abnormal behaviors. The results from each DNN are then concatenated, via super vector using a fusion activation function, and passed on to a DT classifier to detect attacks from the newly merged representations. A DT classifier was selected based on multiple tests using different machine learning classifiers, with DT providing the best performance results. The fusion activation function of the sigmoid layer is computed from the label and the prediction of the i-th sample, with separate weights for unstable and stable samples. The weight of unstable samples is set larger than the weight of stable samples to improve the detection of unstable samples, and the weight of stable samples is always set to 1 as a benchmark to mine unstable patterns effectively [33].
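The fusion step can be sketched in a few lines. The example below concatenates the outputs of four DNNs into one super vector per sample and then fits a depth-1 decision stump as a stand-in for the DT classifier. The stump, the helper names, and the toy probabilities are illustrative assumptions, not the authors' implementation, which would use a full decision tree.

```python
def build_super_vector(dnn_outputs):
    """Concatenate the per-DNN output vectors of one sample into a flat list."""
    super_vector = []
    for output in dnn_outputs:
        super_vector.extend(output)
    return super_vector


def fit_stump(vectors, labels):
    """Fit a depth-1 tree: predict 1 when vector[feature] > threshold."""
    best = None  # (errors, feature, threshold)
    for feature in range(len(vectors[0])):
        for threshold in sorted({v[feature] for v in vectors}):
            errors = sum(
                int((v[feature] > threshold) != bool(y))
                for v, y in zip(vectors, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    return best[1], best[2]


def predict_stump(stump, vector):
    feature, threshold = stump
    return 1 if vector[feature] > threshold else 0


# Hypothetical attack probabilities from four DNNs for three samples.
per_sample_outputs = [
    [[0.9], [0.8], [0.7], [0.95]],  # attack
    [[0.1], [0.2], [0.3], [0.05]],  # normal
    [[0.2], [0.1], [0.4], [0.10]],  # normal
]
labels = [1, 0, 0]
super_vectors = [build_super_vector(o) for o in per_sample_outputs]
stump = fit_stump(super_vectors, labels)
predictions = [predict_stump(stump, v) for v in super_vectors]
print(predictions)  # [1, 0, 0]
```

Letting a tree decide over the concatenated outputs means the final classifier can learn which DNN to trust for which region of the score space, rather than averaging them uniformly.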
The AEs were tested in a loop over different numbers of layers and neurons, batch sizes, loss and activation functions, optimizers, epochs, and dropout layers to achieve better accuracy and f-measure. Both the SAE and the DNN use the BCE cost function as well as the Rectified Linear Unit (ReLU) activation function to achieve the best performance measures. ReLU is given by $f(x) = \max(0, x)$, where $x$ is the observation.
The pseudocode of the proposed attack detection algorithm is shown in Algorithm 1.

A. Data Preparation
Ideally, the proposed method would be appraised on new real SCADA data, but due to the limited availability of real datasets, this study resorted to realistic ICS datasets obtained in 2015 and 2018. In this section, two different ICS datasets are used to evaluate the efficiency of the proposed algorithm against random ICS models.
• Gas Pipeline (GP): This dataset is obtained from a gas pipeline system and contains a Modbus validation frame of a preprocessed dataset in an Attribute-Relation File Format (ARFF) to help researchers use specialized preprocessing techniques. It also has a deep packet inspection of the Modbus frame, with each line representing one network transaction. The dataset contains 17 features, with a total of 274628 observations split into 219702 (80%) samples for training and 54925 (20%) for testing [44].
• Secure Water Treatment (SWaT): This dataset includes 11 days of continuous operation, in which 7 days were recorded under normal operation conditions and 4 days with attack scenarios. SWaT contains a total of 51 features, collected from network traffic ports, sensors, and actuators, with a total of 1048576 observations split into 838860 (80%) samples for training and 209715 (20%) for testing [45].

B. Evaluation Metrics
When it comes to the security of ICSs, the concern revolves around detecting cyber-attacks while achieving high f1-scores on imbalanced datasets, thereby minimizing the rate of false alarms. As with standard machine learning benchmarking metrics, this work considers True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN), defined in Table I, as the performance evaluation metrics for the attack detection models.

FIGURE 2. Overview Model of Stacked Autoencoder Algorithm
The performance of the machine learning algorithms is measured by the following metrics [44]:
• Accuracy: Ratio of samples classified correctly over the entire dataset.
• Recall: The ratio of correctly predicted positive samples over the total samples of the corresponding class.
• F1 Score: Harmonic mean of precision and recall.
The F1-score aims to find an equal balance between precision and recall, which is highly important in performance evaluation for imbalanced datasets (i.e., the number of attack samples is much smaller than the number of normal samples).
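The metrics above follow directly from the four confusion-matrix counts. The sketch below uses the standard textbook definitions, including precision (TP over all predicted positives), which is needed for the F1 score; the function name and the example counts are illustrative only.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy imbalanced case: 100 attack samples among 1000 observations.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case accuracy is noticeably higher than F1, which shows why accuracy alone can look acceptable on imbalanced data even when a meaningful share of attacks is missed.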

C. Performance Analysis
General Performance Analysis: In this section, two different ICS datasets gathered from a gas pipeline system and a water treatment facility were used to evaluate the performance of the proposed method. Results were compared with DNN, RF, DT, and AdaBoost-based classifiers along with multiple peer approaches in the current literature. Tables II and III provide a summary of the performance evaluation metrics, including accuracy, precision, recall, and F1-score. As illustrated, the results of the proposed method, in both datasets, outperform existing techniques in all four metrics, and most importantly on f-measure, which highlights the efficiency of the proposed model in imbalanced ICS environments.
Imbalanced Testing: To evaluate the efficiency of the proposed method under different imbalanced conditions, we tested the model with different imbalanced ratios. An imbalanced ratio of 0.1 means that 10% of the attack samples were used; likewise, an imbalanced ratio of 1 means that 100% of the attack samples were used. As shown in Figures 3-6, the results of the proposed method exceed those of other techniques, with a flat curve in all metrics for the GP dataset. This verifies the robustness of the proposed method, as its performance is not affected by different imbalanced ratios. Although other methods have an acceptable accuracy, their recall and precision are significantly lower than those of the proposed method, which maintains consistent results in all four metrics. For further analysis, the proposed model was also evaluated on the SWaT dataset. Since the model is generalized for different ICS environments, it was tested without any modification to the model structure or parameters. As illustrated in Figures 7-10, the proposed method outperforms existing techniques in all four metrics. The better performance compared to the first case study could be attributed to the fact that there are more training samples in the SWaT dataset than in the GP dataset.
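The imbalanced-ratio setting described above can be reproduced with a small subsampling helper. This is a minimal sketch that assumes the attack samples are drawn uniformly at random without replacement while all normal samples are kept; the function name `subsample_attacks` and the sampling strategy are assumptions, since the paper does not state how the subsets were chosen.

```python
import random


def subsample_attacks(normal, attack, imbalance_ratio, seed=0):
    """Keep all normal samples and a fraction of the attack samples.

    imbalance_ratio = 0.1 keeps 10% of the attack samples,
    imbalance_ratio = 1.0 keeps all of them.
    """
    if not 0.0 < imbalance_ratio <= 1.0:
        raise ValueError("imbalance_ratio must be in (0, 1]")
    if not attack:
        raise ValueError("attack must not be empty")
    rng = random.Random(seed)
    keep = max(1, round(len(attack) * imbalance_ratio))
    return list(normal), rng.sample(attack, keep)


normal = list(range(1000))
attack = list(range(1000, 1050))  # 50 toy attack samples
for ratio in (0.1, 0.5, 1.0):
    kept_normal, kept_attack = subsample_attacks(normal, attack, ratio)
    print(ratio, len(kept_normal), len(kept_attack))
```

Re-running the full training and evaluation pipeline for each ratio would produce one point per metric on curves like those described for the GP and SWaT datasets.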

VI. CONCLUSION
Critical infrastructures are complex cyber and physical systems that form the lifeline of modern society, and their reliable and secure operation is essential to national security. In this paper, we proposed a generalized ensemble deep learning-based cyber-attack detection method specifically designed for ICS. The proposed technique includes a deep representation-learning model, which constructs new balanced representations from the raw imbalanced dataset. The new representations are then used in an ensemble deep learning algorithm based on DNN and DT classifiers to detect cyber-attacks. The performance of the proposed model is verified using two different ICS datasets obtained from real critical infrastructure facilities. Our proposed approach outperformed conventional classifiers with a 10% higher f1-score in both datasets evaluated and produced higher accuracy, with 95.86% for the Gas Pipeline dataset and 99.67% for the Secure Water Treatment dataset.
Results were compared with traditional classifiers, such as RF, DNN, and AdaBoost, along with multiple peer approaches in the current literature. The proposed approach outperformed other techniques in all four evaluation metrics.
Although our approach performed better than existing techniques, there is room for improvement when dealing with few samples, as illustrated in the GP dataset. Additionally, identifying the attack type and its location is very important for limiting processing downtime and preserving computational efficiency once an attack is detected. Therefore, our future work will focus on optimizing the accuracy of the proposed method and developing an additional model to identify different attack types and their locations. This will help avoid critical system failure and improve the network security of ICSs against similar cyber-attacks.

Algorithm 1: The proposed ensemble attack detection SAE model
Data: Input all datasets including Normal and Attack samples
Training Phase:
for 10 folds of cross-validation do
    Split the dataset into Training (80%) and Testing (20%) sets
    Normalize the data: $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$
    Separate the samples into four balanced sets, each containing 50% Normal and 50% Attack samples
    Training the SAE model:
    Feed each balanced set to the SAE model to generate new representations of data
    for number of epochs do
        for number of batches in the balanced set 1 do
            Train the autoencoder: $\min \mathcal{L}(x, \hat{x})$
            Loss function: Binary Cross Entropy (BCE), Optimizer: Adam
        end
        for number of batches in the balanced set 2 do
            Train the autoencoder: $\min \mathcal{L}(x, \hat{x})$
            Loss function: BCE, Optimizer: Adam
        end
        for number of batches in the balanced set 3 do
            Train the autoencoder: $\min \mathcal{L}(x, \hat{x})$
            Loss function: BCE, Optimizer: Adam
        end
        for number of batches in the balanced set 4 do
            Train the autoencoder: $\min \mathcal{L}(x, \hat{x})$
            Loss function: BCE, Optimizer: Adam
        end
        The new representation sets are then used to train four DNN models for anomaly detection
        Training the ensemble DNN detection model:
        Train 4 DNN models, each corresponding to one of the 4 new representation sets
        for number of estimators do
            Train the DNN model on set 1
            Loss function: BCE, Optimizer: Adam
        end
        for number of estimators do
            Train the DNN model on set 2
            Loss function: BCE, Optimizer: Adam
        end
        for number of estimators do
            Train the DNN model on set 3
            Loss function: BCE, Optimizer: Adam
        end
        for number of estimators do
            Train the DNN model on set 4
            Loss function: BCE, Optimizer: Adam
        end
    end
end
Fusion Layer:
Merge the new representations from each DNN to form a super vector using the NumPy concatenating function
Pass the super vector to a final DT binary detection model
Training the DT model:
for number of estimators do
    Train DT classifier on new merged representation
end
Testing Phase:
Normalize the test sample
Pass 4 test sets through the SAEs
Pass each newly generated representation from SAE to DNN for anomaly detection
Fuse the output of the DNN into a super vector
Pass the super vector to a DT model for binary classification
Output: Normal/Attack label
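The normalization step in Algorithm 1 is min-max scaling. A minimal sketch for a single feature column is given below; mapping a constant column to zeros is a choice made here to avoid division by zero and is not specified in the paper.

```python
def min_max_normalize(column):
    """Scale a list of numbers to [0, 1] with min-max normalization.

    x' = (x - min(x)) / (max(x) - min(x))
    A constant column is mapped to zeros (assumed convention).
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


scaled = min_max_normalize([2.0, 4.0, 6.0, 10.0])
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

In practice each of the dataset's features would be scaled separately, so that features with large numeric ranges do not dominate the autoencoder's reconstruction loss.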

TABLE III. SUMMARY OF THE RESULTS AND PERFORMANCE COMPARISON ON THE SWAT DATASET

TABLE II. SUMMARY OF THE RESULTS AND PERFORMANCE COMPARISON ON THE GAS PIPELINE DATASET