A Dual-Isolation-Forests-Based Attack Detection Framework for Industrial Control Systems

The cybersecurity of industrial control systems (ICSs) is becoming increasingly critical given the current advances in cyber activity and Internet of Things (IoT) technologies, and given the direct impact of ICSs on safety, the economy, and security. This paper presents a novel semi-supervised dual isolation forests-based (DIF) attack detection system that is developed using normal process operation data only and is demonstrated on two scaled-down ICSs, the Secure Water Treatment (SWaT) testbed and the Water Distribution (WADI) testbed. The proposed cyber-attack detection framework is composed of two isolation forest models that are trained independently, one on the normalized raw data and the other on a version of the data pre-processed using Principal Component Analysis (PCA), to detect attacks by separating away anomalies. The performance of the proposed method is compared with previous works, and it demonstrates improvements in terms of attack detection capability, computational requirements, and applicability to high-dimensional systems.


I. INTRODUCTION
Industrial control systems (ICSs) are composed of electrical and mechanical devices, computers, and manual operations supervised by humans. They are mainly used for partial or full automation control in industrial plants and critical infrastructures such as manufacturing industries, chemical plants, power generation and distribution systems, water treatment plants, and others [1]. Their operation has a direct impact on the environment, the safety and health of people, the economy, and national security. Concerns about the security of industrial control systems are increasing, given the growing sophistication of cyber activities. The advancement in the industrial Internet of Things (IoT) technologies is creating more potential threat points and vulnerabilities in the system. There have been a number of cyber-attacks on critical infrastructures in the past few years [2]-[4], and research in cybersecurity of industrial control systems has been evolving to overcome the challenges and vulnerabilities in the current industrial attack detection systems.
The associate editor coordinating the review of this manuscript and approving it for publication was Ana Lucila Sandoval Orozco.
Attack detection systems are designed to monitor the events taking place in an information system in order to identify signs of security issues. Anomaly detection, the process of identifying anomalous events that do not conform to the expected behavior of the system, is the most commonly used approach for attack detection. The main underlying advantage of the anomaly detection approach is its ability to detect unseen and new attacks. Anomaly detection-based attack detection approaches can be implemented using a variety of Machine Learning (ML) algorithms such as Support Vector Machine (SVM) [5], [6], Principal Component Analysis (PCA) [7], Neural Networks [8], clustering analysis [9], Negative Selection Algorithm (NSA) [10], and others. They can be divided into unsupervised, supervised, and semi-supervised learning approaches. In the unsupervised method, the model is developed using unlabeled data that contain normal and anomalous samples, while labeled normal and attack data are used in the supervised learning scheme. In the semi-supervised approach, however, the model is developed using the normal operation data only. The work presented in this paper is demonstrated using the datasets obtained from the iTrust Lab testbeds, which are the Secure Water Treatment (SWaT) testbed and the Water Distribution (WADI) testbed. There have been several works in attack detection using the SWaT dataset as in [10]-[22] and limited work using the WADI dataset as in [17].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
Most of the previous works on attack detection utilized the normal process data using several ML algorithms such as Negative Selection Algorithm (NSA) [10], Singular Value Decomposition (SVD) [11], Standard Neural Networks (NNs) [12], [13], Convolutional Neural Networks (CNNs) [14], Recurrent Neural Networks (RNNs) [15], [16], and Generative Adversarial Network (GAN) implemented using the Long-Short-Term Memory (LSTM) network [17]. They are based on constructing a model that is able to profile normal system behavior, and then non-conforming observations are identified as anomalies. In [18], an attack detection approach is proposed based on a graphical model developed using a probabilistic deterministic real-time automaton model and a Bayesian network, named as the Time Automata and Bayesian netwORk (TABOR) approach. In [19], supervised learning is used to develop a detection model using SVM. A network-based attack detection system is proposed in [20] to detect attacks in particular communication links in the SWaT testbed. In addition, model-based attack detection methods are proposed in [21], [22] for the SWaT system using approximated discrete models in which invariants are derived from process dynamics and state entanglement among the physical components, to detect attacks.
In terms of computational overhead, model-based approaches are considered relatively more efficient than data-driven ones for large-sized systems [23]. In addition, the computational complexity differs among Machine Learning algorithms: it is well known that CNNs and RNNs involve extensive computations in both the training and evaluation phases, while NNs have a lower computational requirement, ranging from average to high [24]. Comparatively, standard ML algorithms such as SVD, PCA, SVM, and NSA are characterized by low to average computational complexity, depending on the problem size [25], [26].
However, the model-based approaches in [21], [22] have some limitations, such as modeling approximations that are needed because of the complexity of some processes in the system (e.g., the chemical processes) and that affect the detection accuracy. Moreover, the difficulty, effort, and time required for the system modeling rise with the complexity of the system, and the reliability of the detection approach is likely to degrade. In [21], the authors proposed an approach for analyzing the security aspects of the SWaT testbed, such as the vulnerabilities of the system and the possible attack scenarios that can be discovered. Still, the possibility of using this approach to launch attacks that cannot be detected by other approaches, specifically the data-driven methods, depends on the quality of the system models used. In addition, developing high-fidelity system models becomes more challenging as the complexity, the size, and the non-linearity of the system increase.
The methods proposed in [10]-[13], [15], [16] might have the drawbacks of a high missed alarm rate and poor performance for high-dimensional data. In addition, some approaches have a high computational cost, such as in [14]-[17], and others, e.g., [10], [11], do not make full use of the process information because they disregard the actuator signals, which may contain valuable input about the process status. The approach proposed in [18] requires that the variable selection be made manually and empirically by the designer based on the dynamic behavior. The disadvantage of the supervised learning-based attack detection system proposed in [19] is its dependency on the attack data, which are scarce, and the low accuracy of the detection model under new and unseen attack scenarios. TABLE 1 presents a summary of the previous works done using the SWaT and WADI datasets for intrusion and attack detection.
In this paper, we present a dual isolation forests-based (DIF) attack detection framework for industrial control systems in water treatment plants. The two isolation forest models are trained independently, one using the normalized raw data and the other using a version of the data pre-processed using PCA. The idea behind using two models is to inspect the data in two representations, one in the original data space and the other in the principal component space, thus elevating the capability of the detection approach. The main objective of the framework is to address the limitations of the previous works, given that isolation forests have low computational complexity and high applicability to complex and high-dimensional data. They can be used on mixed datasets containing continuous and discrete variables, which facilitates harnessing the available data when developing the model. They can be used in both semi-supervised and unsupervised learning schemes. Unlike most of the previous works, they are based on pointing out anomalies using the concept of isolation, which improves the attack detection capability. There have been a couple of implementations of isolation forest-based approaches for attack detection, such as in [27] for smart grid networks and in [28] for information security.
The contributions of this work can be summarized as follows:
1) A dual-isolation forests-based attack detection framework is proposed for industrial control systems in water treatment plants utilizing the normal process data of actuator signals and sensor measurements.
2) The proposed approach is based on the principle of separating away observations that are anomalous, which improves its ability to detect attacks.
3) Due to the nature of the isolation forest, it can harness the available information about the process by analyzing the relations between the different system variables, which are the sensor measurements and the actuator signals.
4) It can exploit the available data of the system by learning from the process data in the original, as well as the PCA-transformed representations.
5) It provides an efficient solution in terms of computational complexity when compared to Deep Learning-based approaches.
The paper is organized as follows. The description of the systems under study is presented in Section II. In Section III, the details of the proposed approach are presented. The models training procedure and the used performance evaluation metrics are explained in Section IV, along with the evaluation and comparison results. Finally, conclusions and future work are summarized in Section V.

II. SYSTEM DESCRIPTION
The work presented in this paper utilizes the experimental process data from the Secure Water Treatment (SWaT) testbed [29], [30] and the Water Distribution (WADI) testbed [31] developed by iTrust Lab at Singapore University of Technology and Design in order to promote research work in the area of cybersecurity of ICSs. The details of the two testbeds are presented in the following subsections.

A. SECURE WATER TREATMENT (SWAT) TESTBED
The SWaT testbed is a scaled-down water treatment plant that is composed of 6 processes, as demonstrated in FIGURE 1, and is capable of producing 5 gallons per minute of fresh water. The data were collected for a total of 11 days, and 36 different attacks were injected during the last four days by hijacking the packets in the communication links between the SCADA system and the Programmable Logic Controllers (PLCs); the attack samples comprise around 6% of the total data samples. The network packets were altered to reflect the spoofed values from the sensors [29]. The dataset consists of measurements from a total of 25 sensors for water level, flow rate, pressure, and chemical decomposition, and signals from 26 actuators, such as pumps and valves. The description of the SWaT attack scenarios is provided in TABLE 2.

B. WATER DISTRIBUTION (WADI) TESTBED
The WADI testbed is an operational testbed supplying 10 gallons per minute of filtered water. It represents a scaled-down version of a large water distribution network in a city. It contains three distinct control processes labeled as P1 to P3, as presented in FIGURE 2, each controlled by its own set of PLCs. It consists of a number of large water tanks that supply water to consumer tanks. The dataset captures the testbed operation for 16 days; it consists of a total of 59 sensor measurements and 45 actuator signals. It also includes the control signals of 7 actuators with their setpoints. The dataset contains 15 attacks that were injected during the last 2 days of the testbed operation targeting the components of the cyber-physical system with the intention of interrupting the water supply to the consumer tanks. They were conducted by opening valves and spoofing sensor readings. A description of the WADI attack scenarios is provided in TABLE 3.

III. PROPOSED METHOD
The proposed framework is developed utilizing the normal process data of the actuator signals and sensor measurements, and it is composed of two Isolation Forest (IF) models. The first IF model is developed using the normalized raw data, while the second IF model is trained after performing PCA on the normalized continuous-time system variables, as illustrated in FIGURE 3. The aim of the dual isolation forests framework is to exploit the system data by examining it in two representations to extract useful information that improves the process of separating away (isolating) anomalies. In the following subsections, we provide the details and the theoretical background of the algorithms used in the proposed method.

A. DATA PRE-PROCESSING ALGORITHMS
Machine Learning (ML) is about data analysis using algorithms and statistical models in order to build models capable of predicting outcomes given the input data. Machine Learning-based models are highly dependent on the data used to develop them. The performance and accuracy of the ML models are tied to the quality and representation of the data used. The model's ability to learn and extract useful information for the purpose of the application can be limited if the raw data are complex, redundant, contaminated with noise, etc. Hence, data pre-processing is an essential step in Machine Learning applications to improve learning. It involves data normalization, feature selection/extraction, dimensionality reduction, noise filtering, etc. There are various data pre-processing approaches that are commonly used. The following subsections present the ones used in this work.

1) DATA NORMALIZATION
[Figure caption, WADI testbed: There are three processes in the WADI testbed labeled as P1 to P3. P1 is the primary grid process in which the water intake from the SWaT testbed product water or from the return water from P3 in WADI is stored in two storage tanks T-001 and T-002. The storage water tanks in P1 supply water to two elevated reservoir tanks in P2, which is the water distribution process to the six consumer tanks based on the demand. In P3, the recycled water is sent back to P1 once consumer tanks meet their demands. Solid arrows indicate the flow of water and sequence of processes. S and A represent sets of sensors and actuators, respectively.]

Data normalization is performed by shifting the data to have a zero mean, and it may include standardization, which is done by scaling the data to have a unit variance. Data normalization is useful to speed up the learning/training of the model and to optimize the algorithm results: most ML algorithms are about solving optimization problems (maximization/minimization), and hence, depending on the nature of the data, the learning of the ML models can be slow and can even fall short because of local optima. Data standardization is usually performed before applying machine learning algorithms that assume that the input data follow a Gaussian distribution. The analysis is simpler if the input data follow the standard normal distribution with zero mean and unit variance.
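As a minimal sketch of the standardization step described above, the following pure-Python example estimates the mean and standard deviation from normal (training) data and applies them to new samples. The function names `fit_standardizer` and `standardize`, the sample values, and the choice to leave constant signals unscaled are illustrative assumptions, not details taken from this work.

```python
from statistics import mean, pstdev

def fit_standardizer(column):
    """Return (mean, std) estimated from one variable of the training data."""
    mu = mean(column)
    sigma = pstdev(column)
    # Assumption: a constant signal (e.g., an actuator that never switches)
    # has zero variance, so it is only centered and not scaled.
    if sigma == 0:
        sigma = 1.0
    return mu, sigma

def standardize(column, mu, sigma):
    """Shift to zero mean and scale to unit variance."""
    return [(v - mu) / sigma for v in column]

# Hypothetical tank-level readings recorded during normal operation.
train_level = [500.0, 510.0, 490.0, 505.0, 495.0]
mu, sigma = fit_standardizer(train_level)
z_train = standardize(train_level, mu, sigma)

# New samples reuse the training statistics, so a large deviation stays visible.
z_new = standardize([500.0, 620.0], mu, sigma)
```

Reusing the training statistics for new data, rather than re-estimating them, keeps the normal operating range fixed, so a reading far outside it maps to a large standardized value.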

2) PRINCIPAL COMPONENT ANALYSIS (PCA)
PCA is a multivariate statistical analysis method defined as a linear transformation of a set of correlated variables into a new set of uncorrelated variables. It is widely used in data dimensionality reduction. Given a measurement data matrix $X \in \mathbb{R}^{m \times n}$, where $n$ is the number of variables and $m$ is the number of observations, a PCA model is developed using the normalized data matrix $\bar{X} \in \mathbb{R}^{m \times n}$ by optimizing the correlation matrix $C \in \mathbb{R}^{n \times n}$ to find a new set of bases that are uncorrelated to represent the data, namely the principal components (PCs). The correlation matrix is calculated as:

$$C = \frac{1}{m-1} \bar{X}^{T} \bar{X} \tag{1}$$

Then, the eigenvalue decomposition of the correlation matrix is found by:

$$C = V \Lambda V^{T} \tag{2}$$

where $V \in \mathbb{R}^{n \times n}$ is the matrix of the eigenvectors associated with each of the eigenvalues of $C$, and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ is the diagonal matrix of the eigenvalues of $C$, where $\lambda_1$ and $\lambda_n$ are the largest and the smallest eigenvalues, respectively. The projection matrix $P \in \mathbb{R}^{n \times l}$ is used to transform the data onto the new feature subspace. It is composed of the first $l$ eigenvectors of the correlation matrix that are associated with the largest eigenvalues. That is, $V = [P, \tilde{P}]$, where $P \in \mathbb{R}^{n \times l}$, $\tilde{P} \in \mathbb{R}^{n \times (n-l)}$, and $l$ is the number of PCs, which is determined based on the desired explained cumulative variance contribution. PCA transforms the data into two subspaces: the principal components subspace (PCS) and the residual subspace (RS). The data transformation of a normalized measurement vector $x \in \mathbb{R}^{1 \times n}$ to the new data vector $\hat{x} \in \mathbb{R}^{1 \times l}$ in the PCS is expressed as:

$$\hat{x} = x P \tag{3}$$
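As a small worked example of the steps above (a sketch with assumed numbers, not data from this work), consider two normalized variables whose correlation is 0.9. Here $V$, $\Lambda$, $P$, and $l$ follow the definitions in the text, and $\hat{x}$ denotes the transformed vector in the PCS:

```latex
C = \begin{bmatrix} 1 & 0.9 \\ 0.9 & 1 \end{bmatrix}
  = V \Lambda V^{T}, \qquad
V = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad
\Lambda = \mathrm{diag}(1.9,\ 0.1)

\frac{\lambda_1}{\lambda_1 + \lambda_2} = \frac{1.9}{2.0} = 0.95
\;\Rightarrow\; l = 1, \qquad
P = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad
\hat{x} = x P = \frac{x_1 + x_2}{\sqrt{2}}
```

With a 95% cumulative-variance target, a single PC suffices in this example, and the discarded direction $(x_1 - x_2)/\sqrt{2}$ spans the residual subspace.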

B. ISOLATION FOREST-BASED ANOMALY DETECTION APPROACH
Isolation Forest (IF) is an unsupervised Machine Learning algorithm that is used for anomaly detection [33], [34]. It is an ensemble regressor encompassing a number of isolation trees in which each tree is trained on a random subset of the training data, as described in Algorithm 1. The parameters associated with an isolation forest are:
1) The number of trees (n_estimators),
2) The maximum number of observations (m_max), representing the size of the data subset used to train each tree,
3) The maximum number of features (n_max), representing the subset of the data features used to train each tree.

Algorithm 1 TrainForest(X, n_estimators, m_max, n_max)
Input: X - input data, n_estimators - number of trees, m_max - size of data subset, n_max - features of data subset
Output: a set of n_estimators iTrees
  Initialize Forest
  for i = 1 to n_estimators do
    X' ← sample(X, m_max, n_max)
    Forest ← Forest ∪ iTree(X')
  end
  return Forest

As shown in Algorithm 2, the isolation forest uses the concept of isolation to separate away anomalies: each isolation tree (iTree) performs recursive binary splitting of its random data subset X' by randomly selecting a split feature q and a split value p within the range of that feature, yielding left (X_l) and right (X_r) data subsets each time, until all samples are isolated. Each split produces a node, which can be an internal node if further splitting is possible in the corresponding split regions, or an external node, meaning it is the last node in the branch, when the size of the data subset of that region is 1 or the maximum tree depth is reached. In the case of an internal node, the data subsets of the two branches of the node, X_l and X_r, are further split until an external node is reached. The information associated with the external node is the size of the data subset in that region.
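The recursive splitting described above can be sketched in plain Python as follows. This is an illustrative, simplified implementation, not the Scikit-learn code used in this work: the function names, the dictionary-based node layout, the default parameter values, and the synthetic data are assumptions, and the path length is counted as the number of edges without any correction for external nodes that hold more than one sample.

```python
import random

def build_itree(rows, features, rng, depth=0, max_depth=8):
    """Grow one isolation tree by recursive random binary splitting."""
    if len(rows) <= 1 or depth >= max_depth:
        return {"size": len(rows)}  # external node
    # Only features that still vary within this subset can be split on.
    splittable = [q for q in features
                  if min(r[q] for r in rows) < max(r[q] for r in rows)]
    if not splittable:
        return {"size": len(rows)}  # external node: subset cannot be divided
    q = rng.choice(splittable)  # random split feature
    lo = min(r[q] for r in rows)
    hi = max(r[q] for r in rows)
    p = rng.uniform(lo, hi)  # random split value within the feature range
    left = [r for r in rows if r[q] < p]
    right = [r for r in rows if r[q] >= p]
    return {"q": q, "p": p,
            "left": build_itree(left, features, rng, depth + 1, max_depth),
            "right": build_itree(right, features, rng, depth + 1, max_depth)}

def train_forest(data, n_estimators=100, m_max=64, n_max=2, seed=0):
    """Train each tree on a random subset of rows and features."""
    rng = random.Random(seed)
    n_features = len(data[0])
    forest = []
    for _ in range(n_estimators):
        rows = rng.sample(data, min(m_max, len(data)))
        features = rng.sample(range(n_features), min(n_max, n_features))
        forest.append(build_itree(rows, features, rng))
    return forest

def path_length(tree, x):
    """Number of edges from the root to the external node that x falls into."""
    depth = 0
    while "size" not in tree:
        tree = tree["left"] if x[tree["q"]] < tree["p"] else tree["right"]
        depth += 1
    return depth

def average_path_length(forest, x):
    """Average the path length of x over all trees in the forest."""
    return sum(path_length(t, x) for t in forest) / len(forest)

# Synthetic "normal" data: 500 two-dimensional points around the origin.
data_rng = random.Random(1)
normal = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(500)]
forest = train_forest(normal)

h_typical = average_path_length(forest, [0.0, 0.0])
h_outlier = average_path_length(forest, [8.0, 8.0])
```

In this sketch the point far from the training cloud reaches an external node after fewer splits than the point at the center, which is the property the isolation concept relies on.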

Algorithm 2 iTree(X)
Input: X - input data
Output: an iTree
  if X cannot be divided then
    return externalNode{Size ← |X|}
  else
    let Q be the list of features in X
    randomly select a feature q ∈ Q
    randomly select a split point p between the max and min values of feature q in X
    X_l ← filter(X, q < p)
    X_r ← filter(X, q ≥ p)
    return internalNode{Left ← iTree(X_l), Right ← iTree(X_r), FeatureSplit ← q, SplitValue ← p}
  end

Anomalies are different from normal observations, and they can be easily isolated. Hence, they are expected to lie closer to the root and to have a shorter path. The anomaly detection for a given data sample x is made by comparing the score s(x) with the detection threshold. The score depends on H, the average expected path length of the trees in the forest, and on $\bar{h}(x)$, the average path length over all trees, defined as:

$$\bar{h}(x) = \frac{1}{n_{estimators}} \sum_{i=1}^{n_{estimators}} h_i(x) \tag{5}$$

Here, $h_i(x)$ is the path length of x in the ith tree, determined by the number of edges in the tree. The anomaly is then detected by thresholding the score, with anomalies labeled as −1 and normal observations labeled as 1.

For the proposed DIF-based attack detection framework presented in FIGURE 3, the two isolation forest models yield the outputs y_1 and y_2, which in turn, through a logical operation, produce the attack indicator y. That is, if either y_1 or y_2 is −1, the attack indicator y is one; otherwise, y is zero. The decision function of the dual isolation forests-based attack detection approach checks an observation window of length w, such that an attack is detected if the attack indicator y is 1 for at least 80% of the observation period.
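The dual decision logic and the windowed decision rule described above can be sketched as follows. The function names, the `stride` argument, and the example sequences are illustrative assumptions; only the OR-combination of the two model outputs and the 80% criterion over a window of length w come from the text.

```python
def attack_indicator(y1, y2):
    """Combine the two model outputs (-1 = anomaly, 1 = normal) with a logical OR."""
    return [1 if (a == -1 or b == -1) else 0 for a, b in zip(y1, y2)]

def windowed_detection(indicator, w, stride=1, ratio=0.8):
    """Flag each window of length w in which at least `ratio` of the samples are 1."""
    decisions = []
    for start in range(0, len(indicator) - w + 1, stride):
        window = indicator[start:start + w]
        decisions.append(sum(window) / w >= ratio)
    return decisions

# Toy outputs of the two isolation forest models over ten time steps.
y1 = [1, 1, -1, -1, 1, -1, -1, 1, 1, 1]
y2 = [1, 1, 1, -1, -1, -1, 1, 1, 1, 1]
indicator = attack_indicator(y1, y2)
alarms = windowed_detection(indicator, w=5, stride=1)
```

Requiring most of a window to be flagged, instead of reacting to single samples, suppresses isolated false alarms at the cost of a detection delay of up to roughly one window length.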

IV. EVALUATION
This section presents the evaluation of the proposed attack detection framework in terms of the used performance metrics, datasets description, models training details, and the evaluation results.

A. PERFORMANCE METRICS
The confusion matrix is a form of contingency table with two dimensions identified as True and Predicted, and a set of classes in both dimensions, as presented in TABLE 4. The following performance metrics are derived from the confusion matrix [35]:

1) PRECISION
It is also called the Positive Predictive Value (PPV), which is a measure of the closeness of the set of predicted results, and it is expressed as:

$$\text{Precision} = \frac{TP}{TP + FP}$$

where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives in the confusion matrix, respectively.

2) RECALL
It is also known as the True Positive Rate (TPR) and is calculated by:

$$\text{Recall} = \frac{TP}{TP + FN}$$

3) F1-SCORE
It is the harmonic average of the precision and recall, where it is at its best at a value of 1, meaning perfect precision and recall. It is given by:

$$F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

B. DATASETS DESCRIPTION
The SWaT and the WADI datasets each contain two data logs. The first log contains normal process data only, collected for 7 and 14 days, respectively, while the second log consists of data for the system operation under both normal and attack scenarios for 4 and 2 days, respectively, at a sampling time of 1 second. The first step is to clean the second data log by removing the data collected during the 1 hour after each attack was terminated, because the system behavior in that period is ambiguous and might bias the performance evaluation of the developed models. That is, it represents a recovery period from the attack impact during which the system stabilizes back to its steady-state normal behavior. In the dataset labeling, the observations in this period are labeled as normal, attack-free time instants, whereas behavior-wise they are anomalous; retaining them would induce false positives and bias the performance evaluation of the proposed approach.
The normal and attack observations in the second log are separated, since it was noticed that the normal operational data in the second logs seem to represent a different operational mode (a different distribution). When developing a machine learning model, the distribution of the training and the validation datasets should be the same, and the dataset used to train and develop the model should be representative of the system operation. Finally, the steady-state combined normal process data from the two logs are used for developing the proposed detection approach. The attack data subset is used to test the detection model performance.
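The cleaning step applied to the second data log (dropping the recovery period after each attack) can be sketched as follows. The 1-hour recovery period and the 1-second sampling time come from the text; the function name, the 0/1 label encoding, and the toy label sequence are illustrative assumptions.

```python
def drop_recovery_periods(labels, recovery=3600):
    """Return the indices to keep after removing post-attack recovery samples.

    labels: per-sample labels, 1 = attack, 0 = normal (assumed encoding).
    recovery: number of samples to drop after each attack ends
              (3600 samples = 1 hour at a 1-second sampling time).
    """
    keep = []
    cooldown = 0  # samples left in the current recovery period
    for i, label in enumerate(labels):
        if label == 1:
            keep.append(i)       # attack samples are kept for testing
            cooldown = recovery  # the recovery period starts when the attack ends
        elif cooldown > 0:
            cooldown -= 1        # normal-labeled sample inside the recovery period
        else:
            keep.append(i)       # steady-state normal sample
    return keep

# Toy log: an attack at samples 2-3, with a 2-sample recovery period.
toy_labels = [0, 0, 1, 1, 0, 0, 0, 0]
kept = drop_recovery_periods(toy_labels, recovery=2)
```

The kept normal-labeled samples can then be separated from the attack samples, as described above, before the models are trained and tested.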

C. MODEL TRAINING
The training of the isolation forest models is conducted using Scikit-learn library, which is an open-source Machine Learning library for the Python programming language [36]. It is conducted using 5-fold cross-validation such that each IF model is trained 5 times using 80% of the training dataset for training and 20% for validation, selected randomly. Grid search is utilized for model tuning given the limited number of hyper-parameters associated with the isolation forest model for the ranges presented in TABLE 5 and with the objective of achieving a maximum false alarm rate of 5% on the training dataset. The PC used for the training has 64 GB RAM and 12-cores AMD Ryzen 9 3900X CPU with 3.8 GHz speed using 64 bit Windows 10 Pro OS. The two IF models are trained independently using the normalized raw data and the PCA-processed data, respectively. PCA is applied to the continuous-time variables to retain an explained cumulative variance of 95%. For instance, FIGURE 4 and FIGURE 5 present examples of the SWaT dataset visualizations before and after PCA processing. It can be seen that the normal and the attack observations are somewhat fused when viewing the data in the original representation while they are decoupled in the PCA-transformed representation.
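The training setup described above can be sketched with Scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic and purely continuous, `max_samples=256` and `max_features=1.0` are placeholder values rather than the tuned hyper-parameters, the 5-fold cross-validation and grid search are omitted, and the helper names `fit_forest` and `dif_indicator` are hypothetical. The 95% explained variance, the 5% false alarm target, and the tree counts of 100 and 250 follow the text.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the normal process data (assumption: 10 continuous variables).
rng = np.random.default_rng(0)
X_normal = rng.normal(size=(2000, 10))

scaler = StandardScaler().fit(X_normal)
Z = scaler.transform(X_normal)

# Retain enough principal components to explain 95% of the variance.
pca = PCA(n_components=0.95, svd_solver="full").fit(Z)

def fit_forest(data, n_estimators, far=0.05):
    """Fit an isolation forest and pick the score threshold for a target false alarm rate."""
    model = IsolationForest(n_estimators=n_estimators, max_samples=256,
                            max_features=1.0, random_state=0).fit(data)
    # score_samples: the lower the score, the more abnormal the sample.
    threshold = np.quantile(model.score_samples(data), far)
    return model, threshold

if1, thr1 = fit_forest(Z, n_estimators=100)                 # normalized raw data
if2, thr2 = fit_forest(pca.transform(Z), n_estimators=250)  # PCA-transformed data

def dif_indicator(X):
    """Return 1 where either model scores a sample below its threshold."""
    Zx = scaler.transform(X)
    flag1 = if1.score_samples(Zx) < thr1
    flag2 = if2.score_samples(pca.transform(Zx)) < thr2
    return (flag1 | flag2).astype(int)

train_alarm_rate = dif_indicator(X_normal).mean()
far_away = dif_indicator(np.full((5, 10), 8.0))
```

Setting each threshold at the 5% quantile of the training scores caps the false alarm rate of each model at about 5% on the training data; because the two indicators are combined with a logical OR, the combined rate on that data can be up to twice that value.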
The details of the best models of the two isolation forests are presented in TABLE 6. As inferred from [33], the performance of the isolation forest converges in terms of the number of trees n_estimators used, and it was found to converge at 100 and 250 trees for the SWaT models and at 100 and 150 trees for the WADI models, respectively, with only minimal further improvement in the detection performance at a higher training-time cost. FIGURE 6 and FIGURE 7 demonstrate, on the SWaT dataset, the effect of varying the number of trees using the Receiver Operating Characteristic (ROC) curves.
In addition, the effect of the number of features n_max used to train the trees in the isolation forest model was minor, while the size of the data subset m_max used for training each tree showed noticeable effects on the model's performance, since the data subset size determines the average path length as determined using Equation (5). This is demonstrated using the ROC curves shown in FIGURE 8 to FIGURE 11 on the SWaT dataset as well.
Sometimes the system behavior under some attacks is indistinguishable from the behavior during normal operation. Hence, the benefit of using the dual examination of the raw dataset with the original interdependency between the [...] in detecting the SWaT attacks and the WADI attacks, respectively, such that the y-axis represents the recall value and the x-axis is the attack index. For the SWaT testbed, it can be seen that some attacks, such as Attacks 3, 5, 6, 8, and 9, are detected by the IF-2 model but not by the IF-1 model, and vice versa. Again, as demonstrated in FIGURE 4 and FIGURE 5, the performance of the IF-2 model is better since, after performing data dimensionality reduction using PCA, the redundancy and the uncorrelated components in the data are removed, and the data is less fused, such that it is easier to isolate anomalies. Similarly, for the WADI dataset, some attacks are detectable by the IF-1 model but not by the IF-2 model: Attacks 3, 4, 5, 10, 11, 12, 13, 14, and 15 are detected by the IF-1 model, while Attacks 2, 7, and 9 are detected by the IF-2 model. It is worth noting that the selection of the detection thresholds for the two models is based on a maximum of 5% false alarm rate. That is, the score values in scenarios where the attacks are not reflected in the system variables are expected to be comparable to those under normal operation. When the detection threshold is set, those score values might fall below this threshold, and hence be considered as attack incidents, and vice versa.

D. COMPARISON WITH OTHER APPROACHES
We compared the proposed method with the other approaches in the literature that have been developed using the SWaT and WADI datasets. It is worth noting that the data pre-processing procedure followed in most of the previous works presented in this comparison utilized the datasets in a similar way as in our work, i.e., the number and type of variables used, the use of the normal observations from the two logs for the training and validation while the attack log was used for testing, the use of the steady-state data for the training and validation phase, the consideration of the attack recovery period, etc. The performance evaluation results for the second log of the two datasets for the different approaches are summarized in TABLE 7 and TABLE 8, considering that the observation window used for the DIF-based detection method is w = 120 seconds, evaluated every 30 seconds. The evaluation results of additional approaches using PCA, K-Nearest Neighbour (KNN), Feature Bagging (FB), Auto-Encoder (AE), Efficient GAN (EGAN), and SVM that were presented in [12], [17] for comparison are listed as well.
The DIF-based attack detection system achieves an improved F1-score of 88.2% and 65.6% on the SWaT and WADI datasets, respectively. For the SWaT dataset, it was found that the achieved improvement in the F1-score value is up to about 7% over the approaches with comparable computational complexity, namely the NN-based, SVM, and TABOR-based approaches. However, it is as small as 2.2% over the 1D-CNN-based approach, which has a far higher computational requirement. In addition, for the WADI dataset, when comparing the proposed approach with the GAN-based approach, it was found that the improvement in the F1-score is about 4% while the precision is improved by 23%. However, the recall is lower by 18%. There will be a trade-off in terms of the different aspects of the used algorithms, as demonstrated in the former analysis. Moreover, it is assumed that the results of the previous works, which are summarized in TABLE 7 and TABLE 8, represent the best performing models as per the authors of the original work.
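To make the precision/recall trade-off discussed above concrete, the following sketch computes the three metrics from confusion-matrix counts. The function name and all counts are hypothetical and chosen only for illustration; they are not results from TABLE 7 or TABLE 8.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1-score from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical detectors: one raises few false alarms, the other misses few attacks.
high_precision = precision_recall_f1(tp=60, fp=5, fn=40)
high_recall = precision_recall_f1(tp=90, fp=60, fn=10)
```

Because the F1-score is a harmonic mean, two detectors with very different precision and recall can end up with similar F1 values, which is why the individual metrics are reported alongside it.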
The total number of detected WADI attacks was 12 out of 15, representing 80% of the attack scenarios; the undetected attacks were Attacks 1, 6, and 8, as demonstrated in FIGURE 13. The common factor between these attacks is that they were conducted by changing the states of at most two valves from OFF to ON, aiming to overflow tanks or interfere with the water distribution process. It seems that the impact of those attacks on the process is insufficient for the proposed approach to detect them.
In terms of the SWaT attack log, the dual isolation forest-based detection framework was capable of detecting Attacks 3, 20, and 30, unlike the other approaches, as shown in TABLE 9. FIGURE 14 to FIGURE 19 demonstrate the attack indicators for the SWaT attack scenarios with the low recall values reported in TABLE 9, which were detected after a time delay, noting that the start of the attack is at the beginning (time = 0 min). As demonstrated in FIGURE 14 and FIGURE 15, the detection delay for Attacks 14, 15, and 28 is less than 1 minute, while FIGURE 16 and FIGURE 17 show that the time delay in detecting Attacks 12 and 18 is around 2 minutes. In addition, the detection delay for Attacks 13 and 36 is 4 minutes and 10 minutes, respectively, as shown in FIGURE 18 and FIGURE 19. These results indicate that the proposed DIF attack detection approach can eventually detect Attacks 12, 13, 14, 15, 18, 28, and 36, and the low recall values reported in TABLE 9 are mainly due to the detection time delay.
It is worth noting that Attacks 16, 19, 21, and 25 were detected by the NN-based attack detection approach, and additionally, Attacks 1, 2, 25, and 29 were detected by the 1D-CNN-based detection method, but they were not detected by the proposed DIF-based framework. This can be explained by the fact that the scores of those attack (anomalous) points are comparable to others under normal operation, and hence, they cannot be detected without compromising the false alarm rate. In addition, the DIF-based detection method has a relatively higher false alarm rate, which is reflected in the precision metric that was found to be lower when compared with the other approaches. This is due to the fact that the system behavior under some attacks is similar to the behavior during normal operation, and hence, normal behavior may be falsely regarded as an attack incident given the threshold setting.

E. CASE STUDIES
This subsection presents two case studies regarding the performance of the proposed approach under the SWaT Security Showdown (S 3 ) attacks on SWaT testbed, and under adversarial attacks.

1) SWaT SECURITY SHOWDOWN (S3)
This section presents a qualitative analysis of the performance of our proposed approach on the attack scenarios implemented in the SWaT Security Showdown event, which was held twice, in 2016 and 2017, and in which independent attack teams were invited to design and launch real-time attacks on the SWaT testbed. A total of 49 different attacks were injected, targeting the Human Machine Interface (HMI), SCADA, PLCs, historian, sensors, and actuators. The details of the attacks can be found in [32]. The aim of the event was to enable assessing the effectiveness of detection approaches, namely, the Water Defense (WD) approach, which is a model-based detection method. We were unable to use the S3 dataset for testing our proposed approach since not all the system variables were recorded or available during the attack injection, but only the particular variables of interest for the detection approach that was used.
As presented in TABLE 10, we qualitatively analyzed the effectiveness of our approach on the S3 dataset by studying the type of each injected attack in relation to the original attack log provided in the second log of the SWaT dataset. For example, S3-2016 Attack 1 aimed to underflow a tank by closing valve M-V101 and stopping the pumps P-101 and P-102. Its effect on the process is similar to SWaT Attack 30, in which pumps P-101 and P-102 were both forced to stop to achieve the same attacker aim. In addition, the goal of S3-2017 Attack 20 is to disrupt the operation of pump P-501, which matches the description of SWaT Attack 32, in which pump P-501 was forced to turn OFF and the reading of FIT-502 was tampered with in an attempt to deviate the pump operation from the normal condition. It was found that 13 attacks had the same characteristics as the SWaT attacks, and all of them were detected except one, which is S3-2016 Attack 14.

2) ANALYSIS OF ADVERSARIAL ATTACKS
Adversarial attacks are attacks crafted by adversaries with the intent of leading the machine learning model to misclassify [37]. Concerns about those types of attacks have risen after the increased deployment of machine learning-based approaches for cybersecurity applications. Isolation forests are less prone to such attacks because of their working principle, which, as mentioned previously, is based on isolating anomalies. That is, the aggregated predictions of the different isolation trees in the forest, each examining the system variables from several aspects based on the isolation forest specifications such as n_max, make the isolation forest resilient to these kinds of attacks. On the other hand, deep learning-based approaches such as CNN, RNN, etc., are more prone to adversarial attacks, as they aim to extract patterns or features from the input to make the prediction, and a designed attack that adds perturbations to the original input can cause the network to misclassify the adversarial input.

V. CONCLUSION
A dual isolation forests-based attack detection system was developed using the system's process data for the Secure Water Treatment and the Water Distribution testbeds, which are scaled-down versions of popular industrial control systems. The working principle of the proposed approach is to identify and separate anomalies from the normal observations using the concept of isolation, after analyzing the data in both the original and the PCA-transformed representations.
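The working principle summarized above can be sketched as follows. This is a minimal illustration and not the implementation used in this work: it relies on scikit-learn as a stand-in, uses synthetic data in place of the testbed measurements, picks the number of principal components arbitrarily, and assumes that an alarm is raised when either forest flags a sample.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for normal-operation process data (500 samples, 5 variables).
X_normal = rng.normal(size=(500, 5))

# Normalize the raw data; both forests are trained on normal operation only.
scaler = StandardScaler().fit(X_normal)
Z_normal = scaler.transform(X_normal)

# Second representation: PCA-transformed data (3 components is an arbitrary choice).
pca = PCA(n_components=3).fit(Z_normal)

forest_raw = IsolationForest(n_estimators=100, random_state=0).fit(Z_normal)
forest_pca = IsolationForest(n_estimators=100, random_state=0).fit(pca.transform(Z_normal))


def dif_alarm(X):
    """Flag a sample when either forest isolates it (OR rule is an assumption)."""
    Z = scaler.transform(X)
    raw_flag = forest_raw.predict(Z) == -1   # predict() returns -1 for anomalies
    pca_flag = forest_pca.predict(pca.transform(Z)) == -1
    return raw_flag | pca_flag


# A point far outside the range of the training data in every variable.
X_attack = np.full((1, 5), 8.0)
alarms = dif_alarm(X_attack)
print(alarms)
```

Thresholds are left at the scikit-learn defaults here; in practice they would be tuned on normal-operation data to balance detection against false alarms.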
The DIF-based attack detection framework was compared with other approaches in terms of precision, recall, and F1-score. For the SWaT testbed, it was found that the attack detection using the proposed approach was improved by up to 7% in terms of the F1-score value. In addition, a total of 19 SWaT attacks were detected with a minimum recall of 80%, 6 attacks were detected after a time delay of up to 40% of the attack duration, and 11 attacks were undetected. For the WADI testbed, 80% of the attacks were detected, and the precision of the proposed approach was improved by 23% when compared to the GAN-based approach. This improvement came at the cost of the number of attacks that were detected, which was reflected in the recall value that decreased by about 18%.
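For reference, the three comparison metrics follow directly from the counts of true positives, false positives, and false negatives. The counts in the short sketch below are hypothetical and serve only to show how a higher precision and a lower recall combine into the F1-score.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1-score from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, not taken from the SWaT or WADI results.
precision, recall, f1 = precision_recall_f1(tp=90, fp=30, fn=10)
print(f"precision = {precision:.2f}, recall = {recall:.2f}, F1 = {f1:.3f}")
# prints: precision = 0.75, recall = 0.90, F1 = 0.818
```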
Future work will address the following: 1) improving the performance of the detection approach by means of feature extraction, and 2) extending the proposed approach to a hybrid detection system that uses both the process data and the network traffic data of the system to improve the detection of stealthy attacks.