Traffic Based Sequential Learning During Botnet Attacks to Identify Compromised IoT Devices

A novel online Compromised Device Identification System (CDIS) is presented to identify IoT devices and/or IP addresses that are compromised by a Botnet attack, within a set of sources and destinations that transmit packets. The method uses specific metrics that are selected for this purpose and are easily extracted from network traffic. It trains itself online during normal operation with an Auto-Associative Dense Random Neural Network (AADRNN), using traffic metrics measured as traffic arrives. As it operates, the AADRNN is trained with auto-associative learning using only traffic that it estimates as being benign, without prior collection of different attack data. The experimental evaluation on publicly available Mirai Botnet attack data shows that CDIS achieves high performance, with a Balanced Accuracy of 97%, while keeping its on-line training and execution times low. Experimental comparisons show that the AADRNN with sequential (online) auto-associative learning provides the best performance among six different state-of-the-art machine learning models. Thus CDIS can provide crucial information to prevent the spread of Botnet attacks in IoT networks having multiple devices and IP addresses.


I. INTRODUCTION
The number of Internet of Things (IoT) devices is increasing rapidly as IoT applications expand, and [1] reported that 52% of all IoT devices will consist of low-cost and low-maintenance Massive IoT devices that perform a single task at a time and cannot run complex real-time algorithms to detect and prevent attacks. While systemic approaches to improving the security of cyberphysical systems have been suggested [2], [3], it is difficult (if not totally impossible) to burden simple IoT devices with complex security functionalities [4]. Thus IoT devices are often vulnerable to attackers [5], [6], [7], and the common Denial of Service (DoS) attack accounts for 20% of all attacks against the IoT [8]. In a DoS attack, an attacker or malicious device forwards superfluous requests to a target device to prevent its normal operation by usurping its limited resources, with or without malware injection [9], [10].
The associate editor coordinating the review of this manuscript and approving it for publication was Ting Yang.
Distributed DoS (DDoS) attacks can lead to thousands of compromised devices [11] through Botnet attacks where victim devices become compromised and turn into a ''bot'' via malware [12]. For instance, in 2016, a massive DDoS Botnet attack whose source code was later released under the name ''Mirai'', targeted Domain Name System (DNS) provider Dyn [13], rendering Netflix, Reddit, Spotify, and Twitter [14], [15] inaccessible, and gaining malicious access to the servers of leading cybersecurity companies from millions of different IP addresses [16]. The Mirai Botnet sends TCP SYN requests to the IP addresses of large numbers of IoT devices. When the victim device responds to a request, the attacker gains access to it using weak login credentials such as default usernames and passwords pre-installed during manufacture, and can install malware on the victim device, turning it into a compromised device (Bot). The Bot then generates traffic that floods other servers and devices with meaningless requests. Thus the Botnet compromises new devices and propagates over the IoT network, as illustrated in Figure 1.
In addition to network-wide effects, Botnets significantly increase network congestion and power consumption, and processor and memory usage at the device level, hence posing challenges for resource-constrained devices [17]. Thus, given the nefarious impact of Botnets at both network and device level in Massive IoT networks, it is crucial to identify malicious packets and compromised IoT devices in real-time during an attack, so as to prevent the attack from spreading.
The remainder of the paper is organized as follows:
• Section II describes prior work and the novelty of this work as compared to the state-of-the-art.
• Section III describes the data used for validating and illustrating the results and introduces some related notation.
• Section IV presents the architectural design of CDIS, and the methodology that it uses, including the choice of traffic statistics.
• Section V compares the performance of CDIS under six different ML models, and also analyzes the effectiveness of the proposed network traffic metrics.
• Finally, in Section VI the main outcomes of this work are summarized, and some further directions for research are indicated.

II. RELATIONSHIP TO THE STATE-OF-THE-ART
We now review the relationship between this work and the state-of-the-art regarding the detection of Botnet traffic and of IoT devices that were compromised during a Botnet attack. In work that addresses the detection of Botnet attacks, Antonakakis et al. have analyzed the characteristics of Botnet attacks [18], and in [13], Margolis [20].
Recent research has used the following ML models to detect Botnet attacks: KNN, Support Vector Machine (SVM), Decision Trees (DT) and MLP in [21]; Classification and Regression Trees (CART) [22]; DT, Gradient Boosting and Random Forests [23]; and Logistic Regression [24]. Tuan et al. [25] conducted a comparative study of the performance of classification models and Neural Networks (NN). Neural networks were also used to detect Mirai Botnets in Software Defined Networks (SDN) by Letter et al. [26]. In 2020, Sriram et al. [27] used MLP with a deep architecture, while Soe et al. used NN and Naive Bayesian Models (NB) with a sequential architecture [28]. In addition, McDermott et al. [29] developed a bidirectional LSTM-based text recognition model for packet-level detection. Another deep learning model, the Convolutional Neural Network (CNN), was used with feature transformation by Liu et al. [30] and combined with LSTM by Parra et al. [31]. Tzagkarakis et al. [32] detected Botnet attacks via a sparse representation framework with a large number (115) of inputs, for which only normal traffic is used to select parameters, while recent work [33] developed a Mirai Botnet attack detector using the AADRNN with auto-associative learning.
Whereas the work reviewed in this paragraph aims at detecting Botnet attack traffic, the goal of the present paper is to identify compromised IoT devices or IP addresses that have received Botnet traffic or actually become ''Bots'' during a Botnet attack. The paper does not discuss the actions that a network may take, such as rerouting or blocking traffic [34], [35] after an attack is detected.
Other work has focused on detecting compromised IoT devices during Botnet attacks. Kumar and Lim [36] have developed an optimization-based technique to detect Mirai-like bots by scanning the destination port numbers in packet headers; they analyze a subset of IoT packets to minimize the time it takes to detect compromised devices. Chatterjee et al. [37] develop an evidence-theory-based traffic flow analysis in IoT networks in order to detect malicious devices, selecting the rarest set of traffic features, where the full set of features includes the transport layer protocol, the number of reconnections, source/destination ports, etc. In [38], Nguyen et al. focus on an anomaly detection technique for compromised devices that combines federated learning and language analysis for individual device types, which are identified prior to anomaly detection. On the other hand, Abhishek et al. [39] detect compromised gateways rather than devices, by monitoring the downlink channels in an IoT network. In the case of mobile devices, Taneja [40] detects compromised devices by taking their location into account, so that a location change or an unusual current location may help identify a compromised device.
In this paper, we develop a novel lightweight system called CDIS to identify devices that have become Bots. CDIS differs significantly from some past approaches to detecting compromised IoT devices because:
• CDIS is trained only online, and only with normal traffic, so that the difficult collection of extensive attack data is no longer necessary, and biases that may be caused by the simulation of attacks are avoided, and
• CDIS only uses high-level packet information, i.e. transmission times and packet lengths, to calculate traffic statistics. Therefore, it only needs access to the headers of traffic packets, and it is computationally light.
VOLUME 10, 2022

A. CONTRIBUTIONS OF THIS PAPER
This paper develops a novel Compromised Device Identification System (CDIS) to detect compromised devices or IP addresses in a network during an ongoing Botnet attack:
• CDIS learns sequentially from ongoing normal traffic, using only the packet streams that it recognizes as being benign, and uses an original choice of statistics regarding received and transmitted traffic, calculated only from packet lengths and transmission times.
• It uses a Machine Learning (ML) algorithm based on the Auto-Associative Dense Random Neural Network (AADRNN) [41] with online auto-associative learning, which was initially designed for image recognition [42] and has also been used successfully to detect SYN attacks against IoT devices [43]. The high performance of the classical Random Neural Network [44] with offline gradient-descent learning [45] for detecting SYN attacks was shown earlier in [46], while the AADRNN's excellent performance with off-line training for detecting Mirai attacks was recently demonstrated in [33].
We evaluate the performance of CDIS on the publicly available Kitsune Mirai Botnet attack dataset [47], [48], as well as on the MedBIoT and Bot-IoT datasets. We also compare the performance of AADRNN with the following state-of-the-art ML models: Linear Regression (LR), Least Absolute Shrinkage and Selection Operator (Lasso), K-Nearest Neighbors Regressor (KNN), Multi-Layer Perceptron (MLP) and Long Short-Term Memory (LSTM). Our results show that CDIS under AADRNN significantly outperforms all these other ML models. It successfully identifies compromised IP addresses, achieving high Sensitivity (98.7%) and Specificity (94.9%). In addition, the computation time of CDIS is shown to be low enough for practical applications.
CDIS has the following contributions and advantages:
1) It is trained with the normal ongoing traffic that is collected during real-time operation. CDIS learns online and sequentially, in auto-associative mode, using only the traffic that it identifies as being benign. Thus it does not require the prior collection of either attack traffic or normal non-compromised traffic.
2) It identifies compromised IoT devices based on traffic statistics calculated using only high-level packet information such as packet lengths and transmission times. Therefore, it only processes packet headers, with very low computational requirements.
3) It achieves high identification performance with low computational requirements, which is vital to quickly prevent the spread of Botnet attacks in large networks.

III. DATASET, GROUND TRUTH AND METRICS
We use data from the Mirai Botnet attack in the publicly available Kitsune dataset [47], [48], which contains 764,137 packets covering a consecutive time period of roughly 7137 seconds (nearly 2 hours), with 107 distinct IP addresses that send or receive traffic; we perform infection detection for each of them. We denote by S the set of all source nodes or devices, and by D the set of all destination nodes or devices in the dataset. The dataset contains the ground truth regarding whether a packet is an attack packet or a normal non-attack packet: for a packet p in the dataset, a(p) denotes its binary attack label, with a(p) = 1 denoting an attack packet and a(p) = 0 denoting a normal non-attack packet. Each packet also carries the time t at which it is sent, so that the complete representation of packet p is:
$$p = pk(t, s, d), \quad s \in S, \; d \in D, \quad \text{with label } a(p) \in \{0, 1\}. \quad (1)$$
We separate the packets into time windows of fixed duration T, so that the collection of packets that are sent by device s to any other device d in the network in the k-th time window is:
$$\mathcal{P}_k^{out}(s) = \{\, pk(t, s, d) : d \in D, \; (k-1)T \le t < kT \,\}, \quad (2)$$
and the set of all packets arriving at d in the k-th window is:
$$\mathcal{P}_k^{in}(d) = \{\, pk(t, s, d) : s \in S, \; (k-1)T \le t < kT \,\}. \quad (3)$$
The ground truth for the infection level is defined as the ratio of the number of attack packets to the total number of packets that arrive at device or node d in time window k:
$$\phi_k^d = \frac{\sum_{p \in \mathcal{P}_k^{in}(d)} a(p)}{|\mathcal{P}_k^{in}(d)|}, \quad (4)$$
and the binary estimate of the ground truth is then obtained as
$$v_k^d = \begin{cases} 1, & \phi_k^d \ge \Theta, \\ 0, & \text{otherwise}, \end{cases} \quad (5)$$
where $0 < \Theta < 1$ is a threshold on the infection level used to compute the binary ground truth estimate. Note that $\Theta$ should be selected according to the desired sensitivity of the network to malicious packet transmission. The binary variable $v_k^i$ is what we use to test how well our attack detection schemes are working. In this paper it is not used at all for learning, since we develop an online learning technique that does not rely on prior offline learning.
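The windowing and ground-truth computation above can be sketched as follows. This is a minimal illustration, assuming each packet is available as a hypothetical `(t, s, d, a(p))` tuple with t in seconds from the start of the capture and windows numbered from k = 1; the function names and addresses are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical packet record: (t in seconds from capture start, source s, destination d, label a(p)).
Packet = Tuple[float, str, str, int]
Key = Tuple[str, int]  # (device d, window index k)


def infection_levels(packets: List[Packet], T: float) -> Dict[Key, float]:
    """phi[(d, k)]: fraction of attack packets among the packets arriving at d in window k.

    Window k (k = 1, 2, ...) covers [(k - 1) T, k T). Windows in which d receives
    no packet are absent from the result, since the ratio is undefined there.
    """
    arrivals: Dict[Key, int] = defaultdict(int)
    attacks: Dict[Key, int] = defaultdict(int)
    for t, _src, dst, label in packets:
        key = (dst, int(t // T) + 1)
        arrivals[key] += 1
        attacks[key] += label
    return {key: attacks[key] / arrivals[key] for key in arrivals}


def binary_ground_truth(phi: Dict[Key, float], threshold: float) -> Dict[Key, int]:
    """v[(d, k)] = 1 if phi[(d, k)] >= threshold, else 0."""
    return {key: int(level >= threshold) for key, level in phi.items()}


packets = [
    (0.5, "10.0.0.2", "192.168.2.1", 0),
    (3.0, "10.0.0.3", "192.168.2.1", 1),
    (12.0, "10.0.0.3", "192.168.2.1", 1),
    (14.0, "10.0.0.2", "192.168.2.5", 0),
]
phi = infection_levels(packets, T=10.0)
v = binary_ground_truth(phi, threshold=0.5)
print(phi)  # {('192.168.2.1', 1): 0.5, ('192.168.2.1', 2): 1.0, ('192.168.2.5', 2): 0.0}
print(v)    # {('192.168.2.1', 1): 1, ('192.168.2.1', 2): 1, ('192.168.2.5', 2): 0}
```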

A. DEFINING THE TRAFFIC METRICS
Since the aim of the NTSC module in Figure 2 is to identify instances of the traffic that may contain infection regarding a device i, it is important to judiciously select the metrics to be extracted.
The traffic metrics in this paper are chosen to address Mirai Botnet attacks and may not be useful for identifying compromised devices under other types of attacks, so that a CDIS for other attacks may require other metrics. Since Mirai attacks spread over the network by infecting IoT devices, a device that is compromised via malware will generate more packets with a larger total amount of traffic, so as to spread the attack over more nodes and overload the network. Previous work [33] used and validated the following three network traffic statistics, which are related to malicious packets in a Mirai attack:
• Traffic Statistics 1: The total size of the last P transmitted packets,
• Traffic Statistics 2: The average inter-transmission time of the last P packets,
• Traffic Statistics 3: The total number of packets that are transmitted in a time window of duration T.
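For reference, the three statistics of [33] listed above can be computed from a device's transmission log roughly as below. This is a sketch, not the implementation of [33]: the helper name, the input layout, and the choice of the window (now − T, now] for the third statistic are assumptions.

```python
from typing import List, Tuple


def mirai_statistics(times: List[float], sizes: List[int], P: int, T: float,
                     now: float) -> Tuple[int, float, int]:
    """Three per-device statistics in the spirit of [33], evaluated at time `now`.

    times, sizes: transmission times (s) and lengths (bytes) of the device's
    transmitted packets, sorted by time. The window for statistic 3 is assumed
    to be (now - T, now].
    """
    if P < 1:
        raise ValueError("P must be at least 1")
    sent = [(t, s) for t, s in zip(times, sizes) if t <= now]
    last = sent[-P:]  # the last P transmitted packets (fewer at start-up)
    total_size = sum(s for _, s in last)  # Statistic 1
    gaps = [b[0] - a[0] for a, b in zip(last, last[1:])]
    avg_gap = sum(gaps) / len(gaps) if gaps else 0.0  # Statistic 2
    in_window = sum(1 for t, _ in sent if now - T < t <= now)  # Statistic 3
    return total_size, avg_gap, in_window


stats = mirai_statistics([0.0, 1.0, 2.0, 4.0, 8.0], [60, 60, 1500, 60, 1500],
                         P=3, T=5.0, now=8.0)
print(stats)  # (3060, 3.0, 2)
```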
Experimental results in [33] have shown that malicious packets are successfully detected using these three statistics during the Mirai attack. However, since in this paper we wish to identify compromised devices (not just malicious packets), we develop a new set of metrics, or statistics, for both the traffic received and transmitted by each IoT device i, inspired by these three previous statistics. Indeed, in order to identify the sources of attacks and the effect of the attacks, it is important to analyze the traffic received from each source individually, rather than only the overall aggregated traffic received from all sources, since observing the traffic from a compromised device can be an effective means of detecting an infection. Thus, we have selected some statistics that summarize the traffic sent by or received from each individual source.
Let |pk(t, s, d)| denote the length of the packet in bytes. Packets sent by the set of nodes S have a maximum and minimum length $L_S^M$ and $L_S^m$ in bytes, respectively, where the minimum may correspond to a packet with just the header and an empty data field. Each node s also has a maximum outgoing rate of $\theta_s$ bytes/second. The normalized statistics (or metrics) that are used for the traffic received or sent by node i within window k are as follows, where each normalized statistic takes a value between 0 and 1, and $\mathcal{P}_k^{in}(i)$ denotes the set of packets received by i in window k (both statistics below are 0 when this set is empty):
• Received Traffic Statistics (RTS) 1: The normalized average size of packets received by device i from all the sources in time window k:
$$x_k^{i,1} = \frac{1}{L_S^M \, |\mathcal{P}_k^{in}(i)|} \sum_{pk(t,s,i) \in \mathcal{P}_k^{in}(i)} |pk(t, s, i)|,$$
• RTS2: The normalized maximum size of any packet received at node i from any of the sources in time window k:
$$x_k^{i,2} = \frac{1}{L_S^M} \max_{pk(t,s,i) \in \mathcal{P}_k^{in}(i)} |pk(t, s, i)|.$$
The use of $L_S^M$ in RTS1 and RTS2 normalizes them with respect to the maximum packet length. Note that large packets do not always suggest attacks: indeed, SYN attack packets may be quite short [43]. On the other hand, Denial of Service attacks that aim at creating congestion on links would require rather long packets.
• RTS3: The average number of packets received from all sources that have sent packets to i in time window k:
Note that the denominator term in the above expression can be computed iteratively in a very efficient manner, so that $x_k^{i,3}$ is obtained directly from the terms in $x_{k-1}^{i,3}$.
• RTS4: The normalized maximum number of packets received from any single source in time window k:
We now define the other traffic statistics that are important for detecting whether IoT device i is infected; they measure the total traffic, in terms of both size and packet transmission rate, from i to other nodes:
• Transmitted Traffic Statistics (TTS) 1: The normalized total amount of traffic transmitted by device i in time window k:
• TTS2: The normalized total number of packets that are transmitted by i in time window k:
where the use of $L_i^m$ in TTS2 is due to the fact that the maximum number of packets that may be transmitted by
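A sketch of four of the six per-window statistics for one device i follows. RTS1 and RTS2 follow the verbal definitions above (average and maximum received packet length, normalized by $L_S^M$, and set to 0 when nothing is received). The normalizations used for TTS1 and TTS2, by the byte budget $\theta_i T$ and by the packet budget $\theta_i T / L_i^m$, are assumptions inferred from the definitions of $\theta_s$ and $L_i^m$; RTS3 and RTS4 are omitted because their normalizations are not given here. All names are hypothetical.

```python
from typing import Dict, List, Tuple


def window_statistics(received: List[Tuple[str, int]], sent: List[int],
                      L_max_S: int, L_min_i: int, theta_i: float,
                      T: float) -> Dict[str, float]:
    """Sketch of RTS1, RTS2, TTS1 and TTS2 for device i in one window of length T.

    received: (source, length in bytes) of the packets arriving at i in the window.
    sent: lengths (bytes) of the packets transmitted by i in the window.
    The TTS normalizations by theta_i * T are assumptions.
    """
    lengths = [n for _, n in received]
    rts1 = sum(lengths) / (len(lengths) * L_max_S) if lengths else 0.0
    rts2 = max(lengths) / L_max_S if lengths else 0.0
    capacity = theta_i * T  # maximum number of bytes i can send in one window
    tts1 = sum(sent) / capacity
    # theta_i * T / L_min_i bounds the number of packets i can send in one window.
    tts2 = len(sent) * L_min_i / capacity
    return {"RTS1": rts1, "RTS2": rts2, "TTS1": tts1, "TTS2": tts2}


stats = window_statistics(received=[("a", 100), ("b", 300), ("a", 1500)],
                          sent=[60, 60, 120], L_max_S=1500, L_min_i=60,
                          theta_i=1000.0, T=10.0)
print(stats)
```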

IV. THE COMPROMISED DEVICE IDENTIFICATION SYSTEM (CDIS)
In this section, we present our Compromised Device Identification System (CDIS), based on the AADRNN, whose architecture for IoT device i is shown in Figure 2. The AADRNN [41] is an extension of the Random Neural Network model [44] which, in addition to excitatory and inhibitory spikes, incorporates soma-to-soma triggering [49]; this generalizes the RNN and retains its ''product form'' solution. The power of this model was recently confirmed in extensive tests with conventional supervised off-line learning for a wide range of network cyberattacks [50], building on the proven approximation capability of these models [51].
In our approach, a distinct instance of CDIS is installed on each device i to determine whether that device is compromised. The inputs of CDIS are extracted from the received and transmitted traffic flows of the device (or IP address) i, and the output is a binary infection decision $z_k^i$ for device i. CDIS is composed of the Network Traffic Statistics Calculator (NTSC), AADRNN, and Infection Classifier (IC) modules. Since the NTSC was already detailed in Section III-A, we now focus on the AADRNN and IC modules. Note that Table 1 summarizes the symbols in order of appearance in this paper.
In the CDIS of Figure 2, the NTSC module extracts the values $x_k^{i,j}$ of the distinct metrics $1 \le j \le J$ from the packets received or sent in time window k by device i. We define the vector of metrics for node i at window k:
$$x_k^i = [x_k^{i,1}, \ldots, x_k^{i,J}],$$
and the ordered sequence of vectors of metrics collected from window 1 up to and including window k:
$$X_k^i = [x_1^i, x_2^i, \ldots, x_k^i].$$
We now detail the weights of the AADRNN used by CDIS in Section VI, and use $W^{i,k}$ to denote the generic form of the whole matrix of weights for device i after input $x_k^i$ has been used for learning. Thus $W^{i,k}$ is composed of the weight matrices $W_l^{i,k}$ connecting the clusters of neurons in layer l to layer l + 1: $W_0^{i,k}$ denotes the weight matrix connecting the inputs (from the traffic metrics) to the first layer, while $W_L^{i,k}$ connects the last (L-th) layer to the outputs $[y_{1,L+1}, \ldots, y_{J,L+1}]$. Now let $\zeta(\cdot)$ be the J-vector activation function for the AADRNN defined in Section VI. We can iteratively define the outputs $y_l$ of the AADRNN's l-th layer, $1 \le l \le L$, for the overall weight matrix $W^{i,k}$ obtained after $x_k^i$, the k-th J-vector input, has been processed. Let:
$$y_1 = \zeta(x_k^i W_0^{i,k}), \qquad y_l = \zeta(y_{l-1} W_{l-1}^{i,k}), \quad 2 \le l \le L,$$
and the whole AADRNN's output is:
$$y_{L+1} = [y_{1,L+1}, \ldots, y_{J,L+1}],$$
and finally:
$$y_{L+1} = y_L W_L^{i,k}.$$
Thus each $y_{j,L+1}$, $1 \le j \le J$, is a function $y_{j,L+1}(x_k^i, W^{i,k})$ of the corresponding input $x_k^i$ and of the overall weight matrix $W^{i,k}$ after each time window k.
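The layer recursion can be illustrated with NumPy as follows, assuming $y_1 = \zeta(x W_0)$ and $y_l = \zeta(y_{l-1} W_{l-1})$ for the hidden layers, with the final output $y_{L+1} = y_L W_L$ as stated above. The activation `zeta` is a hypothetical placeholder (clipping to [0, 1]) rather than the AADRNN's actual dense-cluster activation from [41], and the random weights are illustrative only.

```python
import numpy as np


def zeta(v: np.ndarray) -> np.ndarray:
    """Placeholder activation (assumption): the AADRNN's dense-cluster activation
    from [41] is not reproduced here; clipping to [0, 1] only mimics its range."""
    return np.clip(v, 0.0, 1.0)


def aadrnn_forward(x: np.ndarray, weights: list) -> np.ndarray:
    """weights = [W_0, ..., W_L]: W_0 maps the J inputs to layer 1, W_L maps layer L to the outputs."""
    y = zeta(x @ weights[0])          # y_1 = zeta(x W_0)
    for W in weights[1:-1]:           # y_l = zeta(y_{l-1} W_{l-1}), l = 2, ..., L
        y = zeta(y @ W)
    return y @ weights[-1]            # y_{L+1} = y_L W_L (no activation on the output)


rng = np.random.default_rng(0)
J, L = 6, 3                           # six statistics, three hidden layers of J neurons
weights = [rng.uniform(0.0, 0.2, size=(J, J)) for _ in range(L + 1)]
x = rng.uniform(0.0, 1.0, size=J)     # hypothetical metric vector x_k^i
y = aadrnn_forward(x, weights)
print(y.shape)  # (6,)
```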

A. THE INFECTION CLASSIFIER (IC)
Using the previously defined quantities, we now compute an output error that is needed to classify the input of window k as being ''normal'' or of attack nature. This is done by computing the maximum of the absolute differences between the elements of the input vector $x_k^i$ and the corresponding elements of the output vector:
$$e_k^i = \max_{1 \le j \le J} \left| x_k^{i,j} - y_{j,L+1} \right|.$$
We then use a specific threshold value $0 < \gamma_i < 1$ for each device or node i, so as to provide a binary decision of the form:
$$z_k^i = \begin{cases} 1 \ \text{(attack)}, & e_k^i > \gamma_i, \\ 0 \ \text{(normal)}, & \text{otherwise}. \end{cases}$$
Since we are carrying out online learning, without prior off-line training using the ground truth, the outputs $z_k^i$ not only provide decisions, but also drive the on-line auto-associative learning algorithm given below.
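The Infection Classifier's decision rule reduces to a few lines. This sketch assumes the error is the maximum absolute difference between the metric vector and its reconstruction, and uses a strict comparison with $\gamma_i$; the function name is hypothetical.

```python
import numpy as np


def infection_decision(x, y, gamma: float):
    """Return (e, z) with e = max_j |x_j - y_j| and z = 1 if e > gamma else 0.

    The strict inequality is an assumption; x is the metric vector and y the
    AADRNN (or stand-in) reconstruction of it.
    """
    e = float(np.max(np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))))
    return e, int(e > gamma)


e, z = infection_decision([0.25, 0.75, 0.125], [0.25, 0.25, 0.0], gamma=0.3)
print(e, z)  # 0.5 1
```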

B. ONLINE LEARNING
Since we use on-line learning, the matrices $W_l^{i,k}$ may be updated after each successive input $x_k^i$. However, we can only train with ''normal'' (non-attack) values of $x_k^i$, so that:
• If $z_k^i = 1$, i.e. $x_k^i$ is estimated to contain an attack, then the weights are not updated, i.e.
$$W_l^{i,k} = W_l^{i,k-1}, \quad 0 \le l \le L,$$
where $W_l^{i,k-1}$ is the value, prior to the processing of input $x_k^i$, of the weight matrix that connects the l-th layer to the inhibitory inputs at layer l + 1.
• If $z_k^i = 0$, i.e. $x_k^i$ is estimated to be normal, then each $W_l^{i,k}$ is updated using the optimization function detailed in Section IV-C, which is specialized to layer l as $F_l$.
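The gating rule of the online learning, i.e. update only when $z_k^i = 0$, can be sketched independently of the AADRNN itself. Here a hypothetical `MeanAutoAssociator`, which reconstructs every input as the mean of the benign inputs learned so far, stands in for the AADRNN, and the error and decision follow the Infection Classifier; none of this is the authors' code.

```python
import numpy as np


class MeanAutoAssociator:
    """Hypothetical stand-in for the AADRNN: reconstructs any input as the mean of
    the benign inputs learned so far (echoes the input before any learning)."""

    def __init__(self, J: int):
        self.total = np.zeros(J)
        self.count = 0

    def predict(self, x: np.ndarray) -> np.ndarray:
        return self.total / self.count if self.count else x.copy()

    def update(self, x: np.ndarray) -> None:
        self.total += x
        self.count += 1


def run_online(stream, model, gamma: float):
    """Online CDIS loop: decide for each window, and learn only from inputs judged normal."""
    decisions = []
    for x in stream:
        x = np.asarray(x, dtype=float)
        e = np.max(np.abs(x - model.predict(x)))
        z = int(e > gamma)
        if z == 0:
            model.update(x)  # benign estimate: update the model
        # z == 1: no update, the model keeps its previous parameters
        decisions.append(z)
    return decisions


model = MeanAutoAssociator(J=2)
print(run_online([[0.1, 0.1], [0.12, 0.1], [0.9, 0.8], [0.11, 0.1]], model, gamma=0.2))
# [0, 0, 1, 0]
```

The third window is flagged and therefore excluded from learning, so the model's reference behaviour is not contaminated by the attack-like input.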

C. OPTIMIZATION ALGORITHM
The ''online auto-associative learning'' for the AADRNN uses the fast training algorithm from previous work [33], [43]. It adapts to the naturally time-varying characteristics of network traffic, so that CDIS updates its parameters automatically as a function of the traffic it encounters as it operates. Here $W_l^{i,k-1}$ is the l-th layer output connection matrix just before CDIS processes input $x_k^i$ with the semi-supervised algorithm of [42]. We then use the following approach only if $z_k^i = 0$:
Computation: For each layer $l \in \{0, \ldots, L\}$, compute $W_l^{i,k}$ using the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) [52]:
where the J × J weight matrix $W_R$ is randomly generated with elements in the range [0, 1]. On the other hand, adj(B) linearly maps the elements of matrix B into the range [0, 1], then applies the z-score (standard score), and finally adds a positive constant to remove negative values.
FISTA: After FISTA [52] is performed for 700 iterations, we finally normalize the resulting weight matrix $W_l^{i,k}$:
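The building blocks named above can be sketched in NumPy. `adj` follows the three steps described (min-max scaling, z-score, shift by a non-negative constant, here assumed to be minus the minimum), and `fista_lasso` is the standard FISTA iteration with soft-thresholding for an L1-regularized least-squares problem. Since the per-layer objective is not reproduced in this section, the least-squares form, the layer input `H`, and the value of λ in the usage lines are assumptions.

```python
import numpy as np


def adj(B: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """adj(B) as described: min-max scale to [0, 1], z-score, then add a constant so
    that no entry is negative. Using -min as that constant is an assumption."""
    B = np.asarray(B, dtype=float)
    span = B.max() - B.min()
    S = (B - B.min()) / span if span > eps else np.zeros_like(B)
    std = S.std()
    Z = (S - S.mean()) / std if std > eps else np.zeros_like(S)
    return Z - Z.min()


def fista_lasso(A: np.ndarray, B: np.ndarray, lam: float, n_iter: int = 700) -> np.ndarray:
    """FISTA for min_W 0.5 * ||A W - B||_F^2 + lam * ||W||_1 (assumed stand-in objective)."""
    lip = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the smooth part's gradient
    W = np.zeros((A.shape[1], B.shape[1]))
    if lip == 0.0:
        return W
    V, t = W.copy(), 1.0
    for _ in range(n_iter):
        U = V - A.T @ (A @ V - B) / lip         # gradient step at the extrapolated point
        W_next = np.sign(U) * np.maximum(np.abs(U) - lam / lip, 0.0)  # soft-thresholding
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        V = W_next + ((t - 1.0) / t_next) * (W_next - W)   # momentum (extrapolation)
        W, t = W_next, t_next
    return W


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(40, 6))         # hypothetical benign metric vectors
W_R = rng.uniform(0.0, 1.0, size=(6, 6))         # random J x J matrix with entries in [0, 1]
H = adj(X @ W_R)                                 # hypothetical layer input
W = fista_lasso(H, X, lam=0.005)
```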

D. PARAMETER SETTINGS FOR CDIS
We first set T = 10 seconds to obtain a large number of time windows (approximately 712), each of which contains a significant number of packets. Then, for normalization, In order to present a clear and detailed analysis, in the first part of this section we consider only the IP addresses in the ''192.168'' network within the experimental setup of the Mirai Botnet dataset used [47]. Note that, during our experiments, we do not consider any information about the characteristics of the devices to which the IP addresses belong; that is, we treat all IP addresses as equivalent (or as individual IoT devices). Figure 3 displays the ground truth values of the level of infection ($\phi_k^i$) for these IP addresses (shown with local indexes) over 712 windows. One may see that the infection level becomes significantly high for the 4th, 7th, 10th, 16th, 19th, 20th and 22nd IP addresses, while it remains around 0 only for the 1st, 8th, 15th, 17th, 18th, 23rd and 24th IP addresses.
Based on the data in Figure 3, we can consider that an IP address is compromised if infection level φ i k is at least 0.5 (i.e. at least 50% of transmitted packets are malicious).
Let us note that the 50% threshold may be too high in the case of particularly dangerous or stealthy attacks, where the presence of 10% or 20% of attacking packets may result in a significant compromise of the system. However, we set the infection-level threshold to 0.5 as a representative case, and calculate the binary ground truth of compromised devices using (5). The resulting ground truth data contains 1494 positive (compromised) samples and 15594 negative samples in total, over all of these 24 IP addresses and all 712 windows. In addition, using the binary ground truth for a threshold of 0.5, we observe that the 4th, 7th, 10th, 16th, 19th, 20th, and 22nd IP addresses become compromised over time.
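As a quick consistency check, the reported counts match 24 addresses observed over 712 windows, and they show that only about 8.7% of the samples are positive, which is the kind of class imbalance that motivates the use of Balanced Accuracy:

```latex
24 \times 712 = 17\,088 = 1494 + 15\,594,
\qquad
\frac{1494}{17\,088} \approx 0.087
```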

E. HYPERPARAMETER SETTINGS FOR ML MODELS
In order to compare the performance of CDIS under AADRNN with other ML models, we replace the AADRNN in our architecture in Figure 2 with each of the LR, Lasso, KNN, MLP, and LSTM models. At each time window k, all these models are sequentially trained with the same input $X_k^i$ and output $Y_k^i$ matrices as the AADRNN in Section IV-C. The remainder of this subsection presents the parameter selections for each of these models as well as for the AADRNN.

1) AADRNN
The AADRNN has L = 3 hidden layers; we then set n = LJ = 18, where J = 6 is the total number of statistics, and p = 1/n. We also set r = 1 and j,l = λ j,l = 0.005.

2) LR AND LASSO
We use two different linear ML models, LR and Lasso, in place of the AADRNN. We selected LR, the simplest technique, to create a baseline performance. Lasso is used to observe the effect of feature selection on the cumulative performance, since it is a linear model that shrinks the coefficients of irrelevant statistics to zero; we searched for the best value of the L1 term multiplier between 0.1 and 1 in increments of 0.1, and set this multiplier to 0.2. Both LR and Lasso are implemented with the scikit-learn library [53] in Python.
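A minimal scikit-learn sketch of the two linear baselines in the auto-associative setting, where each model is fitted to reproduce its own input. The synthetic data are hypothetical; `alpha` is scikit-learn's name for the L1 multiplier, set here to the 0.2 reported above.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(1)
X_k = rng.uniform(0.0, 1.0, size=(30, 6))   # hypothetical history of metric vectors (J = 6)

# Auto-associative setup: each model learns to reproduce its input, so the targets are the inputs.
lr = LinearRegression().fit(X_k, X_k)
lasso = Lasso(alpha=0.2).fit(X_k, X_k)      # scikit-learn calls the L1 multiplier `alpha`

x_new = rng.uniform(0.0, 1.0, size=(1, 6))  # next window's metric vector
err_lr = float(np.max(np.abs(x_new - lr.predict(x_new))))
err_lasso = float(np.max(np.abs(x_new - lasso.predict(x_new))))
print(err_lr, err_lasso)
```

In this toy example, LR can represent the identity map exactly, so its reconstruction error is essentially zero whatever the input, while a multiplier of 0.2 is large relative to the variance of metrics in [0, 1] and shrinks every Lasso coefficient to zero, so Lasso predicts the mean of its training inputs. How these behaviours carry over to the real traffic metrics is not claimed here.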

3) K-NEAREST NEIGHBOURS REGRESSOR (KNN)
Earlier research [21] showed that KNN achieves highly competitive results for the detection of Botnet attacks; thus KNN is one of the methods that we have implemented in scikit-learn and compared against the AADRNN. In each window k, the number of neighbors in KNN is set to min(k − 1, J), since the number of samples used for training equals k − 1 and KNN cannot use more neighbors than there are training samples.
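The neighbor-count rule can be written with scikit-learn's `KNeighborsRegressor`, again in the auto-associative setting where targets equal inputs. The helper name and the `history` layout (one row per earlier window) are hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor


def knn_for_window(history: np.ndarray, k: int, J: int = 6) -> KNeighborsRegressor:
    """Fit an auto-associative KNN at window k on the k - 1 earlier metric vectors.

    history: array of shape (k - 1, J) holding x_1, ..., x_{k-1} (hypothetical layout).
    """
    if k < 2:
        raise ValueError("KNN needs at least one training sample (k >= 2)")
    n_neighbors = min(k - 1, J)  # never more neighbors than training samples
    return KNeighborsRegressor(n_neighbors=n_neighbors).fit(history, history)


history = np.array([[0.0] * 6, [1.0] * 6])   # x_1, x_2
model = knn_for_window(history, k=3)
print(model.n_neighbors, model.predict(np.full((1, 6), 0.4)))
# 2 [[0.5 0.5 0.5 0.5 0.5 0.5]]
```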

4) MULTI-LAYER PERCEPTRON (MLP)
A feed-forward MLP with two hidden layers of J = 6 neurons each, followed by an output layer of J = 6 neurons, was used with a sigmoidal activation function. Both training and execution were performed using Keras in Python, where the MLP is trained via the Adam optimizer for 50 epochs at each time window.

5) LONG-SHORT TERM MEMORY (LSTM)
Lastly, we use an LSTM with a single LSTM layer of J = 6 units, followed by three fully connected layers including the output layer, which comprises J = 6 neurons, with sigmoidal activation throughout. Training and execution of the LSTM are also performed using Keras in Python, and it is trained via the Adam optimizer for 100 epochs at each time window k.

F. PERFORMANCE EVALUATION METRICS
The dataset being used is imbalanced due to the nature of the problem. Therefore, our main metric, ''Balanced Accuracy'' [54], has been revised to handle IP addresses that are never compromised as well as those that may be compromised. To this end, for each IP address i, we sum the numbers of True Positive, True Negative, False Positive, and False Negative cases over all windows k to obtain:
$$TP_i = \sum_k \mathbb{1}[v_k^i = 1, z_k^i = 1], \quad TN_i = \sum_k \mathbb{1}[v_k^i = 0, z_k^i = 0], \quad FP_i = \sum_k \mathbb{1}[v_k^i = 0, z_k^i = 1], \quad FN_i = \sum_k \mathbb{1}[v_k^i = 1, z_k^i = 0].$$
Then, for each IP address i, the Revised Balanced Accuracy $BA_i$ is:
$$BA_i = \begin{cases} \frac{1}{2}\left(\frac{TP_i}{TP_i + FN_i} + \frac{TN_i}{TN_i + FP_i}\right), & TP_i + FN_i > 0, \\ \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i}, & TP_i + FN_i = 0. \end{cases}$$
Thus $BA_i$ is ''Balanced Accuracy'' if IP address i is compromised in any time window k, while it is simply ''Accuracy'' if IP address i is never compromised. In addition to the Balanced Accuracy, we also use other well-known metrics: Sensitivity (True Positive Rate), Specificity (True Negative Rate), the geometric mean of Sensitivity and Specificity (G-Mean), and the Matthews Correlation Coefficient (MCC), all displayed as percentages. Sensitivity, G-Mean, and MCC can only be computed for IP addresses that are compromised in at least one time window; since some IP addresses are never compromised, as seen in Figure 3, these three metrics are not presented for them.
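The revised metric and the auxiliary scores can be computed per IP address as below. The fallback to plain Accuracy when an address is compromised in every window, the Specificity of 0 used for G-Mean in that case, and MCC = 0 when its denominator vanishes are assumptions not stated in the text; the function names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def confusion(v: Sequence[int], z: Sequence[int]) -> Dict[str, int]:
    """Count TP, TN, FP, FN over the windows of one IP address (v: ground truth, z: decisions)."""
    pairs = list(zip(v, z))
    return {
        "TP": sum(1 for a, b in pairs if a == 1 and b == 1),
        "TN": sum(1 for a, b in pairs if a == 0 and b == 0),
        "FP": sum(1 for a, b in pairs if a == 0 and b == 1),
        "FN": sum(1 for a, b in pairs if a == 1 and b == 0),
    }


def revised_balanced_accuracy(c: Dict[str, int]) -> float:
    """Balanced Accuracy if both classes occur, otherwise plain Accuracy."""
    tp, tn, fp, fn = c["TP"], c["TN"], c["FP"], c["FN"]
    if tp + fn > 0 and tn + fp > 0:
        return 0.5 * (tp / (tp + fn) + tn / (tn + fp))
    return (tp + tn) / (tp + tn + fp + fn)


def g_mean_and_mcc(c: Dict[str, int]) -> Optional[Tuple[float, float]]:
    """G-Mean and MCC; None for addresses that are never compromised (TP + FN == 0)."""
    tp, tn, fp, fn = c["TP"], c["TN"], c["FP"], c["FN"]
    if tp + fn == 0:
        return None
    sens = tp / (tp + fn)
    spec = tn / (tn + fp) if tn + fp else 0.0  # assumption when there are no negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined (convention)
    return math.sqrt(sens * spec), mcc


c = confusion([0, 0, 1, 1, 0, 0], [0, 1, 1, 1, 0, 0])
print(c, revised_balanced_accuracy(c))  # {'TP': 2, 'TN': 3, 'FP': 1, 'FN': 0} 0.875
```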

V. PERFORMANCE EVALUATION RESULTS
For each IP address i in the considered network, in order to achieve the highest performance of CDIS, we first analyze and select the best value of $\gamma_i$. To this end, the Balanced Accuracy of CDIS for IP address i is measured for all values of $\gamma_i$ from 0 to 1 in steps of 0.002. The measured performance, displayed in Figure 4 with colors ranging from red to green, reveals that the Balanced Accuracy of CDIS (using AADRNN) under the best value of $\gamma_i$ is very close to 100% (shown in dark green) for all IP addresses except the 16th, 19th, 20th, and 22nd, whose Balanced Accuracies are around 62%, 71%, 72% and 56%, respectively, under the best value of $\gamma_i$. Furthermore, Figure 4 shows that the Balanced Accuracy of CDIS remains acceptably high for a range of $\gamma_i$ values around the best value; hence, one may say that the performance of CDIS is highly robust with respect to the choice of $\gamma_i$. On the other hand, the best value of $\gamma_i$ differs considerably across IP addresses; for successive IP addresses it is: 0.102, 0.888, 0.998, 0.002, 0.114, 0.102, 0.002, 0.186, 0.156, 0.212, 0.102, 0.132, 0.282, 0.208, 0.102, 0.002, 0.102, 0.102, 0.002, 0.002, 0.122, 0.098, 0.102, and 0.102. Note that if several values of $\gamma_i$ achieve the best performance, the smallest value is selected. Using the best values of $\gamma_i$, we evaluate the performance of CDIS under AADRNN with respect to Balanced Accuracy, Sensitivity, and Specificity. Figure 5 displays the box plot of this performance evaluation over the considered IP addresses. Recall that Sensitivity is only presented for IP addresses that are compromised in at least one window.
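The per-address threshold search described above amounts to a grid scan in steps of 0.002 that keeps the smallest $\gamma_i$ among ties. This sketch scores each candidate with the Revised Balanced Accuracy and assumes $z_k^i = 1$ when the error strictly exceeds $\gamma_i$; the helper names and the toy errors are hypothetical.

```python
from typing import Sequence, Tuple


def revised_ba(v: Sequence[int], z: Sequence[int]) -> float:
    """Balanced Accuracy when both classes occur in v, plain Accuracy otherwise."""
    tp = sum(a == 1 and b == 1 for a, b in zip(v, z))
    tn = sum(a == 0 and b == 0 for a, b in zip(v, z))
    fp = sum(a == 0 and b == 1 for a, b in zip(v, z))
    fn = sum(a == 1 and b == 0 for a, b in zip(v, z))
    if tp + fn > 0 and tn + fp > 0:
        return 0.5 * (tp / (tp + fn) + tn / (tn + fp))
    return (tp + tn) / len(v)


def select_gamma(errors: Sequence[float], v: Sequence[int],
                 step: float = 0.002) -> Tuple[float, float]:
    """Scan gamma = 0, step, 2*step, ..., 1 and keep the smallest gamma with the best score."""
    best_gamma, best_score = 0.0, -1.0
    for m in range(round(1.0 / step) + 1):
        gamma = m * step
        z = [int(e > gamma) for e in errors]
        score = revised_ba(v, z)
        if score > best_score:  # strict '>' keeps the smallest gamma among ties
            best_gamma, best_score = gamma, score
    return round(best_gamma, 3), best_score


print(select_gamma([0.0511, 0.3, 0.0203, 0.4], [0, 1, 0, 1]))  # (0.052, 1.0)
```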
The Balanced Accuracy, Sensitivity and Specificity show that the median performance of CDIS is almost 100%. However, there are four IP addresses whose measured performances are outliers, with Balanced Accuracies of 72%, 71%, 62%, and 56%. While searching for the best value of $\gamma_i$, we observed that these are the 16th, 19th, 20th, and 22nd IP addresses in Figure 4.
These results also reveal that CDIS is able to successfully detect infection for all IP addresses (minimum Sensitivity is 85%) but suffers from low Specificity (i.e. high false alarm rate) for outlier IP addresses. Indeed, we observe that the reason for the low specificity of the IP addresses with the outlier CDIS performance is that their traffic statistics do not indicate infection, and two of these IP addresses (19th and 20th) do not receive traffic but only transmit, so that the indicators RTS 1-4 are zero for all time windows.

A. PERFORMANCE OF CDIS UNDER DIFFERENT ML MODELS
The performance of CDIS under AADRNN is compared with that under each of LR, Lasso, KNN, MLP, and LSTM, where the best value of decision threshold is selected via exhaustive search for each ML model. The comparison of the performances with respect to Balanced Accuracy, Sensitivity, and Specificity are presented in Table 2. The numerical results in this table are presented as the average of each measurement over the IP addresses considered. For example, the Balanced Accuracy is first calculated for each IP address; then, the average of Balanced Accuracy is computed over all IP addresses.
The results in Table 2 show that CDIS achieves highly acceptable performance under various ML models, although some models lack balance between Sensitivity and Specificity. The best Balanced Accuracy of CDIS is observed under AADRNN, which achieves the most balanced performance between Sensitivity and Specificity. LR, Lasso and KNN achieve high Specificity, but LR and KNN have significantly low Sensitivity, while the Specificity of MLP and LSTM is significantly low. That is, LR and KNN cannot properly detect compromised IP addresses, while MLP and LSTM cause a high rate of false positive alarms. Furthermore, Table 3 displays the average and standard deviation, in milliseconds, of the training and execution times over all IP addresses and all time windows for each ML model. Training and execution times are measured in Python on the CPU of a PC with 32 GB of RAM and an AMD Ryzen 7 3.70 GHz processor. Also, recall (from Section IV-E) that each of the neural network models considered (i.e. AADRNN, MLP, and LSTM) has three hidden layers with six neurons each. In addition to its three hidden layers, the LSTM network also has an LSTM layer with six units.
The results in Table 3 show that: 1) The mean training time of AADRNN is lower than that of Lasso, MLP, and LSTM, with a very low standard deviation of 15 ms. However, all of the AADRNN, Lasso, MLP, and LSTM models require significantly more training time than LR and KNN.

[Figure 7: logarithmic prediction error of CDIS with AADRNN versus time slot k, 1 ≤ k ≤ 712, for the 1st, 10th, and 101st IP addresses.]
The top of Figure 6 shows that CDIS under AADRNN achieves an average Balanced Accuracy of 88% for this complex network structure with 107 unique IP addresses, with an average Sensitivity of 90% and an average Specificity of 79%.
The more detailed results in Figure 6 (bottom) reveal that the Balanced Accuracy of CDIS is above 92% for roughly 2/3 of the IP addresses and above 50% for all IP addresses; that is, it lies between 50% and 92% for only 36% of the IP addresses. Both the median Sensitivity and the median Specificity are 100%; however, more nodes have low Specificity than low Sensitivity. From these results, we also observe that Sensitivity is above 90% for 85% of the IP addresses that are compromised in at least one time window, while Specificity is above 90% for 67% of all IP addresses. Only 4 IP addresses have a Sensitivity below 40%.

Furthermore, in Figure 7, we plot the logarithmic prediction error of CDIS with AADRNN, defined as log 10 (|v i k − i k |), 1 ≤ k ≤ 712, versus the slot k for three IP addresses: the 1st, 10th, and 101st. Only three IP addresses are shown so that the prediction errors during online training can be seen clearly. The results for the 10th and 101st IP addresses show that CDIS achieves lower prediction errors for normal (non-attack) traffic after k = 100. On the other hand, online training does not appear to reduce the accuracy for the 1st IP address, which may be because this address was never compromised, as shown by the ground truth in Figure 3.
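The per-IP summaries quoted above (the median, and the fraction of IP addresses above a cut-off) can be computed as in the sketch below. It assumes the per-IP values are stored in an array with NaN for IP addresses where a measure is undefined, for example Sensitivity for an address that was never compromised; `summarize` and its default cut-offs are illustrative and not part of CDIS.

```python
import numpy as np


def summarize(per_ip_values, cutoffs=(0.5, 0.92)):
    """Summarize a per-IP measure (e.g. Balanced Accuracy) across IP addresses."""
    v = np.asarray(per_ip_values, dtype=float)
    v = v[~np.isnan(v)]  # drop IP addresses for which the measure is undefined
    summary = {"n": int(v.size), "mean": float(v.mean()), "median": float(np.median(v))}
    for c in cutoffs:
        # Fraction of the remaining IP addresses whose value exceeds the cut-off.
        summary[f"frac_above_{c}"] = float((v > c).mean())
    return summary
```

Dropping undefined entries first is what restricts a statement such as "Sensitivity is above 90% for 85% of the IP addresses" to the addresses that were compromised at least once.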

C. CDIS WITH DIFFERENT DATASETS
Although this paper mainly focuses on identifying compromised IoT devices during a Mirai Botnet attack, the proposed CDIS architecture can also be used for other types of DDoS or DoS attacks in which malware spreads over the devices. However, the proposed network statistics may not be equally effective when CDIS is applied to DDoS attacks other than Mirai, and achieving high performance for various types of DDoS attacks may require defining and using a much larger set of statistics. In this section, we evaluate the performance of CDIS for various types of DDoS attacks provided in three different datasets: Kitsune [47], [48], MedBIoT [55], and Bot-IoT [56].
In addition to the Mirai Botnet data used in the performance evaluation earlier in Section V, we now use the SYN DoS data from the Kitsune dataset, which contains 2,771,276 packets transmitted in about 53 minutes. Next, we evaluate the performance of CDIS on the Mirai attack in the MedBIoT dataset; for this dataset, we merged the attack and normal traffic files while preserving the actual time stamps, which resulted in 5,727,929 packets transmitted in about 30 minutes. We also present results for DDoS attacks using the HTTP, TCP, and UDP protocols, as well as a DoS attack using the HTTP protocol, from the Bot-IoT dataset. In Bot-IoT, there are 19,826 packets transmitted in 42 minutes for DDoS HTTP, 19,548,235 packets in 40 minutes for DDoS TCP, 18,965,736 packets in 47 minutes for DDoS UDP, and 29,762 packets in 49 minutes for DoS HTTP. For all datasets except DDoS TCP and DDoS UDP, we use the same data processing and parameter settings described in Sections III, IV-D, and IV-E; for DDoS TCP and DDoS UDP, we set τ = 0.01 and M = 190.

Figure 8 displays the Balanced Accuracy of CDIS for the Kitsune, MedBIoT, and Bot-IoT datasets. First, these results show that CDIS very successfully identifies compromised devices during Mirai Botnet attacks, with a median Balanced Accuracy of 100% for both Kitsune and MedBIoT. Recall that, for the Kitsune Mirai data, the Balanced Accuracy is above 92% for 2/3 of the unique IP addresses. For the MedBIoT dataset, the Balanced Accuracy is above 85% for 90% of all IP addresses.
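Merging attack and normal traffic files while preserving time stamps amounts to a k-way merge of time-sorted packet streams. A minimal sketch follows, assuming each file has already been parsed into a list of packets sorted by capture time; the `Packet` fields and helper names are hypothetical, and no pcap parsing is shown.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Packet(NamedTuple):
    ts: float     # capture time stamp in seconds (kept unchanged by the merge)
    src: str      # source IP address
    dst: str      # destination IP address
    length: int   # packet length in bytes


def merge_traces(*traces: Iterable[Packet]) -> List[Packet]:
    """Merge traces that are each sorted by time stamp into one time-ordered trace."""
    return list(heapq.merge(*traces, key=lambda p: p.ts))


def duration_minutes(trace: List[Packet]) -> float:
    """Time span of a merged trace, in minutes."""
    return (trace[-1].ts - trace[0].ts) / 60.0 if trace else 0.0
```

`heapq.merge` streams its inputs lazily, so the same pattern also works on generators that read very large captures without loading them fully.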
The results in Figure 8 also show that CDIS can achieve high performance for DDoS and DoS attacks, although the results for these attacks, especially those that use the TCP and UDP protocols, are significantly lower than those for the Mirai attacks. This is mainly because the traffic statistics were defined with the Mirai Botnet attacks in mind. For both DDoS HTTP and DoS HTTP, the median Balanced Accuracy is 100%. The performance of CDIS is slightly lower for DDoS attacks on network traffic that uses the TCP or UDP protocols. Since the CDIS parameters have already been adapted to the DDoS TCP and DDoS UDP datasets, these results indicate that the communication protocol affects identification performance and could be examined to define more specific statistics.

D. FURTHER REMARKS ON THE RESULTS
We first evaluated the performance of CDIS on the Kitsune Mirai attack dataset for two different network setups, with 24 and 107 unique IP addresses. We also presented the performance evaluation for seven sets of attack data from three different datasets, namely Kitsune, MedBIoT, and Bot-IoT. The experimental results show that:
• CDIS achieves high performance (94% Balanced Accuracy) with low computation time for both execution and online training.
• CDIS under AADRNN outperforms the other models (LR, Lasso, KNN, MLP, and LSTM) by a significant margin, while its computation time is very competitive with that of the fastest (simplest) models.
• CDIS can be used not only for Mirai but also for various types of DDoS attacks in which malware spreads over IoT devices. However, some attack types and/or communication protocols may require customizing the traffic statistics and parameter settings of CDIS.

VI. CONCLUSION
To identify compromised devices and/or IP addresses in an IoT network as traffic flows through the system, we have developed a novel “Compromised Device Identification System (CDIS)”, which analyzes the traffic received and transmitted by each individual device. Using inter-packet transmission times and packet lengths measured from the traffic flow over successive time windows, CDIS uses ML to determine whether each device is compromised. The AADRNN is used as the ML tool and is trained via online auto-associative learning using only the normal traffic that is available during the system's normal real-time operation. Thus, CDIS does not require prior offline collection of any attack or normal traffic.
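The online loop described above, in which the model scores each new window and learns only from windows it judges benign, can be sketched as follows. To keep the example short, the AADRNN is replaced by a plainly different model: a tied-weight linear autoencoder trained by stochastic gradient descent on the squared reconstruction error. The anomaly threshold and the warm-up period during which every window is used for training are hypothetical choices for this sketch, not parameters taken from CDIS.

```python
import numpy as np


class OnlineLinearAutoencoder:
    """Tied-weight linear autoencoder x_hat = U U^T x, trained online by SGD.
    A simple stand-in for the AADRNN, NOT the dense random neural network itself."""

    def __init__(self, dim, hidden, lr=0.02, seed=0):
        rng = np.random.default_rng(seed)
        self.U = 0.1 * rng.standard_normal((dim, hidden))  # small random initial weights
        self.lr = lr

    def reconstruct(self, x):
        return self.U @ (self.U.T @ x)

    def error(self, x):
        """Squared reconstruction error, used as the anomaly score."""
        return float(np.sum((self.reconstruct(x) - x) ** 2))

    def update(self, x):
        """One SGD step on ||U U^T x - x||^2 with respect to U."""
        h = self.U.T @ x                # hidden code
        e = self.U @ h - x              # reconstruction residual
        grad = 2.0 * (np.outer(e, h) + np.outer(x, self.U.T @ e))
        self.U -= self.lr * grad


def cdis_step(model, x, threshold, step, warmup=100):
    """Flag window x as compromised if its error exceeds the threshold, and learn
    only from windows judged benign. The warm-up (always learn during the first
    `warmup` windows) is a hypothetical cold-start choice for this sketch."""
    compromised = step >= warmup and model.error(x) > threshold
    if not compromised:
        model.update(x)
    return compromised
```

Because the update is skipped whenever a window is flagged, attack traffic does not pull the model toward reconstructing it, which is the property that benign-only auto-associative training relies on.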
The performance of CDIS was tested on the publicly available Kitsune Mirai Botnet attack dataset for two different network setups, as well as on the MedBIoT and Bot-IoT datasets. The experimental results suggest that CDIS provides highly accurate results, and it may pave the way to preventing Botnet attacks from spreading over the devices of an IoT network by blocking, or dropping, the outgoing traffic of IoT devices or IP addresses that have been identified as compromised and turned into Bots.
Future work could also study the dynamics of the CDIS system from the instant when the attack begins, to the instant of detection, and finally to the time when the Botnet outflows can be blocked or dropped, so as to determine the overall delay needed to actually stop the Botnet from spreading. Such studies could also examine the possibility of identifying the sources of the Botnet and carrying out preventive blocking or dropping of traffic at the sources themselves, as suggested in earlier work [57]. Another important issue worth considering is the energy consumption of the AD itself [58], [59], as well as that of the mitigation actions that are taken.
The proposed CDIS does not consider known direct relationships between devices, such as their connectivity patterns. Thus, in future work, it will be interesting to investigate the effect of each compromised device on other devices in an IoT network, using not only the packets arriving at each node but also any available information about the connectivity between IP addresses, network nodes, and devices.

APPENDIX: DENSE RANDOM NEURAL NETWORK WITH ONLINE AUTO-ASSOCIATIVE LEARNING (AADRNN)
The AADRNN structure introduced in [42], which was adapted to the design of CDIS, has $L$ hidden layers with $J$ clusters in each hidden layer, as shown in Figure 2. Each cluster consists of $n$ statistically identical, probabilistically interconnected cells. As shown in Figure 2, the input to the AADRNN is the collection of network statistics $x^i_k$ calculated by the NTSC module for each device or port $i$ over each successive time window $k-1$, and the corresponding AADRNN output for the current window $k$ is denoted by $y_{L+1}(x^i_k, W^{i,k})$.
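The input/output timing above, where statistics computed over window $k-1$ feed the model for window $k$, can be illustrated with the per-device windowing sketch below. The six statistics used here (packet count, mean packet length, and mean inter-packet time, for transmitted and for received traffic) are hypothetical stand-ins and are not the actual NTSC statistics used by CDIS; the function names are also hypothetical.

```python
from collections import defaultdict

import numpy as np


def _stats(items):
    """[count, mean length, mean inter-packet time] for a list of (ts, size) pairs."""
    if not items:
        return [0.0, 0.0, 0.0]
    times = np.array([t for t, _ in items], dtype=float)
    sizes = np.array([s for _, s in items], dtype=float)
    mean_gap = float(np.diff(times).mean()) if len(times) > 1 else 0.0
    return [float(len(items)), float(sizes.mean()), mean_gap]


def window_features(packets, window_s):
    """packets: iterable of (ts, src, dst, size) sorted by ts.
    Returns {(device, window_index): 6-dim vector} of transmit then receive statistics."""
    groups = defaultdict(lambda: {"tx": [], "rx": []})
    for ts, src, dst, size in packets:
        k = int(ts // window_s)  # index of the window containing this packet
        groups[(src, k)]["tx"].append((ts, size))
        groups[(dst, k)]["rx"].append((ts, size))
    return {key: np.array(_stats(g["tx"]) + _stats(g["rx"])) for key, g in groups.items()}


def model_input(features, device, k, dim=6):
    """Input for window k is the statistics of window k-1 (zeros if the device was silent)."""
    return features.get((device, k - 1), np.zeros(dim))
```

A device that only transmits gets zero receive statistics in every window, which mirrors the situation noted earlier for the 19th and 20th IP addresses.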
Since the cells in a given cluster are identical, for the time being we denote the state of each cell in layer $l$ and cluster $j$ by $q_{j,l}$ and its total firing rate by $r_{j,l}$; each such cell receives external excitatory spike arrivals at rate $\Lambda_{j,l}$. Each cell in cluster $j$ also receives inhibitory inputs from a corresponding cell in each cluster $j'$ of the previous layer $l-1$, with inhibitory weight $w^-_{j',l-1,j}$, $j' \in \{1, \dots, J\}$. Thus, for any cell in cluster $(j,l)$ at layer $l \in \{2, \dots, L\}$, the cell's total external inhibitory input is:

$$\lambda_{j,l} = \lambda^-_{j,l} + \sum_{j'=1}^{J} q_{j',l-1}\, w^-_{j',l-1,j},$$

where $\lambda^-_{j,l}$ is an additional external inhibitory spiking rate into cell (neuron) $(j,l)$. The sum of the $(j,l)$-neuron's outgoing inhibitory weights is $w_{j,l} = \sum_{j'=1}^{J} w^-_{j,l,j'}$, hence

$$r_{j,l} = w_{j,l} + r, \quad r \ge 0, \tag{25}$$

where $r$ is the firing rate of the soma-to-soma interactions that provoke the joint firing of any cell in the same cluster with probability $p/n$, and $n$ is the number of cells in each cluster. Thus, $q_{j,l}$ satisfies:

$$q_{j,l} = \frac{\Lambda_{j,l} + \dfrac{r\,q_{j,l}(n-1)(1-p)}{n - q_{j,l}\,p(n-1)}}{r_{j,l} + \lambda_{j,l} + \dfrac{r\,q_{j,l}\,p(n-1)}{n - q_{j,l}\,p(n-1)}}.$$

We are interested in solutions of the above expression that are probabilities, and first seek the maximum value $q_{j,l} = 1$, which is attained for the maximum value of $\Lambda_{j,l}$. Thus, we can write:

$$\begin{aligned} \Lambda_{j,l} &\le r_{j,l} + \lambda_{j,l} - \frac{r(n-1)(1-2p)}{n - p(n-1)} \\ &\le r_{j,l} + \lambda_{j,l} - \frac{r(n-1)}{n} \\ &\le w_{j,l} + \lambda_{j,l} + \frac{r}{n}. \end{aligned}$$

VOLUME 10, 2022
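For finite $n$, the fixed-point equation for $q_{j,l}$ above can be solved numerically. The sketch below uses bisection on $g(q) = q\,D(q) - N(q)$, where $N$ and $D$ are the numerator and denominator of the right-hand side; since $g(0) = -\Lambda_{j,l} \le 0$, a root lies in $(0,1)$ whenever $g(1) > 0$. Returning 1 otherwise (saturation) is an assumption of this sketch, and the function and argument names are hypothetical.

```python
def total_firing_rate(w_out, r):
    """Eq. (25): r_{j,l} = w_{j,l} + r, with w_{j,l} the sum of outgoing inhibitory weights."""
    return float(sum(w_out)) + r


def cluster_q(Lam, lam, r_jl, r, p, n, tol=1e-12):
    """Solve q = N(q) / D(q) on [0, 1] for one dense cluster by bisection.
    Lam: external excitatory rate, lam: total inhibitory input, r_jl: total firing rate."""

    def g(q):
        d = n - q * p * (n - 1)                      # > 0 for q, p in [0, 1]
        num = Lam + r * q * (n - 1) * (1 - p) / d    # N(q)
        den = r_jl + lam + r * q * p * (n - 1) / d   # D(q)
        return q * den - num

    if g(1.0) <= 0.0:
        return 1.0  # assumption: no interior root, so the cell saturates
    lo, hi = 0.0, 1.0  # g(lo) <= 0 < g(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is used here because it needs only the sign change of $g$ on $[0,1]$, so it converges without a good starting point.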
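We notice that the expression for $q_{j,l}$ is a second-degree equation in $q_{j,l}$, and for large $n$ it becomes:

$$q_{j,l}^2\, p\lambda_{j,l} - q_{j,l}\left[\lambda_{j,l} + p(\Lambda_{j,l} + r)\right] + \Lambda_{j,l} = 0,$$

whose solution is the activation function $\zeta(\cdot)$ for each cell in each cluster of the AADRNN:

$$\zeta_{j,l} = \min\left\{1,\ \frac{\lambda_{j,l} + p(\Lambda_{j,l} + r) \pm \sqrt{\left[\lambda_{j,l} + p(\Lambda_{j,l} + r)\right]^2 - 4p\lambda_{j,l}\Lambda_{j,l}}}{2p\lambda_{j,l}}\right\}.$$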
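The large-$n$ activation can be evaluated in closed form as below. This sketch follows the quadratic $a q^2 - b q + c = 0$ with $a = p\lambda_{j,l}$, $b = \lambda_{j,l} + p(\Lambda_{j,l} + r)$, and $c = \Lambda_{j,l}$, assumes that the probability of interest is the smaller root (the minus branch), computes it in the algebraically equivalent form $2c/(b + \sqrt{b^2 - 4ac})$ to avoid cancellation when $p\lambda_{j,l}$ is small, and caps the result at 1. The function name `zeta` is hypothetical.

```python
import math


def zeta(Lam, lam, r, p):
    """Smaller root of p*lam*q^2 - [lam + p(Lam + r)] q + Lam = 0, capped at 1.
    Inputs are assumed non-negative, with lam + p(Lam + r) > 0."""
    a = p * lam
    b = lam + p * (Lam + r)
    disc = b * b - 4.0 * a * Lam  # >= (lam - p*Lam)^2 >= 0 for non-negative inputs
    # (b - sqrt(disc)) / (2a) rewritten as 2c / (b + sqrt(disc)); also valid when a == 0.
    root = 2.0 * Lam / (b + math.sqrt(disc))
    return min(1.0, root)
```

When $p\lambda_{j,l} = 0$ the quadratic degenerates to a linear equation, and the same expression reduces to $q = \Lambda_{j,l}/b$, so no special case is needed.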