IoT Network Cybersecurity Assessment With the Associated Random Neural Network

This paper proposes a method to assess the security of an n-device, or IP address, IoT network by simultaneously identifying all the compromised IoT devices and IP addresses. It uses a specific Random Neural Network (RNN) architecture composed of two mutually interconnected sub-networks that complement each other in a recurrent structure, called the Associated RNN (ARNN). For each of the n devices or IP addresses in the IoT network, two distinct neurons of the ARNN advocate opposite views: compromised or not compromised. The fully interconnected 2n-neuron ARNN structure of paired neurons learns offline from ground truth data. Thus, rather than requiring a separate attack detector at each network node, the ARNN offers a single overall attack detector that observes the incoming traffic at each node, learns about the interdependencies between network nodes, and formulates a recommendation for each device or IP address in the IoT network. The ARNN weight initialization and learning algorithm are discussed, and the ARNN performance is evaluated using real attack data and compared against several learning and testing techniques. Results are obtained both for off-line learning with ground truth data, and for on-line incremental learning using a simplified average metric measured from incoming packet traffic. Comparisons with the best state-of-the-art techniques show that the ARNN significantly outperforms previously known approaches.

In related work, the MLP and the Convolutional Neural Network (CNN) were employed, with learning on the focal loss, to detect IoT intrusions, while in [22] Naive Bayes was combined with an evolutionary feature selection method to develop a signature-based system that detects Botnet, DDoS, and port scan attacks. In [23], a Botnet attack detection system is discussed that classifies network traffic using a beta mixture model based on a set of statistically extracted traffic features.
Recent research on cyberattack detection has also used self-supervised learning systems. For example, in [24], a self-supervised learning algorithm combining LSTM with CNN was developed for anomaly-based attack detection for networks inside vehicles. The bidirectional Generative Adversarial Network (GAN) was used for anomaly detection in [25], and a Graph Neural Network (GNN) network-level IDS was developed in [26].
Accurate results have also been obtained for the Deep Random Neural Network (DRNN) with offline [27] and incremental [28] learning to detect MIRAI attacks, while the DRNN was shown to achieve high performance for detecting different types of unknown attacks simultaneously [29]. Earlier work [30] examined the performance of the classical Random Neural Network (RNN) [31] with offline gradient-descent learning [32] to detect SYN denial of service (DoS) attacks. On the other hand, whereas the work reviewed above focused on detecting cyberattacks and malicious traffic, in this paper, we develop an ARNN-based decision system that identifies the compromised IoT nodes. Successful identification of compromised nodes along with malicious traffic paves the way to fend off distributed attacks (e.g. Botnet and DDoS attacks) in their early stages.
Compromised Device Identification. Whereas the majority of related papers identify compromised devices by detecting malware conveyed by Botnets, some work focuses on identifying compromised devices directly using a variety of techniques, including: optimization [33], analyzing communication features [34], using language analysis [35], tracking device location [36], or monitoring the downlink channels of a gateway [37]. Moreover, Reference [38] proposed an ML-based system that analyzes traffic flows and packet features in the network layer to identify intrusions in an IoT system. In [39], a Botnet detection system, called BotStop, was developed based on an extreme gradient boosting model that analyzes packet traffic. In [40], a Compromised Device Identification System (CDIS-DRNN) was developed based on the DRNN model [41], which analyzes the incoming and outgoing traffic of the network nodes. The performance of different attack detection techniques can depend on which datasets are used for learning and testing, and prior to the current paper, the CDIS-DRNN offered the best available state-of-the-art performance for compromised device identification when the publicly available Kitsune Botnet dataset [42], [43] is used. However, none of these works considered the interrelationships between IoT nodes and the propagation of a Botnet attack through these nodes.
In recent work, a method was proposed to evaluate a set of network or IoT nodes simultaneously in a single recurrent RNN architecture composed of two interconnected and associated neural networks [44], trained and tested with ground truth data. Therefore, in the sequel we will also compare the ARNN technique developed in this paper, using the Kitsune dataset, against the performance offered by the CDIS-DRNN.

B. CONTRIBUTIONS OF THIS PAPER
This paper develops an Associated Random Neural Network (ARNN) decision system, designed to assess the overall security of an IoT network by identifying compromised devices using aggregated multi-node traffic information. The ARNN utilizes two associated RNN neurons for each IoT device (or IP address) in the network that is being assessed for security. These neurons assess the security level of a specific device and advocate that the device is compromised, or that it is not, based on the traffic metrics measured at the device and on the inter-neuron weights connecting them to the neurons that assess neighbouring network nodes. ARNN-based attack detection was previously introduced in [44], and initially evaluated for a system that learns from ground truth data and is then tested on the same ground truth data.
Here, it is hypothesized that the ARNN successfully identifies compromised IoT nodes thanks to its ability to learn both the interrelationships between those nodes and the propagation of a cyberattack. Therefore, after detailing the components of the ARNN learning algorithm, namely the error metric, the weight restrictions and the ARNN initialization, with the learning algorithm itself detailed in the Appendix, we thoroughly evaluate its performance for Mirai Botnet attacks on an IoT network with 107 nodes using the Kitsune dataset [42].
First, we train the ARNN on ground truth data from the initial part of the Kitsune dataset, and test its prediction capabilities with ground truth data from the disjoint latter part of the dataset.
Next, we discuss an average normalized metric based on six relevant metrics [40] extracted from traffic data. We then test the ARNN trained with ground truth data using this average input metric over the testing period, which is disjoint from the training period and subsequent to it. In all cases we evaluate the Accuracy, True Positive Rate and True Negative Rate of the ARNN, and observe very accurate attack detection for most of the 107 nodes contained in the Kitsune dataset.
Finally, we compare the performance of the ARNN against the state-of-the-art best-in-class CDIS-DRNN and four well-known Machine Learning (ML) models for the same problem [40]. In this case, we also train the ARNN without the ground truth, using the ARNN incrementally on successive short training cycles, each followed by testing, over all of the available Kitsune dataset. The experimental results again indicate that the ARNN offers superior performance, achieving 100% median accuracy and above 92% accuracy for more than 75% of the network nodes in the dataset, with about 3.5 ms of detection time. The proposed ARNN decision system has the following characteristics:

1) It is an architecture based on the RNN [31], as shown in Figure 1, which associates a pair of neurons X_i, Y_i to assess the security level of each node i of an n-node IoT network, in order to determine which IoT devices are compromised. Note that while we consider IoT networks as the object of our research, this approach can also be used for other collections of interconnected IP addresses. While X_i's role is to defend the thesis that i is compromised, Y_i defends the opposite thesis. Thus the ARNN has a total of only 2n neurons for evaluating an n-node network.
2) Using ground truth data, the ARNN is trained with a specific gradient descent learning algorithm.
3) During usage or testing, the ARNN receives as input the average value of the traffic characteristics that are used to test the CDIS-DRNN and other ML models. This results in substantial computational savings, since a scalar input replaces a vector of six elements.
4) Due to its associated and interconnected architecture with a simplified ''ignorant'' weight initialization, the ARNN provides an accurate assessment of the security of all devices or IP addresses in a network.

II. CONSTRUCTING THE ARNN FOR NETWORK-WIDE CYBERSECURITY ASSESSMENT
We now detail the Network-Wide Cybersecurity Assessment method based on a novel ARNN decision system. This method provides an assessment of the overall security of an IoT network, taking into account the interconnections of devices and the local information provided by these devices. In this method, the ARNN decision system learns direct and indirect relationships between devices in a single network, and estimates the spread of an attack among devices in the IoT network.
The ARNN is composed of n pairs of neurons which are all interconnected in a recurrent structure, where each pair corresponds to an IoT device (or node, or IP address) in the network, as shown in Figure 1. X_i and Y_i act as ''adversaries'' indicating whether the node i is compromised or not. Accordingly, the internal state of X_i, denoted by K_i(t) ≥ 0, indicates that node i is compromised, and that of Y_i, denoted by k_i(t) ≥ 0, denotes the opposite. As one of the main properties of an RNN neuron, if K_i(t) at any time t is strictly positive, then X_i sends excitatory and/or inhibitory spikes to the neurons of any node j ≠ i, respectively at rates W^+_{ij}, W^-_{ij} ≥ 0. Similarly, if k_i(t) is strictly positive, Y_i sends excitatory and/or inhibitory spikes to j ≠ i, respectively at rates w^+_{ij}, w^-_{ij} ≥ 0. We define the probabilities that these 2n neurons are firing as:

Q_i = lim_{t→∞} Pr[K_i(t) > 0],   q_i = lim_{t→∞} Pr[k_i(t) > 0].

In this decision system, when any neuron of node i (X_i or Y_i) fires, the internal state of this neuron drops by 1, i.e. K_i(t^+) = K_i(t) − 1 or k_i(t^+) = k_i(t) − 1. When any neuron of node i receives an excitatory spike, its internal state increases by 1, i.e. K_i(t^+) = K_i(t) + 1 or k_i(t^+) = k_i(t) + 1. Similarly, when it receives an inhibitory spike, its internal state drops by 1 if the current state is not zero, i.e. K_i(t^+) = max[K_i(t) − 1, 0] or k_i(t^+) = max[k_i(t) − 1, 0].
The ARNN equations are a special case of the RNN equations [31], where the excitatory spikes of X_j are received by X_i and its inhibitory spikes by Y_i, and symmetrically the excitatory spikes of Y_j are received by Y_i and its inhibitory spikes by X_i, so that:

Q_i = [ Λ_i + Σ_{j≠i} Q_j W^+_{ji} ] / [ r(X_i) + λ_i + Σ_{j≠i} q_j w^-_{ji} ],
q_i = [ λ_i + Σ_{j≠i} q_j w^+_{ji} ] / [ r(Y_i) + Λ_i + Σ_{j≠i} Q_j W^-_{ji} ],

with the firing rates r(X_i) = Σ_{j≠i} [W^+_{ij} + W^-_{ij}] and r(Y_i) = Σ_{j≠i} [w^+_{ij} + w^-_{ij}], where Λ_i is the rate of external excitatory spikes arriving to X_i, while it is also the rate of external inhibitory spikes arriving to Y_i. On the other hand, λ_i has exactly the opposite effect. We will choose these two quantities to lie between zero and one: Λ_i ∈ [0, 1], λ_i ∈ [0, 1].
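For illustration, the following short numerical sketch (our own, and not part of the ARNN implementation used in this paper) iterates these coupled equations to their fixed point for arbitrary weights and inputs, assuming the spike-routing described above; the function and variable names are ours.

```python
import numpy as np

def arnn_state(Wp, Wm, wp, wm, Lam, lam, iters=200):
    """Iterate the ARNN stationary equations to a fixed point.

    Wp[i, j], Wm[i, j]: excitatory/inhibitory rates W^+_{ij}, W^-_{ij} from X_i,
    wp[i, j], wm[i, j]: excitatory/inhibitory rates w^+_{ij}, w^-_{ij} from Y_i,
    Lam[i], lam[i]:     external input rates Lambda_i and lambda_i.
    Returns Q, q, the stationary firing probabilities of X_i and Y_i.
    """
    n = len(Lam)
    rX = Wp.sum(axis=1) + Wm.sum(axis=1)   # firing rate r(X_i)
    rY = wp.sum(axis=1) + wm.sum(axis=1)   # firing rate r(Y_i)
    Q = np.full(n, 0.5)
    q = np.full(n, 0.5)
    for _ in range(iters):
        # Q_i = (Lambda_i + sum_j Q_j W^+_{ji}) / (r(X_i) + lambda_i + sum_j q_j w^-_{ji})
        Q_new = (Lam + Q @ Wp) / (rX + lam + q @ wm)
        # q_i = (lambda_i + sum_j q_j w^+_{ji}) / (r(Y_i) + Lambda_i + sum_j Q_j W^-_{ji})
        q_new = (lam + q @ wp) / (rY + Lam + Q @ Wm)
        Q, q = np.clip(Q_new, 0.0, 1.0), np.clip(q_new, 0.0, 1.0)
    return Q, q

if __name__ == "__main__":
    n, W = 5, 1.0
    rng = np.random.default_rng(0)
    Wp = rng.uniform(0, W, (n, n)); np.fill_diagonal(Wp, 0.0)   # no self-weights
    wp = rng.uniform(0, W, (n, n)); np.fill_diagonal(wp, 0.0)
    Wm, wm = W - Wp, W - wp
    np.fill_diagonal(Wm, 0.0); np.fill_diagonal(wm, 0.0)
    Lam = rng.uniform(0, 1, n)
    Q, q = arnn_state(Wp, Wm, wp, wm, Lam, 1.0 - Lam)
    print(Q, q)
```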

A. RESTRICTING THE WEIGHTS AND INITIALIZING THE ARNN
The ARNN weights are restricted to reduce the number of gradient descent computations, namely:
• Throughout the network we set the ''self-weights'' to zero: W^+_{ii} = W^-_{ii} = w^+_{ii} = w^-_{ii} = 0.
• For all i ≠ j, the excitatory and inhibitory weights of each neuron sum to a constant: W^+_{ij} + W^-_{ij} = w^+_{ij} + w^-_{ij} = W, for a given value of W > 0 that is detailed below, so that the gradient descent computation only computes W^+_{ij}, w^+_{ij} ∀i, j. Note that we are dealing with a fully recurrent network in which all distinct nodes are interconnected, since each neuron is connected to all other neurons when i ≠ j, while the paired ''opposing'' neurons X_i and Y_i, which are not directly connected in one step, are connected indirectly to each other via the other neurons. Since the firing rates then become r(X_i) = r(Y_i) = (n − 1)W, the ARNN equations become:

Q_i = [ Λ_i + Σ_{j≠i} Q_j W^+_{ji} ] / [ (n − 1)W + λ_i + Σ_{j≠i} q_j w^-_{ji} ],
q_i = [ λ_i + Σ_{j≠i} q_j w^+_{ji} ] / [ (n − 1)W + Λ_i + Σ_{j≠i} Q_j W^-_{ji} ].   (5)

• During learning, a total of 2n(n − 1) weights are computed for an ARNN that is assessing the security of an n-node IoT network. The inhibitory weights are obtained directly as W^-_{ij} = W − W^+_{ij} and w^-_{ij} = W − w^+_{ij}, since W remains constant. Because of the specific mathematics of the RNN learning algorithm [32], only one inversion of a 2n × 2n matrix is needed at each gradient descent step to update all of the weights of the fully connected ARNN.
The ARNN is first initialized so that it does not know initially whether any of the devices (or IP addresses) are compromised. To this effect:
• To represent perfect ignorance for all neurons, we select the network input rates and weights that will result in Q_i = q_i = 0.5 for all the neurons, with Λ_i = λ_i = Λ, where Λ is chosen below.
• Similarly, for i ≠ j the weights are set to W^+_{ij} = W^-_{ij} = w^+_{ij} = w^-_{ij} = W/2. As a result we can write:

Q_i = [ Λ + 0.25(n − 1)W ] / [ (n − 1)W + Λ + 0.25(n − 1)W ],   q_i = [ Λ + 0.25(n − 1)W ] / [ (n − 1)W + Λ + 0.25(n − 1)W ].

Thus if we take W = 1, we have Λ = 0.75(n − 1), and obtain Q_i = q_i = 0.5 before the learning algorithm is used.
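As a worked check of this initialization, substituting the ''ignorant'' weights and inputs into (5) and solving for Λ gives:

```latex
\begin{align*}
&\text{With } W^{+}_{ij}=W^{-}_{ij}=w^{+}_{ij}=w^{-}_{ij}=\tfrac{W}{2},\;
\Lambda_i=\lambda_i=\Lambda,\; Q_j=q_j=\tfrac{1}{2}:\\[3pt]
&Q_i \;=\; \frac{\Lambda + \tfrac{(n-1)W}{4}}{(n-1)W + \Lambda + \tfrac{(n-1)W}{4}} \;=\; \tfrac{1}{2}
\;\;\Longrightarrow\;\;
\Lambda + \tfrac{(n-1)W}{4} \;=\; \tfrac{1}{2}\Big[(n-1)W + \Lambda + \tfrac{(n-1)W}{4}\Big]
\;\;\Longrightarrow\;\; \Lambda = \tfrac{3}{4}(n-1)W,\\[3pt]
&\text{so that } W=1 \text{ gives } \Lambda = 0.75\,(n-1), \text{ and by symmetry } q_i=\tfrac{1}{2} \text{ as well.}
\end{align*}
```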

B. THE ARNN EXTERNAL INPUTS Λ_i AND λ_i
The external inputs are obtained from packet statistics measured in the network, or from the ground truth, used for training the ARNN, regarding whether given packets are attack packets or normal packets, or from other data used for training, or from real operational data during testing. We therefore consider that Q_i ∈ [0, 1] and q_i ∈ [0, 1] are functions Q_i(Λ_i, λ_i) and q_i(λ_i, Λ_i). Noting that Q_i expresses the strength of the view that node i is compromised, while q_i expresses the strength of the opposite view, we define the outputs of the ARNN for each network node i as the binary variables Z_i:

Z_i = 1 if Q_i(Λ_i, λ_i) ≥ γ q_i(λ_i, Λ_i), and Z_i = 0 otherwise,   (8)

where Z_i = 1 stands for node i being compromised, while Z_i = 0 has the opposite meaning, and 0 < γ ≤ 1 is a threshold.

C. THE LEARNING DATASET LD
The Learning Dataset LD is a set of packets for which we know in advance whether each packet is an attack packet or a benign, i.e. ''normal'', packet; the LD is used to train the ARNN. Both the LD and the dataset used for testing contain the ground truth for each packet, denoted pk(t, s, d, a), where:
• t is the transmission instant of the packet from the source node s, and d is the packet's destination node,
• a is a binary label, so that a = 1 indicates that it is an ''attack'' packet and a = 0 that it is a ''benign'' packet,
• the length of the packet in bytes, including the header, is denoted by |pk(t, s, d, a)|.
• Packets are grouped into ''slots'' lasting τ = 10 seconds, so that the slot number of a packet transmitted at time t is l = ⌊t/τ⌋ + 1, i.e. the packet belongs to slot l when (l − 1)τ ≤ t < lτ, and M is the total number of slots in the dataset: 1 ≤ l ≤ M. In the dataset that we use, we observe that on average roughly 100 packets are contained in a 10-second time slot.
We now determine the successive ARNN inputs from the dataset LD, namely Λ^l_{Gi} ∈ [0, 1] and λ^l_{Gi} = 1 − Λ^l_{Gi}, the corresponding desired output K^l_i, and the decision output Z^l_{Gi}, which is a binary variable related to K^l_i. Let S^l(i) and R^l(i) be the sets of packets that have been transmitted or received, respectively, by node i from the first slot until the end of the l-th time slot:

S^l(i) = { pk(t, s, d, a) : s = i, 0 ≤ t < lτ },   and   R^l(i) = { pk(t, s, d, a) : d = i, 0 ≤ t < lτ }.

Furthermore, the l-th ARNN input for node i is taken to be the fraction of attack packets among all the packets received by node i up to the end of the l-th slot:

Λ^l_{Gi} = |{ pk(t, s, d, 1) ∈ R^l(i) }| / |R^l(i)|,   λ^l_{Gi} = 1 − Λ^l_{Gi}.

When a node receives a significant number of attack packets, one expects that it may be compromised and, in turn, send out attack packets. Therefore the l-th desired output for node i, K^l_i, is the ratio of attack packets sent by node i to all the packets it sends to other nodes until the end of the l-th time window:

K^l_i = |{ pk(t, s, d, 1) ∈ S^l(i) }| / |S^l(i)|.

We also define the i-th binary decision variable D^l_i for some threshold 1 > θ > 0 regarding the ground truth:

D^l_i = 1 if K^l_i ≥ θ, and D^l_i = 0 otherwise,   (12)

so that D^l_i = 1 indicates that i is a compromised node in the l-th slot, while D^l_i = 0 indicates the opposite. On the other hand, since the ARNN is trained directly with the values of K^l_i as outputs, we use the metric defined in (8) to evaluate the output decision of the ARNN, namely:

Z^l_{Gi} = 1 if Q_i(Λ^l, λ^l) ≥ γ q_i(λ^l, Λ^l), and Z^l_{Gi} = 0 otherwise,   (13)

where (Λ^l, λ^l) are the corresponding n-vectors obtained from the ARNN input data at the l-th slot.
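As an illustration of how these ground truth quantities can be extracted from the labelled packets, the following sketch (ours; the packet record layout and the handling of nodes with no traffic are assumptions) computes Λ^l_{Gi}, K^l_i and D^l_i for a single node and slot.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    t: float   # transmission instant (seconds)
    s: int     # source node
    d: int     # destination node
    a: int     # ground-truth label: 1 = attack, 0 = benign
    size: int  # packet length in bytes, |pk(t, s, d, a)|

def ground_truth_io(packets, node, slot, tau=10.0, theta=0.3):
    """Return (Lambda_Gi, K_i, D_i) for `node` at the end of slot `slot`.

    Lambda_Gi: fraction of attack packets among packets received by the node
               up to the end of the slot (the ARNN input; lambda_Gi = 1 - Lambda_Gi).
    K_i:       fraction of attack packets among packets sent by the node (desired output).
    D_i:       ground-truth decision, 1 if K_i >= theta.
    Fractions default to 0 when the node sent or received nothing (an assumption).
    """
    horizon = slot * tau                      # end of slot l: t < l * tau
    recv = [p for p in packets if p.d == node and p.t < horizon]
    sent = [p for p in packets if p.s == node and p.t < horizon]
    lam_g = sum(p.a for p in recv) / len(recv) if recv else 0.0
    k_i = sum(p.a for p in sent) / len(sent) if sent else 0.0
    return lam_g, k_i, 1 if k_i >= theta else 0
```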

D. LEARNING THE ARNN WEIGHTS FROM THE LD
To construct a balanced training dataset LD, the sequence of slots in the MIRAI dataset [42] was scanned from the first slot l = 1 up to and including the first slot in which some node sends attack packets, which turns out to be slot l = 445; the LD then includes slot 433 up to and including slot 457 (a total of 25 slots). On the other hand, the test dataset TD contains all the subsequent slots, starting with slot 458. The ARNN is trained on the slots l of the LD using the Gradient Descent Algorithm detailed in the Appendix, with the learning rate η = 0.1. It adjusts the ARNN weights so as to minimize the following error function for each successive slot l within the LD:

E^l = (1/2) Σ_{i=1}^{n} [ (Q^l_i(Λ^l_G, λ^l_G) − K^l_i)^2 + (q^l_i(λ^l_G, Λ^l_G) − (1 − K^l_i))^2 ],   (14)

where Q^l_i(·) and q^l_i(·) are obtained from (5), and (Λ^l_G, λ^l_G) are the n-vectors of ground truth inputs for slot l.

E. TESTING THE ARNN'S PREDICTION CAPABILITY
We first test the ARNN's ability to act as a predictor of whether a node is compromised, based on training with the LD composed of the sequence of 25 slots around the first slot that contained some compromised nodes, namely slot 445.
The test data stream is the part of the dataset that is subsequent to the LD, namely slot l = 458 up to the last slot l = 713. Testing therefore uses the input values Λ^l_i, λ^l_i for 458 ≤ l ≤ 713 in the trained ARNN, and the ARNN then outputs the corresponding Z^l_i values, with θ = 0.3 as the threshold for obtaining the ground truth decision variables D^l_i from expression (12). The threshold used to produce the testing output Z^l_i is typically of the form γ = 1 − ϵ, where ϵ is often zero and always well under 0.1.
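For reference, the per-node statistics reported below can be computed as in the following sketch (ours), given a node's sequence of ARNN decisions Z^l_i and ground truth labels D^l_i over the test slots.

```python
def node_scores(Z, D):
    """Accuracy, TPR and TNR for one node.

    Z, D: equal-length sequences of 0/1 values over the test slots,
    Z[l] the ARNN decision and D[l] the ground truth for that slot.
    Rates are None when the node has no positive (or no negative) slots.
    """
    tp = sum(1 for z, d in zip(Z, D) if z == 1 and d == 1)
    tn = sum(1 for z, d in zip(Z, D) if z == 0 and d == 0)
    fp = sum(1 for z, d in zip(Z, D) if z == 1 and d == 0)
    fn = sum(1 for z, d in zip(Z, D) if z == 0 and d == 1)
    acc = (tp + tn) / len(Z)
    tpr = tp / (tp + fn) if (tp + fn) else None
    tnr = tn / (tn + fp) if (tn + fp) else None
    return acc, tpr, tnr
```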
The Accuracy, True Positive Rate (TPR) and True Negative Rate (TNR) of the ARNN are detailed in Figures 2, 3, and 4.
On the other hand, Figure 5 shows a box-plot of the statistics over all the node addresses, and indicates that the ARNN offers high performance with a median accuracy of 100%. Although the TPR is almost zero for 9 of the addresses and the TNR is almost zero for 22 addresses, the Accuracy still exceeds 95% for 80% of all addresses. Figure 2 displays the average decision accuracy for each address i ∈ {1, . . . , 107}, showing that the accuracy of the ARNN is above 95% for 50% of the IP addresses, while it is between 62% and 80% for only 20% of them, without ever falling under 62%. Figure 4 exhibits the average TNR for the addresses: for 59% of the addresses the TNR lies above 95%, while for 15% of them it is in the 62% to 80% range. Finally, Figure 3 shows the average TPR for the 39 addresses which were at least once compromised according to the ground truth indicator. The TPR exceeds 95% for most (64%) of these addresses, and exceeds 90% for over 74% of them.

III. TESTING THE ARNN WITH THE AVERAGE TRAFFIC METRIC
Six representative traffic metrics were introduced in recent work [40] as being indicative of network attacks and were shown to be effective for MIRAI Botnet detection using available real datasets. Rather than using the full metrics, in this section their average normalized value will be used to test the ARNN attack detector.
To define these metrics, let |p| be the size in bytes of some packet p, including its header and all the data it contains. Let P^{S,i}_l be the set of all packets sent by all network nodes to node i in slot l, let P^{s,i}_l denote the set of packets sent from node s to node i during slot l, and let L^l_s be the maximum length in bytes of the packets sent by node s to i up to the end of slot l. The six metrics from [40], all normalized to a value between 0 and 1, are as follows:
• x^l_{i,1}: the average size of the packets received by device i in slot l.
• x^l_{i,2}: the maximum size of any packet received at node i in slot l. Denial of Service attacks are not always carried out with large packets; for instance, SYN attack packets can be quite short, since their effect is to overload the receiving node with requests to open a connection, rather than with the amount of traffic that is being sent. However, the traffic sent by other types of Denial of Service attacks is often meant to cause link and node congestion, so that the amount of attack traffic can be large, and the length of the packets sent by attackers can be large too. Thus, the amount of traffic and the packet size are often relevant metrics for detecting attacks.
• x^l_{i,3}: the average number of packets received at i in slot l from all nodes. Note that the normalizing denominator in this metric can be computed iteratively in an efficient manner, so that x^l_{i,3} can be obtained directly from x^{l−1}_{i,3}.
• x^l_{i,4}: the (normalized) maximum number of packets received by node i from any single source s in slot l, i.e. the maximum of |P^{s,i}_l| over all sources s.
• x^l_{i,5} and x^l_{i,6}: the last two metrics, both normalized to lie between 0 and 1, describe the total number of bytes sent to all destinations d by node i, and the total number of packets sent by i to all destinations d, where the normalization uses L^m_i, the maximum length of any packet that i sends, and B_i, the maximum number of bytes sent out by i in any slot.
Since each neuron at any node of the ARNN has a single input, i.e. Λ_i or λ_i, for testing purposes we only use the average value of the normalized metrics as the input to each neuron of the ARNN for slot l:

Λ^l_{i,mean} = (1/6) Σ_{k=1}^{6} x^l_{i,k},   λ^l_{i,mean} = 1 − Λ^l_{i,mean}.
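Since only the descriptions of the six metrics are summarized above, the following sketch (ours) illustrates one plausible way of computing them and their average Λ^l_{i,mean} for a single node and slot; the normalizing constants L_max and N_max, and the exact use of L^m_i and B_i for the last two metrics, are assumptions rather than the precise normalizations of [40].

```python
def slot_metrics(recv, sent, L_max, N_max, L_m_i, B_i):
    """Six normalized per-slot traffic metrics for node i and their average.

    recv: list of (source, size) pairs for packets received by i in the slot,
    sent: list of (dest, size) pairs for packets sent by i in the slot,
    L_max, N_max: normalizing maxima for packet size and packet count (assumed),
    L_m_i, B_i: maximum packet length sent by i, and maximum bytes sent by i in any slot.
    """
    sizes = [size for _, size in recv]
    x1 = (sum(sizes) / len(sizes)) / L_max if sizes else 0.0   # average received packet size
    x2 = max(sizes) / L_max if sizes else 0.0                  # maximum received packet size
    x3 = min(len(recv) / N_max, 1.0)                           # packets received from all nodes
    per_src = {}
    for src, _ in recv:
        per_src[src] = per_src.get(src, 0) + 1
    x4 = min(max(per_src.values(), default=0) / N_max, 1.0)    # packets from the busiest source
    x5 = min(sum(size for _, size in sent) / B_i, 1.0) if B_i else 0.0       # bytes sent by i
    x6 = min(len(sent) * L_m_i / B_i, 1.0) if B_i else 0.0     # packets sent by i (one assumed scaling)
    xs = [x1, x2, x3, x4, x5, x6]
    lam_mean = sum(xs) / 6.0                                   # Lambda^l_{i,mean}
    return xs, lam_mean
```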

A. THE ARNN TRAINED WITH THE GT AND TESTED WITH AVERAGE METRICS
In the first test using the average-metric-based input data, we use the ARNN trained with the ground truth GT from the real attack LD sequence of 25 slots, from l = 433 up to 457, as before. Then for each i we use the average metric value to compute Λ^l_{i,mean} for l = 458 to l = 713, and we input the corresponding values Λ^l_{i,mean}, λ^l_{i,mean} = 1 − Λ^l_{i,mean} into the ARNN for testing. The ARNN output is the Z^l_i value for each successive l and for each node i, as given in (13), with a threshold that can vary in the range 0.96 ≤ γ ≤ 1, or 0 ≤ ϵ ≤ 0.04. The threshold θ = 0.3 is used for the output decision variables D^l_i obtained from the known GT. The results are summarized in Figure 6, where we see that the median performance with respect to each of Accuracy, TPR, and TNR is 100%. The ARNN achieves Accuracy above 99% for 97 of the 107 IP addresses, while there are 10 nodes with outlier performance: three with Accuracy below 30%, two between 30% and 60%, and five with Accuracy between 60% and 80%. In addition, as the lower whisker shows, the TPR is above 86% for 75% of all nodes, while the lowest TNR is about 98.5%.

B. ARNN TRAINED AND TESTED WITH THE AVERAGE METRICS
We now consider training as well as testing the ARNN using the average metric inputs Λ^l_{i,mean} and λ^l_{i,mean} = 1 − Λ^l_{i,mean}. To this effect, we still use the ground truth data represented by K^l_i in the algorithm detailed in the Appendix. The error function that needs to be minimized during training becomes:

E^l = (1/2) Σ_{i=1}^{n} [ (Q^l_i(Λ^l_{mean}, λ^l_{mean}) − K^l_i)^2 + (q^l_i(λ^l_{mean}, Λ^l_{mean}) − (1 − K^l_i))^2 ],

where Q^l_i(·) and q^l_i(·) are given by equation (5), Λ^l_{mean} and λ^l_{mean} are the n-vectors of average metric inputs for slot l, and the gradient descent parameter is η = 0.1 as previously.
Compared with the previous case, where the ARNN was trained with the GT, we see some very minor variations in Accuracy, True Positive Rate and True Negative Rate. For instance, in our experiments we only observed 3 network nodes out of 107 where the Accuracy differed between the previous sub-section and this one. In particular, for one of these nodes, i = 47, we have ACC = 93.1% using the Average Metric for training, while using the GT it is ACC = 93.94%. In fact, we also observe that using the Average Metric for training results in general in somewhat fewer False Alarms, i.e. a higher True Negative Rate. The corresponding results are summarized in the Box-Plot diagram for Accuracy, True Positives and True Negatives given in Figure 7.

IV. INCREMENTAL TRAINING OF THE ARNN
In recent work [40], the CDIS-DRNN, a compromised device identification method, was presented. This attack detection method is trained sequentially, on the assumption that off-line ground truth data is not available. The CDIS-DRNN is composed of a deep feedforward Random Neural Network (DRNN) architecture which does not exploit knowledge of the interconnections between network nodes.
Thus, as with the CDIS-DRNN, in this section we assume that offline training data is not available in advance of the exploitation of the ARNN for attack detection. In such a case, the ARNN is trained incrementally in parallel with its online operation. To this end, we update the weights of the ARNN after every 6 successive slots, i.e. at the end of each slot l such that mod(l, 6) = 0, so that each training window corresponds to 1 minute, whereas the ARNN provides a decision for each device i at the end of each individual slot l, i.e. every 10 seconds.
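The schedule just described can be sketched as follows (our own illustration); train_on_slots and decide_slot are hypothetical placeholders for the incremental ARNN training step of the Appendix and for the per-slot ARNN decision.

```python
def run_incremental(num_slots, train_on_slots, decide_slot, window=6):
    """Online operation: decide every slot, retrain every `window` slots.

    decide_slot(l)        -> per-node decisions for slot l (10 s of traffic),
    train_on_slots(slots) -> one incremental ARNN weight update on those slots.
    """
    decisions = {}
    for l in range(1, num_slots + 1):
        decisions[l] = decide_slot(l)            # a decision every 10 seconds
        if l % window == 0:                      # mod(l, 6) == 0: one 1-minute training window
            train_on_slots(list(range(l - window + 1, l + 1)))
    return decisions
```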
Thus, using the data from each successive set of 6 slots, the ARNN is trained with the algorithm presented in the Appendix, using the training data TD constructed as follows: at the end of slot l with mod(l, 6) = 0, the TD consists of the slots {l − 5, . . . , l}. We now present the performance of the incrementally trained ARNN decision system for compromised device identification. During the performance evaluation, we set θ = 0.3 and 0.96 ≤ γ ≤ 1. The Accuracy, True Negative Rate (TNR), and True Positive Rate (TPR) of the ARNN with incremental training are presented in Figure 9 as a Box-plot. The results in this figure show that the ARNN achieves a median accuracy of 100%, while the Accuracy is greater than 97% for 75% of all network nodes. These results also show that the TNR is above 99% for approximately 72% of the nodes, while 58% of the nodes are 100% accurately identified as being compromised, which provides a median TPR value of 100%.
In Figure 8, we compare the performance of the ARNN with similar results obtained recently with the CDIS, which is a DRNN-based system [40], with respect to Accuracy, TNR and TPR. The results show that the ARNN significantly outperforms the CDIS-DRNN by providing approximately 50% higher Accuracy for all network nodes. This appears to be due to the fact that the ARNN, through its internal neuron connections, is able to simultaneously process information regarding the nodes themselves, and also regarding their connections to other nodes.
We further compare the performance of the ARNN against four well-known ML models: the 1-Dimensional Convolutional Neural Network (1D CNN), Long Short-Term Memory (LSTM), the Multi-Layer Perceptron (MLP), and the Decision Tree (DT), which are often used for intrusion detection systems in recent research [10], [11], [14], [19], [45]. The MLP is comprised of three fully connected layers, each with n neurons (a minimal sketch of this baseline is given below). The 1D CNN and LSTM respectively consist of convolutional and LSTM layers connected to two fully connected layers, where each layer is comprised of n neurons. In addition, all activation functions in the 1D CNN, MLP, and LSTM are sigmoids, and we implemented these models using the Keras API in Python. For the implementation of the DT, we used the Scikit-learn library in Python, setting the maximum depth and the maximum number of features equal to n. Figure 10 displays the comparison of the ARNN with the 1D CNN, LSTM, MLP, and DT with respect to Accuracy, TPR, TNR, F1 Score, Recall, and Precision. These results show the superior overall performance of the ARNN against these well-known ML models. It can be seen that the performance of the ARNN is slightly lower than that of the DT in terms of average Accuracy, TNR and Precision, but significantly higher than that of the other models in terms of TPR, F1 Score and Recall. Thus, although the DT is slightly better than the ARNN at detecting negative samples (i.e. the DT gives fewer false positives compared to the ARNN), the ARNN significantly outperforms all models (by at least 19%) in detecting compromised nodes.
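For reference, a minimal sketch of the MLP baseline described above is given here (three fully connected layers of n neurons with sigmoid activations); the input dimension, the output layer, the loss and the optimizer are our assumptions, since they are not specified above.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_mlp_baseline(n, input_dim):
    """MLP baseline: three fully connected layers of n neurons, sigmoid activations.

    input_dim, the output layer and the training configuration are assumptions;
    the text only states the hidden architecture and the activation functions.
    """
    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),
        layers.Dense(n, activation="sigmoid"),
        layers.Dense(n, activation="sigmoid"),
        layers.Dense(n, activation="sigmoid"),
        layers.Dense(1, activation="sigmoid"),   # assumed binary "compromised" output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```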
Finally, Table 1 compares the same set of ML models with respect to execution and training times, showing that the ARNN is the second fastest in online operation among the five methods that were tested, with an execution time of ≈ 3.5 ms, allowing it to rapidly identify nodes compromised by malicious Bots in real time, but it is the slowest one regarding learning. For an n-node IoT or IP network's attack prediction, the Deep Learning Algorithm for the ARNN, based on a fully connected ''recurrent'' RNN with 2n neurons [32], requires at each gradient descent step the inversion of a 2n × 2n matrix and the corresponding matrix products, so that its learning computation time per step is of the order of:

T_ARNN ≈ A(2n)^3 + B(2n)^3 + C(2n)^2 = 8(A + B)n^3 + 4Cn^2,

for positive constants A, B, C. On the other hand, the CNN or the MLP are feedforward models, typically with three feedforward layers, whose learning algorithm is of time complexity O(n^2). Thus they will require the update of at most 2n^2 weights, yielding a learning computation time:

T_MLP ≈ 2bn^2,

for a positive constant b which is comparable to C. This simplistic calculation suggests an approximate 8(A+B)n^3 / (2bn^2) = 4n(A+B)/b fold increase in learning time for the ARNN with respect to an MLP or CNN model. For the n = 107 network that is evaluated in this paper, this corresponds to a 428(A+B)/b fold increase, and for A + B ≈ 20b this analysis is compatible with the results shown in Table 1.

During our experimental evaluation:
• The ARNN is first trained offline with ground truth data and tested with disjoint ground truth data exhibiting a high level of precision.
• Then, the offline trained ARNN is tested with a simplified average input metric directly extracted from measurements, and again a high level of precision is observed.
• The average metric is then used as input for offline training, while the ground truth is used in the error function, and testing is carried out with the average metric using disjoint data.
• Finally, online incremental training using the testing data output for learning is also tried, without use of the ground truth.
All these experiments use the Kitsune dataset, and confirm the high level of accuracy of the ARNN's predictions. Experiments are also conducted to compare the ARNN, for the identification of compromised IoT nodes using real MIRAI Botnet attack data from the Kitsune dataset, against the recent state-of-the-art CDIS-DRNN technique [40] and the well-known 1D CNN, LSTM, MLP and DT models, indicating that the ARNN:
• Provides a significant improvement compared to the CDIS-DRNN, achieving 92% median accuracy and a minimum of 60% accuracy per node,
• Outperforms the ''best-of-class'' CDIS-DRNN by a wide margin with respect to both TNR and TPR, and
• Identifies compromised nodes in just under 3.5 ms, at least 19% more accurately than the 1D CNN, LSTM, MLP and DT models.
Accordingly, as its main advantage, the ARNN successfully captures the interrelationships and communication patterns between devices, thus providing identification performance that is considerably superior to state-of-the-art techniques. On the other hand, although the ARNN provides a detection in under 3.5 ms, it requires a significantly longer training time compared to well-known ML models. That is, while the current design and implementation of the ARNN learning algorithm may not be suitable for time-limited online learning applications, it is highly successful and promising in identifying compromised nodes with offline learning and online detection.

VI. CONCLUSION
This paper presents and evaluates the novel ARNN cyberattack decision system that utilizes two interconnected competing RNN neurons for each IoT node, where each network node is related to a neuron pair connected with all the neurons associated with other nodes in the network. The unique structure of the ARNN evaluates the security of each node in a given network by including locally relevant data from incoming traffic as well as the relationship between all nodes, as part of the decision mechanism. In this way, ARNN learns, as one of its most important features, both the normal communication patterns between nodes and the propagation of a cyberattack over the IoT network.
The ARNN can be particularly useful for private or industrial networks containing a few hundred nodes. It can be used to detect attacks such as Botnets, where distinct node behaviours are correlated due to the propagation of the attack. Its use with average traffic metrics measured directly on incoming traffic at each network node, removes the need for separate computationally costly attack detectors that are placed at each node in the network. Thus, in sharp contrast with traditional attack detectors, the ARNN collectively evaluates a large number of interconnected nodes in a single neural network architecture at a low additional computation cost per node.
We have presented the ARNN architecture, with weight restrictions to simplify learning, and an ''ignorant initialization'' that helps to avoid initial biases of the ARNN. The error function used for ARNN learning is introduced, and the specific gradient learning algorithm is detailed in the Appendix. Most of the paper is then devoted to evaluating the performance of the ARNN using real Botnet attack data and real benign traffic, and comparing it against state-of-the-art methods based on the commonly used metrics of Accuracy, True Positive Rate, True Negative Rate, F1 Score, Recall, and Precision. The results revealed that the ARNN achieved significantly superior performance, with highly accurate detection of compromised nodes and low false alarms, but with a high training time.
In future work, the computational complexity of ARNN learning, which was discussed in this paper from a theoretical perspective, will be analyzed in detail using practical experimental data, and incremental schemes will be considered for on-line learning to reduce the amount of energy that such algorithms consume [46]. Since many IoT devices have limited battery power, this is important for sustainability, and it can also enhance IoT security.
APPENDIX
The gradient descent algorithm for the ARNN weights seeks local minima of E in (14), and computes the partial derivatives

Q^{U,V}_i = ∂Q_i/∂W^+_{U,V},   q^{U,V}_i = ∂q_i/∂W^+_{U,V},   Q^{u,v}_i = ∂Q_i/∂w^+_{u,v},   q^{u,v}_i = ∂q_i/∂w^+_{u,v},

that are needed in the computation of the gradient of E with respect to the excitatory weights:

∂E/∂W^+_{U,V} = Σ_{i=1}^{n} [ (Q_i − K_i) Q^{U,V}_i + (q_i − (1 − K_i)) q^{U,V}_i ],   (24)
∂E/∂w^+_{u,v} = Σ_{i=1}^{n} [ (Q_i − K_i) Q^{u,v}_i + (q_i − (1 − K_i)) q^{u,v}_i ].   (25)

Equations (24), (25) are used to update the ARNN weights for steps k = 1, 2, . . . of the Gradient Descent Rule with η > 0:

W^+_{U,V}(k + 1) = W^+_{U,V}(k) − η ∂E/∂W^+_{U,V},   w^+_{u,v}(k + 1) = w^+_{u,v}(k) − η ∂E/∂w^+_{u,v}.   (26)

As indicated earlier, the value η = 0.1 is used. From the inputs Λ = (Λ_1, . . . , Λ_n) and λ = (λ_1, . . . , λ_n), we derive the derivatives needed in (26) from (5). Differentiating (5) with respect to W^+_{U,V} gives, for each i:

Q^{U,V}_i = [ Σ_{j≠i} Q^{U,V}_j W^+_{ji} − Q_i Σ_{j≠i} q^{U,V}_j w^-_{ji} + 1[i = V] Q_U ] / D_i,   (27)
q^{U,V}_i = [ Σ_{j≠i} q^{U,V}_j w^+_{ji} − q_i Σ_{j≠i} Q^{U,V}_j W^-_{ji} + 1[i = V] q_i Q_U ] / d_i,   (28)

where D_i and d_i are the denominators of Q_i and q_i, respectively, in the expression (5):

D_i = (n − 1)W + λ_i + Σ_{j≠i} q_j w^-_{ji},   d_i = (n − 1)W + Λ_i + Σ_{j≠i} Q_j W^-_{ji}.

We write the state vectors Q = (Q_1, . . . , Q_n) and q = (q_1, . . . , q_n), the corresponding derivative (row) vectors Q^{U,V} = (Q^{U,V}_1, . . . , Q^{U,V}_n) and q^{U,V} = (q^{U,V}_1, . . . , q^{U,V}_n), and define the n × n matrices

A = [ W^+_{ji} / D_i ],   B = [ Q_i w^-_{ji} / D_i ],   a = [ w^+_{ji} / d_i ],   b = [ q_i W^-_{ji} / d_i ],

with rows indexed by j and columns by i, together with the n-vector δ_V which has zero elements everywhere, except for position V where the value is 1. Then (27) and (28) expressed as vectors yield:

Q^{U,V} = Q^{U,V} A − q^{U,V} B + (Q_U / D_V) δ_V,   q^{U,V} = q^{U,V} a − Q^{U,V} b + (q_V Q_U / d_V) δ_V,

resulting in:

( Q^{U,V}, q^{U,V} ) [ I_{2n} − M ] = c_{U,V},   with   M = [ A  −b ; −B  a ],

where I_{2n} is the 2n × 2n identity matrix and c_{U,V} is the 2n-vector whose only non-zero elements are Q_U/D_V at position V and q_V Q_U/d_V at position n + V. Since M does not depend on (U, V), a single inversion of the 2n × 2n matrix [I_{2n} − M] provides all of the derivative vectors as (Q^{U,V}, q^{U,V}) = c_{U,V} [I_{2n} − M]^{−1}. Finally, by the symmetry of the roles of the two sub-networks in (5), i.e. of Q^{U,V} and q^{u,v}, and of Q^{u,v} and q^{U,V}, the derivatives with respect to the weights w^+_{u,v} are obtained from the same matrix inverse, with a right-hand side c_{u,v} whose only non-zero elements are Q_v q_u/D_v at position v and q_u/d_v at position n + v, which provides us with all the derivatives of Q and q.
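A compact numerical sketch of this procedure is given below; it is our own illustration, consistent with the equations above rather than the exact implementation used in the experiments, and the clipping of the updated weights to the interval [0, W] is an additional assumption that keeps all spike rates non-negative.

```python
import numpy as np

def arnn_gradient_step(Wp, wp, Lam, lam, K, W=1.0, eta=0.1, iters=200):
    """One gradient-descent step on the error E of (14), using a single
    inversion of a 2n x 2n matrix to obtain all the weight derivatives.

    Wp, wp: excitatory weights W^+_{ij}, w^+_{ij} (zero diagonals);
    the inhibitory weights are W - Wp and W - wp off the diagonal.
    """
    n = len(Lam)
    Wm, wm = W - Wp, W - wp
    np.fill_diagonal(Wm, 0.0); np.fill_diagonal(wm, 0.0)

    # Fixed point of equation (5).
    Q = np.full(n, 0.5); q = np.full(n, 0.5)
    for _ in range(iters):
        D = (n - 1) * W + lam + q @ wm
        d = (n - 1) * W + Lam + Q @ Wm
        Q = np.clip((Lam + Q @ Wp) / D, 0.0, 1.0)
        q = np.clip((lam + q @ wp) / d, 0.0, 1.0)
    D = (n - 1) * W + lam + q @ wm
    d = (n - 1) * W + Lam + Q @ Wm

    # Coefficient matrix shared by all derivatives: (x, y) = (x, y) @ M + c,
    # solved as (x, y) = c @ inv(I - M), with M = [[A, -b], [-B, a]].
    A = Wp / D            # A[j, i] = W^+_{ji} / D_i
    B = wm * (Q / D)      # B[j, i] = w^-_{ji} Q_i / D_i
    a = wp / d            # a[j, i] = w^+_{ji} / d_i
    b = Wm * (q / d)      # b[j, i] = W^-_{ji} q_i / d_i
    inv = np.linalg.inv(np.eye(2 * n) - np.block([[A, -b], [-B, a]]))

    eQ, eq = Q - K, q - (1.0 - K)                     # error terms of (14)
    gWp, gwp = np.zeros((n, n)), np.zeros((n, n))
    for U in range(n):
        for V in range(n):
            if U == V:
                continue
            # derivatives w.r.t. W^+_{U,V}: sparse right-hand side at positions V and n+V
            xy = (Q[U] / D[V]) * inv[V] + (q[V] * Q[U] / d[V]) * inv[n + V]
            gWp[U, V] = eQ @ xy[:n] + eq @ xy[n:]
            # derivatives w.r.t. w^+_{U,V}, by the symmetry of the two sub-networks
            xy = (Q[V] * q[U] / D[V]) * inv[V] + (q[U] / d[V]) * inv[n + V]
            gwp[U, V] = eQ @ xy[:n] + eq @ xy[n:]

    # Gradient update (26); clipping to [0, W] is an added assumption.
    Wp_new = np.clip(Wp - eta * gWp, 0.0, W); np.fill_diagonal(Wp_new, 0.0)
    wp_new = np.clip(wp - eta * gwp, 0.0, W); np.fill_diagonal(wp_new, 0.0)
    return Wp_new, wp_new, Q, q
```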