Design and Development of RNN-based Anomaly Detection Model for IoT Networks

Cybersecurity has become critical as the rapid growth of the Internet of Things (IoT) has given rise to a wide variety of attacks on computer systems and networks. As the number of IoT devices and services grows, cybersecurity will become an increasingly difficult issue to manage. Malicious traffic identification using deep learning techniques has emerged as a key component of network-based intrusion detection systems (IDS), and deep learning methods have become a research focus in network intrusion detection. Recurrent neural networks are useful in a wide range of applications. This paper proposes a novel deep learning model for detecting anomalies in IoT networks using recurrent neural networks. The proposed model is implemented using LSTM, BiLSTM, and GRU-based approaches for anomaly detection in IoT networks. Because a convolutional neural network can analyze input features without losing important information, it is particularly well suited for feature learning; we therefore also propose a hybrid deep learning model based on convolutional and recurrent neural networks. Finally, employing LSTM, BiLSTM, and GRU-based techniques, we propose a lightweight deep learning model for binary classification. The proposed deep learning models are validated using the NSLKDD, BoT-IoT, IoT-NI, MQTT, MQTTset, IoT-23, and IoT-DS2 datasets. Our proposed binary and multiclass classification models achieved high accuracy, precision, recall, and F1 score compared to current deep learning implementations.


I. INTRODUCTION
The fast growth of the Internet has facilitated the creation of the IoT. A common factor contributing to this development is that IoT devices are readily available, affordable, and convenient in our everyday lives. Due to the fast advancement of wireless communication technologies, developers have built extremely low-cost IoT nodes that support data collection, data analysis, and wireless transmission [1]. The IoT is a network of linked physical devices and sensors that enables information sharing via the Internet. IoT networks have proven particularly efficient in collecting, analyzing, reporting, and predicting information for incorporation into future plans. Many technologies, including protocols, software, and component elements, make up IoT networks. An IoT architecture is a collection of multiple components such as sensors, actuators, protocols, cloud services, and the layers of an IoT communication network. This architecture is usually organized into several layers that let administrators identify, analyze, and maintain the consistency of the system.
There is no single agreed-upon design for the IoT; many designs have been suggested by various developers. The primary types of architecture are three-, four-, and five-layer structures. A three-layer structure, comprising perception, network, and application layers, is the most basic configuration for IoT implementation. The perception layer is equipped with sensors, actuators, and computational hardware to detect and collect information from the environment. The physical layer handles tasks such as setting a frequency, manipulating the signal, encrypting the signal, and transmitting and receiving data. This layer has several issues, including power consumption, security, and compatibility. The network layer connects IoT devices to other smart objects, network equipment, and services. This layer receives data from the perception layer and passes the data to the application layer for analytics and smart services. Successful attacks on Internet of Things systems would result in financial losses and could seriously harm human life. A crucial part of keeping a network safe is network intrusion detection, which monitors and identifies potential threats, events, and breaches. Security systems such as firewalls and intrusion detection systems are vulnerable to recent cyber threats since their techniques are based on static attack signatures and cannot identify new attack variations [7].
Because the IoT's significant economic effect and extensive influence on our lives make it a desirable target for cybercriminals, cybersecurity has risen to the top of the priority list for IoT infrastructure. Even though cybersecurity has been studied for years, the massive scale of IoT networks and the introduction of new threats have rendered conventional approaches ineffective. The study by Tsimenidis et al. [8] provides a comprehensive assessment of deep learning models developed for Internet of Things intrusion detection, offering a detailed, organized examination of how deep learning has been used for IoT cybersecurity and its unique contributions to creating efficient IoT intrusion detection systems, with solutions classified by model.
The continued growth of computer network and Internet threats is becoming increasingly concerning for most service providers. It has prompted the development and deployment of intrusion detection systems (IDSs) to help prevent and mitigate the threats posed by network intruders. Intrusion detection systems have played, and continue to play, an important role in detecting network cyberattacks and anomalies. Many IDSs have been suggested by researchers worldwide to tackle the danger of network intruders. However, most previously suggested intrusion detection systems suffer from high false-alarm rates [9].
The primary aim of this paper is to develop an RNN-based model for IoT networks. We used kernel, bias, and activity regularizers at the LSTM, BiLSTM, and GRU layers, together with layer normalization and activity regularization layers, to develop novel RNN- and CNN-RNN-based models for multiclass and binary classification. In contrast to batch normalization, layer normalization normalizes the activations of the previous layer for each sample in a batch individually rather than across the entire batch. Activity regularization adds a penalty to the cost function based on a layer's output activations. Overfitting of the RNN model was controlled using dropout layers, early stopping, and 5-fold cross-validation.
We first used class weights during the training phase to resolve class imbalances in the datasets. The weights assigned to classes were determined by the number of instances of each class, so a minority class with a small number of instances receives a high weight. New synthetic samples were created using the Borderline-SMOTE algorithm; its random state parameter controls the algorithm's randomization, and Borderline-SMOTE is used to ensure the training set is balanced. The feature set we extracted from pcap files consists of generalized features that can be used for any IoT network. We evaluate our proposed multiclass and binary classification models using the NSLKDD, BoT-IoT, IoT Network Intrusion, IoT-23, MQTT-IoT-IDS2020, MQTTset, and IoT-DS2 datasets. The contributions of this paper are:
- Design, development, and evaluation of an anomaly detection model for IoT networks using a recurrent neural network.
- A hybrid model of convolutional and recurrent neural networks for anomaly detection in IoT networks.
- A lightweight recurrent neural network model for IoT network anomaly detection.
- Performance improvements of binary and multiclass classification models.
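The inverse-frequency class weighting described above can be sketched as follows (a minimal illustration; the helper name and the common "balanced" heuristic w_c = n / (k · n_c) are our own, not necessarily the paper's exact implementation):

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency class weights: rare classes get larger weights.

    Uses the common 'balanced' heuristic
    w_c = n_samples / (n_classes * n_samples_in_class_c).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}

# Toy labels with a 9:1 majority/minority imbalance.
labels = ["normal"] * 90 + ["attack"] * 10
weights = balanced_class_weights(labels)
# The minority class receives a weight 9x larger than the majority class.
```

Such a dictionary can be passed directly to Keras `model.fit(..., class_weight=weights)` after mapping labels to integer class indices.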
The remainder of this paper is organized as follows: Section II provides a discussion of the related work. Section III explains the design and development of the recurrent and convolutional neural networks model for multiclass and binary classification anomaly detection model for IoT networks. Section IV discusses the data collecting process for evaluating the proposed models. Section V presents the evaluation of the results, followed by a discussion of the findings in Section VI. Finally, Section VII concludes the paper and offers recommendations for future research.

II. RELATED WORK
Millions of IoT devices are embedded in smart cities, enabling crucial applications such as smart homes, autonomous vehicles, and communications. The smart city is based on millions of heterogeneous sensors that do not support traditional security frameworks. Several manufacturers use inadequate protection mechanisms for their devices and fail to upgrade their firmware in response to recently discovered operational security breaches. To achieve comprehensive management of sensor operating systems while maintaining strong security, smart cities need a common architecture that combines soft computing and deep learning [10]. While many machine learning methods have been employed to identify anomaly-based intrusions, relatively few attempts have applied recurrent neural networks to this classification task. The IoT has evolved rapidly in recent years, and cyberattacks on IoT devices are becoming common. It has become more necessary to have an effective approach for identifying malicious attacks in the IoT domain to reduce security threats on IoT devices. Alkahtani et al. [11] developed a hybrid deep learning approach based on CNN and LSTM for detecting botnet assaults on nine commercial IoT devices. Their methodology proved effective in identifying botnet assaults from various IoT devices with an average accuracy of 90%.
Autonomous Vehicles (AVs) are prone to safety and security issues, risking human life. The Internet of Vehicles (IoV) is a network of manually operated vehicles connected to the Internet. If cyber attackers get access to these vehicles, they might be utilized for malicious purposes. Khan et al. [12] have developed a multistage intrusion detection system to identify Autonomous Vehicle and Internet of Vehicles intrusions while minimizing the number of false alarms. The proposed framework uses a BiLSTM architecture to detect intrusions from AVs' network gateways and communication networks. Additionally, the suggested system can detect zero-day attacks in IoV networks.
Smart home IoT devices are very susceptible to sophisticated botnet attacks. Popoola et al. [13] examine the performance of RNNs in correctly classifying network traffic samples belonging to minority classes in severely unbalanced network traffic data. To learn hierarchical representations of highly unbalanced network traffic data at different degrees of abstraction, many layers of RNN are stacked. The stacked RNN (SRNN) model was used instead of a plain RNN model to effectively capture the distinguishing properties of severely unbalanced network traffic samples. The SRNN model also showed excellent generalization abilities when recognizing network traffic samples from minority classes.
The number of computer-controlled automobiles worldwide is rising at an alarming rate. Although this improves the driving experience, it introduces a new security vulnerability into the automobile business. Desta et al. [14] suggested an LSTM-based intrusion detection system based on arbitration ID sequences. Using this strategy, their accuracy was just 60%. Because applying this finding to a real car would result in a large number of false negatives, they designed a second strategy that utilizes log loss as an anomaly indicator.
Malicious traffic identification using deep learning approaches has become a key component of network intrusion detection research and development. Most IDSs, even the most sophisticated ones, need packets to be gathered into flows and then analyzed, incurring processing delays. To identify malicious traffic at the packet level, Wang et al. [15] offer a deep learning strategy employing hierarchical networks, which can learn the properties of communication from basic data packets. They also explored how data balance affects classification performance and time efficiency for the LSTM and GRU models.
Hao et al. [16] utilize an encoder to automatically process and analyze network packets and obtain properties that appropriately reflect the packets. The variant gated recurrent units dynamically learn packet content and header attributes to significantly increase the IDS detection rate. The experimental findings on ISCX2012 indicate that intrusion detection using the proposed variant gated recurrent units provides a greater level of accuracy and a higher detection rate. Using binary weights and activation functions, their proposed model provides a better representation of the data than the original raw data and helps to minimize memory usage and access time. An overview of recent developments in deep learning for intrusion detection is presented in Table 1, where DR denotes detection rate, Acc accuracy, F1 the F1 score, and Pr precision.
Recent years have seen a tremendous influence on industrial production because of the fast development and widespread use of new technologies, resulting in smart manufacturing (SM). However, industrial systems based on the IoT are currently among the most targeted sectors for various cyberattacks. An anomaly detection framework is proposed by Huong et al. [65] to identify cyberattacks on industrial control systems; it outperforms existing time series data detection solutions in terms of detection performance. In the Industrial IoT (IIoT), a vast quantity of data processing takes place in the cloud and at the edge to perform various types of analytics. IIoT routing attacks can be detected using Nayak et al.'s [66] deep learning-based routing attack detection technique. The proposed solution uses parallel learning and detection to facilitate deep learning on IIoT devices with limited processing power. The parallel model output is tested in an IIoT network to compare the performance of distributed and centralized threat detection in an RPL network. Training time is significantly reduced when a parallel GAN model is used.
Although various relevant studies have utilized deep learning for NIDS, most of these techniques fail to account for the effect of overfitting when deep learning algorithms are implemented. Therefore, the anomaly detection system's resilience may be compromised, making it less effective in detecting zero-day cyberattacks. Convolutional neural networks (CNNs) and a novel regularizer technique are proposed by Elsayed et al. [59] to categorize flow traffic into normal and attack categories. Additionally, they propose a lightweight NIDS based on training CNN-based models with fewer features without significantly sacrificing model performance. Advanced information and communication technologies have facilitated the dissemination of a large quantity of information, which continues to grow day by day through the Internet and the creation of new added value via internet-based activities. Increasing numbers of diverse connecting points with high computing capability have expanded cyber security concerns. Biswas et al. [57] have suggested a new deep learning strategy to distinguish malicious botnet traffic from normal traffic.
The GRU is the most effective model for botnet detection: although computationally costly, it can handle large amounts of data and identify sequences efficiently. Laghrissi et al. [58] developed deep learning techniques for identifying attacks based on the LSTM algorithm. They employed PCA and mutual information to reduce the data's dimensionality and choose the best features. These techniques are also evaluated in terms of performance and processing time. The extensive availability of Internet services around the globe has presented a significant challenge to service providers in terms of protecting their systems, particularly against new breaches and threats.
The battery life of IoT devices is a huge issue since the devices use a lot of energy once they connect with each other. IoT devices would also include important network data, posing severe privacy and security concerns. Botnet attacks are significant threats to IoT-based smart devices. Using statistical learning-based botnet detection, Asharaf et al. [67] secure IoT-based smart networks against botnet cyberattacks. Louk et al. [68] address a gap in the literature by showing the importance of ensemble-based models for detecting potential attacks in a cyber-physical power system. They balanced the dataset by employing oversampling and undersampling techniques. Oversampling and undersampling were beneficial in a boosting-based ensemble but were ineffective in a bagging-based ensemble. Ensemble learners have outperformed single learners in a wide variety of applications, including the cybersecurity area. However, the majority of previously published works continue to produce unsatisfactory outcomes as a result of insufficient ensemble design. Nkenyereye et al. [69] demonstrate the efficacy of stacking ensemble-based models for anomaly detection, where a deep neural network is utilized as the basic learner model. The suggested model's efficacy and that of the underlying DNN model are experimentally compared using a variety of performance criteria.
A two-level hybrid anomalous activity detection model has been proposed to detect intrusions in IoT networks, which detects abnormal activity at level 1 and analyzes the anomalous activity identified at level 2 [70]. The level-2 model selects relevant features using Recursive Feature Elimination, oversamples using the Synthetic Minority Over-Sampling Technique (SMOTE), and cleans the datasets using Edited Nearest Neighbors (ENN). A three-layer system was proposed to detect intrusions in a smart grid system [71]. The proposed structure includes an IDS in each Home Area Network (HAN) and Neighborhood Area Network (NAN) and many IDS sensors in the WAN. In modeling anomaly-based intrusion detection systems, feature selection is important. A filter-based feature selection methodology was proposed for anomaly-based intrusion detection systems that leverages information gain by considering each feature's consistency, dependency, content, and distance [72]. Additionally, using an industrial control system dataset, the suggested feature selection model was evaluated for anomaly detection in SCADA networks [73].
It is challenging to extract valuable information from network traffic to identify possible anomalies. We investigate many types of network flow features to resolve this challenge [74] [75]. A feed-forward neural network-based method for identifying anomalous activity in IoT networks based on flow and control flags features has been presented [76]. The model was assessed for binary and multiclass classification using a variety of IoT intrusion datasets. Using conditional GANs to create realistic distributions for a given feature set, a framework for identifying anomalies in IoT networks was proposed that overcomes data imbalance [77].
Since the IoT is increasingly being used in critical infrastructures and cyber-physical systems, there has been a significant increase in research efforts to develop efficient defenses against cyber-attacks. IoT networks can be protected from a wide range of cyberattacks using a framework called Boost-Defense, developed by Al-Haija et al. [78]. A strong classifier for identifying and categorizing cyber-attacks in IoT networks is built using AdaBoost machine learning, decision trees, and substantial data engineering approaches. Lightweight, versatile, anomaly-based NIDSs construct profiles for normal and malicious behavior in a variety of ways. Al-Haija et al. [79] used machine-learning approaches to design, construct, and evaluate an anomaly-based IoT NIDS. It was modeled as supervised multiclass learning, where a classification function was learned to map a collection of labels to ten classes.

III. DESIGN AND DEVELOPMENT OF THE PROPOSED MODELS

A. RECURRENT NEURAL NETWORK
The usage of neural networks has the potential to enhance every part of our daily life. An artificial neural network with a sequential information structure is known as a recurrent neural network. They are referred to as recurrent because they execute the same function on each sequence element, with the outcome depending on prior computations. RNNs are networks that include loops, which allow for data persistence. Long Short-Term Memory (LSTM) networks are a kind of RNN that can learn long-term dependencies.
Hochreiter et al. [80] introduced LSTM networks. LSTM networks perform very well across a broad range of problems and are now extensively utilized; LSTMs are designed to avoid the long-term dependency problem [81]. Each recurrent neural network is composed of a chain of repeating neural network modules, and the loops in a recurrent neural network enable information to be retained in the network. Fig. 1 shows a simple recurrent neural network with loops: the network A examines the input x_t and generates the output h_t, and a loop enables data to be transferred from one network phase to the next. Table 2 lists the symbols used to help understand the concepts presented in the following sections.

The first stage of an LSTM is to determine which information will be discarded from the cell state. A sigmoid layer, known as the "forget gate layer," makes this determination: it examines the values in h_{t-1} and x_t and returns a value between 0 and 1 for each value in the cell state C_{t-1}, where 1 indicates "totally retain this" and 0 indicates "entirely discard this." The forget gate operation of the LSTM is shown in Fig. 3 and represented by (1):

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)    (1)

Next, the LSTM determines what new information will be stored in the cell state. The input gate layer, a sigmoid layer, determines which values to update; this operation is represented by (2) and shown in Fig. 4. A tanh layer then generates a vector of candidate values C̃_t that may be added to the state, as represented by (3):

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)    (2)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)    (3)

Equation (4) updates the old cell state C_{t-1} into the new cell state C_t; this operation is presented in Fig. 5:

C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t    (4)

The last step is to determine the output, which is a filtered version of the cell state. First, a sigmoid layer selects which parts of the cell state to output, as represented by (5). As shown in (6), the cell state is passed through the tanh function and multiplied by the output of the sigmoid gate, ensuring that only the selected values are returned. The output operation h_t is presented in (6) and Fig. 6:

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)    (5)
h_t = o_t ⊙ tanh(C_t)    (6)

Cho et al. [82] developed a slight variant of the LSTM called the Gated Recurrent Unit (GRU). The GRU has only two gates, while the LSTM has three: the LSTM input and forget gates are combined into an "update gate" in the GRU. The GRU eliminates the cell state and transfers information via the hidden state. The update gate operates similarly to the LSTM forget and input gates, determining which data should be discarded and which should be included, while the reset gate determines how much previous information should be erased from memory. GRUs use fewer tensor operations than LSTMs and are therefore faster to train. A single GRU cell operation is presented in Fig. 7. Equation (7) represents the reset gate operation, (8) the update gate operation, (9) the current memory state of the GRU, and (10) the final memory state of the GRU:

r_t = σ(W_r · [h_{t-1}, x_t])    (7)
z_t = σ(W_z · [h_{t-1}, x_t])    (8)
h̃_t = tanh(W · [r_t ⊙ h_{t-1}, x_t])    (9)
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t    (10)
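The gate operations (1)-(6) can be sketched as a single LSTM time step in NumPy (a minimal illustration; the stacked weight layout, function names, and random initialization are our own and not part of the proposed model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following equations (1)-(6).

    W maps the concatenated [h_{t-1}, x_t] vector to the stacked
    forget/input/candidate/output pre-activations; b is the stacked bias.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0 * hidden:1 * hidden])       # (1) forget gate
    i_t = sigmoid(z[1 * hidden:2 * hidden])       # (2) input gate
    c_tilde = np.tanh(z[2 * hidden:3 * hidden])   # (3) candidate values
    c_t = f_t * c_prev + i_t * c_tilde            # (4) cell-state update
    o_t = sigmoid(z[3 * hidden:4 * hidden])       # (5) output gate
    h_t = o_t * np.tanh(c_t)                      # (6) hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hidden = 64, 512   # 64 input features, 512 units as in the model
W = rng.standard_normal((4 * n_hidden, n_hidden + n_in)) * 0.01
b = np.zeros(4 * n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
h, c = lstm_step(rng.standard_normal(n_in), h, c, W, b)
```

Because h_t = o_t ⊙ tanh(C_t) with o_t in (0, 1), every component of the hidden state stays strictly inside (-1, 1).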
BiLSTM is an extension of the conventional LSTM that enhances model performance on sequence classification tasks by learning in both directions simultaneously. Because they train two LSTMs on the input sequence instead of one, bidirectional LSTMs are useful when all timesteps in the input sequence are accessible. BiLSTM considers both forward and backward activations to calculate the output.

B. PROPOSED MODEL
Deep learning techniques are gaining popularity due to their ability to detect computer network threats and abnormalities in various applications. Recurrent neural networks have shown to be effective in a variety of areas; due to this capability, this article presents a model based on a recurrent neural network. This paper designs and develops LSTM-, BiLSTM-, and GRU-based models for anomaly detection in IoT networks. The model consists of an input layer, an output layer, and four blocks of recurrent, activation, normalization, activity regularization, and dropout layers. Overfitting is a major concern for deep learning models, so we used kernel, bias, and activity regularizers at the LSTM, BiLSTM, and GRU layers. The kernel regularizer imposes a penalty on the layer's kernel, the bias regularizer on the layer's bias, and the activity regularizer on the layer's output. The regularizers use the l1-l2 function to compute the kernel, bias, and activity penalties. Keras offers several activation functions for activation layers; for the activation layer in this paper, we adopted the LeakyReLU activation function. Next, we used layer normalization, which usually accelerates and stabilizes the learning process by reducing error rates. A neural network can be encouraged to learn sparse features by using an activity regularization layer, which supports l1, l2, and l1-l2 regularization and adds a penalty to the cost function based on the layer's output activity. A recurrent neural network is prone to overfitting and needs significant adjustments to the training procedure to avoid it. By ignoring randomly selected neurons during training, a dropout layer mitigates the risk of overfitting, preventing neuron weights from over-specializing as the network learns.
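The l1-l2 penalty used by the kernel, bias, and activity regularizers simply adds two terms to the loss; a minimal sketch (the factor values 0.01 are placeholders, not the paper's tuned settings):

```python
import numpy as np

def l1_l2_penalty(weights, l1=0.01, l2=0.01):
    """Combined l1-l2 penalty added to the loss:
    l1 * sum(|w|) + l2 * sum(w^2)."""
    w = np.asarray(weights, dtype=float)
    return l1 * np.abs(w).sum() + l2 * np.square(w).sum()

penalty = l1_l2_penalty([1.0, -2.0, 0.5], l1=0.01, l2=0.01)
# l1 term: 0.01 * 3.5 = 0.035; l2 term: 0.01 * 5.25 = 0.0525
```

The same penalty form is applied to the kernel weights, the bias vector, or the layer's output activations, depending on which regularizer is configured.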
Four blocks of recurrent, activation, normalization, activity regularization, and dropout layers were used in the LSTM, BiLSTM, and GRU models. Before the output layer, we used a dense layer with 512 neurons and a ReLU activation function. The output layer is the last layer of the model, and the number of neurons in this layer depends on the number of classes in the dataset.
We design and build an anomaly detection model for IoT networks using a recurrent neural network. The same framework was used to construct the LSTM, BiLSTM, and GRU models. Fig. 8 shows a layered perspective of the proposed recurrent neural network-based LSTM, BiLSTM, and GRU models. Table 1 shows the proposed LSTM model components, including one input layer, four LSTM layers, four activation layers, four normalization layers, four activity regularization layers, four dropout layers, one dense layer, and one output layer. The input layer receives network traffic with 64 features: a 64×1 input vector is created to suit the 64 best features selected by our feature selection method presented in [83]. The LSTM model uses 512 units at each LSTM layer. The LSTM model also uses kernel, bias, and activity regularizers at the LSTM layers, with the l1-l2 function for penalties. The activation layer uses the LeakyReLU activation function, a leaky alternative to the Rectified Linear Unit (ReLU). Because it does not contain zero-slope sections, LeakyReLU solves the "dying ReLU" issue, which significantly increases the pace of learning; it has been shown that keeping the "mean activation" near 0 speeds up training. In contrast to ReLU, Leaky ReLU is more "balanced" and, as a result, can learn more quickly [84]. A normalization layer may usually aid in accelerating and stabilizing the learning process by reducing error rates. The LSTM model training and validation details are presented in Algorithm 1. The BiLSTM and GRU models were created using the same technique.
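A hedged sketch of this stacked architecture in Keras (assuming TensorFlow is available; the dropout rate, LeakyReLU slope, and l1/l2 factors here are placeholders, and Table 1 should be consulted for the exact hyperparameters):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_lstm_model(n_features=64, n_classes=5, units=512):
    """Sketch of the stacked-LSTM multiclass model: four blocks of
    LSTM + LeakyReLU + LayerNormalization + ActivityRegularization +
    Dropout, then a 512-neuron ReLU dense layer and a softmax output."""
    reg = dict(kernel_regularizer=regularizers.l1_l2(0.01, 0.01),
               bias_regularizer=regularizers.l1_l2(0.01, 0.01),
               activity_regularizer=regularizers.l1_l2(0.01, 0.01))
    inputs = layers.Input(shape=(n_features, 1))
    x = inputs
    for i in range(4):
        # Return sequences so the next LSTM block receives a sequence;
        # the final block emits a single vector.
        x = layers.LSTM(units, return_sequences=(i < 3), **reg)(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.LayerNormalization()(x)
        x = layers.ActivityRegularization(l1=0.01, l2=0.01)(x)
        x = layers.Dropout(0.2)(x)
    x = layers.Dense(512, activation="relu")(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

The BiLSTM and GRU variants follow by swapping the recurrent layer (`layers.Bidirectional(layers.LSTM(...))` or `layers.GRU(...)`) within the same block structure.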
Layer normalization can effectively stabilize the hidden-state dynamics of recurrent networks. Unlike batch normalization, layer normalization calculates normalization statistics from the summed inputs to the neurons of a hidden layer, avoiding additional dependencies across training instances [85]. Overfitting is a major issue in neural networks; to reduce its likelihood, we used the regularization and dropout layers. The activity regularization layer adds a penalty to the cost function based on the layer's input activity, with l1-l2 factors serving as the baseline for its functionality. During training, the dropout layer sets some input units to 0 at a frequency equal to the dropout rate, thereby helping to prevent overfitting. This means that their impact on downstream neuron activations is temporarily removed on the forward pass, and any weight updates are not applied to those neurons on the backward pass.
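The per-sample statistics that layer normalization uses can be illustrated in a few lines of NumPy (the learnable gain and bias parameters of a real layer are omitted for brevity):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each sample over its own features (layer normalization),
    unlike batch normalization, which normalizes each feature over the
    whole batch."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

batch = np.array([[1.0, 2.0, 3.0],
                  [10.0, 20.0, 30.0]])
normed = layer_norm(batch)
# Each row now has (approximately) zero mean and unit variance,
# regardless of the other samples in the batch.
```

Because each row is normalized independently, the result is identical whether the batch holds one sample or many, which is why layer normalization introduces no cross-instance dependencies.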
BiLSTM is a sequence processing model with two LSTMs: one processing the input in the forward direction and the other in the backward direction. Because of the additional context it captures, BiLSTM-based modeling yields better predictions than standard LSTM-based modeling, although the BiLSTM model requires a longer training time. The proposed BiLSTM model consists of an input layer, four BiLSTM layers, four activation layers, four layer normalization layers, four activity regularization layers, four dropout layers, one dense layer, and one output layer, as shown in Table 3.

FIGURE 8. Layer view, parameters, and hyperparameters of the proposed LSTM, BiLSTM, and GRU models for multiclass classification.
The GRU was created to address the vanishing gradient issue inherent in the conventional recurrent neural network. The GRU may also be seen as a variant of the LSTM, since both are constructed similarly and, in certain instances, provide equally good outcomes. The GRU uses so-called update and reset gates to overcome the vanishing gradient issue of a conventional RNN; in other words, two vectors determine what information should be passed to the output. The proposed GRU model also consists of an input layer, four GRU layers, four activation layers, four layer normalization layers, four activity regularization layers, four dropout layers, one dense layer, and one output layer. Table 3 summarizes the proposed recurrent neural network model. In deep learning, a Convolutional Neural Network (CNN) is an algorithm that takes an input image, assigns significance to various image elements, and can differentiate between them. Compared to other classification methods, the amount of preprocessing needed by a convolutional neural network is much smaller. Recent research has shown that convolutional neural networks can produce outstanding speech recognition and image identification results. The advantages of a convolutional neural network may be revealed more fully if intrusion detection problems are transformed into image recognition problems: a convolutional neural network can effectively capture the spatial and temporal correlations related to intrusion detection, and the design provides a better match to the intrusion detection problem because of the reduced number of parameters and the reusability of weights [86].

FIGURE 9. Layer view, parameters, and hyperparameters of the proposed convolutional neural network-based LSTM, BiLSTM, and GRU models for multiclass classification.
We design and build a model using convolutional and recurrent neural networks. A convolutional neural network learns input features without losing essential information, making it well suited for prediction. A layered view of the proposed convolutional neural network-based LSTM, BiLSTM, and GRU models is presented in Fig. 9. Table 4 shows the configurations of the LSTM, BiLSTM, and GRU models based on a convolutional neural network for multiclass classification. We used a 1D convolutional (Conv1D) layer for the convolutional stage. In 1D convolution, computation proceeds along the temporal axis, with the kernel moving in just one direction; Conv1D operates on two-dimensional input and output data and is often used to process time series. First, a 64×1 input vector is created to correspond to the 64 best features selected by the feature selection method. The convolutional layer is combined with the activation, normalization, regularization, and dropout layers. The activation layer uses the LeakyReLU activation function with an alpha value of 0.2. The layer normalization uses the feature axis for normalization. The activity regularization layer uses l1-l2 functions, and the dropout layer randomly drops neurons to reduce the chance of overfitting. A pooling layer is used to reduce the dimension of the convolved features and thereby the computing resources needed to analyze the data. We utilized an average pooling layer in the model.
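The 1D convolution and average pooling steps described above can be sketched as follows (a toy single-channel example; real Conv1D layers learn many kernels in parallel, and the moving-average kernel here is purely illustrative):

```python
import numpy as np

def conv1d(signal, kernel):
    """'Valid' 1D convolution: the kernel slides along the temporal axis
    in one direction only."""
    n = len(signal) - len(kernel) + 1
    return np.array([np.dot(signal[i:i + len(kernel)], kernel)
                     for i in range(n)])

def avg_pool1d(x, size=2):
    """Average pooling: shrink the feature map to cut downstream compute."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).mean(axis=1)

features = np.arange(8, dtype=float)           # stand-in for a feature vector
fmap = conv1d(features, np.array([0.5, 0.5]))  # moving-average filter
pooled = avg_pool1d(fmap, size=2)              # halved feature map
```

The pooled output preserves the overall trend of the feature map at half the length, which is exactly why the pooling layer reduces the computation required by the recurrent layers that follow.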
The activation, normalization, regularization, and dropout layers are combined with two LSTM, BiLSTM, or GRU layers. The dense layer receives input from the final dropout layer, and the output of the dense layer is transferred to the output layer. The dense layer uses 512 neurons. The number of neurons in the output layer is determined by the number of classes in the dataset; the output layer of our proposed model has four, five, six, ten, or twenty neurons. Using a convolutional neural network as the initial model interface brings the following benefits: the spatial and temporal correlations associated with an anomaly detection problem may be captured effectively when the optimal filters are used, and with fewer parameters and reusable weights, the architecture is better suited to fit the anomaly detection data. The pooling layer lowers the computing power required by reducing the dimension of the features. We also propose a binary classification model utilizing a single recurrent neural network-based hidden layer. A layered view of the binary classification model is presented in Fig. 10. The model consists of an input layer, and the number of inputs to the model is equal to the number of features in the dataset. The hidden layer consists of an LSTM, BiLSTM, or GRU layer combined with the activation, normalization, activity regularization, and dropout layers. The activation layer uses the LeakyReLU activation function, the normalization layer uses batch normalization, and the regularization layer uses activity regularization with the l1-l2 penalty function. The regularization and dropout layers lower the likelihood of the model overfitting. The dense layer uses 512 neurons and the ReLU activation function and works as a classification layer, and the output layer uses two neurons to classify the data as normal or anomalous. The configurations of the LSTM, BiLSTM, and GRU models for binary classification are shown in Table 5.
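The lightweight binary model can be sketched as follows; the unit count, dropout rate, and penalty strengths are assumptions, while the single recurrent hidden layer, LeakyReLU, batch normalization, l1-l2 activity regularization, 512-neuron dense layer, and two-neuron output follow the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_binary_model(n_features, units=64):
    inputs = layers.Input(shape=(n_features, 1))
    x = layers.LSTM(units)(inputs)             # single recurrent hidden layer
    x = layers.LeakyReLU(0.2)(x)               # activation layer
    x = layers.BatchNormalization()(x)         # batch normalization
    x = layers.ActivityRegularization(l1=1e-5, l2=1e-4)(x)  # l1-l2 penalty
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(512, activation="relu")(x)         # classification layer
    outputs = layers.Dense(2, activation="softmax")(x)  # normal vs. anomalous
    return models.Model(inputs, outputs)
```

Replacing `layers.LSTM(units)` with `layers.GRU(units)` or `layers.Bidirectional(layers.LSTM(units))` gives the GRU and BiLSTM variants.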

IV. DATA COLLECTION
The BoT-IoT [87], IoT-NI [88], IoT-23 [83], MQTT [83], MQTTset [83], and NSLKDD [89] datasets were used to evaluate the proposed models. The first step involved processing the pcap files of BoT-IoT [90], IoT Network Intrusion [91], MQTT-IoT-IDS2020 [92], MQTTset [93], and IoT-23 [94]. CICFlowMeter [95] extracts 79 network features from the pcap files and stores them as CSV files. The adapted datasets can be accessed from [96]. Our previous work [83] describes the testbed setup and the attacks performed on each dataset in detail. Each dataset instance is labeled based on pre-defined criteria for that dataset. Because the flow ID, source IP, source port, destination IP, and timestamp features characterize communication inside a specific IoT network, they were removed from all adapted datasets. Non-numeric features were transformed into numeric features. Duplicate instances were produced when the pcap data was converted to CSV files; these redundant instances were deleted from all datasets. The column features were normalized to the range (-1, 1) to eliminate extreme values and significantly speed up calculations. The mean imputation method is used to fill in missing values: a missing value is replaced with the average of the variable's remaining values. Table 6 presents the BoT-IoT, IoT Network Intrusion, IoT-23, MQTT-IoT-IDS2020, and MQTTset datasets with and without redundancy. The BoT-IoT dataset was generated from pcap files by Koroniotis et al. [90]. The BoT-IoT testbed comprises virtual machines (VMs) connected to the network through a LAN and the Internet, with the PFSense system establishing the connection between the VMs and the Internet. A realistic smart home framework was developed using five IoT devices operated locally and connected to cloud services using the Node-RED system to produce network traffic.
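The preprocessing steps described above (dropping flow-identifying columns, de-duplication, numeric encoding, mean imputation, and scaling to (-1, 1)) might be sketched as below; the column names are hypothetical and would depend on the actual CICFlowMeter output.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Columns that identify a specific flow/network are dropped (names are
# hypothetical placeholders for the CICFlowMeter labels).
ID_COLS = ["Flow ID", "Src IP", "Src Port", "Dst IP", "Timestamp"]

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.drop(columns=[c for c in ID_COLS if c in df.columns])
    df = df.drop_duplicates()                       # remove redundant instances
    for col in df.select_dtypes(exclude="number"):  # encode non-numeric features
        df[col] = df[col].astype("category").cat.codes
    df = df.fillna(df.mean(numeric_only=True))      # mean imputation
    scaled = MinMaxScaler(feature_range=(-1, 1)).fit_transform(df)
    return pd.DataFrame(scaled, columns=df.columns, index=df.index)

demo = pd.DataFrame({
    "Src IP": ["10.0.0.1"] * 4,
    "pkt_len": [1.0, 2.0, 2.0, None],
    "proto": ["tcp", "udp", "udp", "udp"],
})
clean = preprocess(demo)   # 3 rows after de-duplication, values in [-1, 1]
```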
Normal network traffic is generated with the help of the Ostinato tool. The Ubuntu server delivers IoT resources for simulating a real-world IoT network, while Kali Linux serves as the attack system. IoT messages are transmitted into the cloud using the MQTT protocol. The BoT-IoT dataset category classes are Normal, DDoS, DoS, Scan, and Theft. Details of the BoT-IoT dataset instances are presented in Table 6(a). We combined the DDoS and DoS categories into a single attack class.
The IoT Network Intrusion detection dataset was generated by Kang et al. [91]. Two IoT devices, an SKT NGU and an EZVIZ Wi-Fi camera, were used as victim devices. Network traffic was captured using a wireless network adapter in monitor mode. Except for the Mirai botnet category, all cyberattacks consist of packets collected while modeling attacks with software such as Nmap. In the case of the Mirai botnet, attack packets were produced on a laptop and then altered to appear as though they came from an IoT device. The IoT Network Intrusion detection dataset category classes are Normal, DoS, MITM, Mirai, and Scan. Table 6(b) shows the details of the IoT Network Intrusion detection dataset instances. The Stratosphere Laboratory at CTU University in the Czech Republic generated the IoT-23 dataset using three real IoT devices [94]. The IoT-23 dataset comprises twenty-three captures of varied IoT network traffic generated by different IoT devices, providing researchers with a large, annotated dataset of real IoT network traffic for creating machine learning models. The IoT-23 dataset category classes are Normal, Attack, Mirai, File Download, Heartbeat, C&C, Torii, Port Scan, DDoS, and Okiru. Table 6(c) displays the details of the IoT-23 dataset instances. The MQTT-IoT-IDS2020 dataset was created by Hindy et al. [92]. Five recorded scenarios are included in the dataset: one of normal operation and four of attack. The dataset represents a real MQTT IoT network in a typical operating situation and covers popular MQTT attacks and real-world testing scenarios. The MQTT-IoT-IDS2020 dataset classes are Normal, MQTT-Bruteforce, Scan-A, Scan-U, and Sparta.
TABLE 7. Details of the IoT-DS2 dataset instances ("-" indicates the class is not present in that dataset).

#   Category            BoT-IoT     IoT-NI    IoT-23      MQTT      MQTTset   IoT-DS2
0   Normal              -           -         4253672     -         -         2000000
1   DDoS                17420085    -         -           -         -         500000
2   DoS                 -           59391     -           -         -         59391
3   MITM ARP Spoofing   -           32909     -           -         -         32909
4   Mirai               -           366971    -           -         -         366971
5   MQTT Bruteforce     -           -         -           2001972   -         500000
6   Sparta              -           -         -           1217198   -         500000
7   Theft               6257        -         -           -         -         6257
8   Attack              -           -         1699608     -         -         500000
9   C&C                 -           -         20612       -         -         20612
10  File Download       -           -         7707        -         -         7707
11  Heartbeat           -           -         12648       -         -         12648
12  Okiru               -           -         12908506    -         -         500000
13  OS Scan             35675       -         -           -         -         35675
14  Port Scan           -           -         2999999     -         -         500000
15  Torii               -           -         24492       -         -         24492
16  MQTT Flood          -           -         -           -         77793     77793
17  Malformed           -           -         -           -         3580      3580
18  SlowITe             -           -         -           -         3044      3044
    Total                                                                     5651079

The BoT-IoT, IoT Network Intrusion, IoT-23, MQTT, and MQTTset datasets were combined to form a dataset with many attacks, named IoT-DS2. Table 7 displays the details of the IoT-DS2 dataset instances. The IoT-DS2 dataset consists of eighteen attack classes and a normal class. The category column represents the name of the network traffic class, which can be normal or any of the eighteen attack classes; the next five columns represent the datasets used in the development of IoT-DS2, and the final column represents the number of instances extracted from the BoT-IoT, IoT Network Intrusion, IoT-23, MQTT, and MQTTset datasets. These datasets, available at [96], may be used by researchers to develop and test IoT anomaly detection systems. Each extracted dataset was split 80%/20% into training and testing sets in the first phase, and the training set was then split 80%/20% into training and validation sets in a stratified manner. Choosing features is a key stage in deep learning model building. Feature selection is a strategy for improving models that involves detecting and selecting only those features required for improved prediction. In addition to reducing overfitting, feature selection speeds up model training and makes the model less susceptible to test errors.
The recursive feature elimination technique selects 64 features using a random forest algorithm and the IoT-DS2 dataset [83]. The feature selection technique was not applied to the NSLKDD dataset. The same set of features was utilized in all models and across all adapted datasets.
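Recursive feature elimination with a random forest ranker can be sketched with scikit-learn; the data here is synthetic, and the selected-feature count is reduced from the paper's 64 so the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the flow-feature matrix.
X, y = make_classification(n_samples=300, n_features=15, n_informative=6,
                           random_state=0)

selector = RFE(
    estimator=RandomForestClassifier(n_estimators=25, random_state=0),
    n_features_to_select=5,          # 64 in the paper
)
selector.fit(X, y)
mask = selector.support_             # boolean mask over the original columns
ranking = selector.ranking_          # 1 = selected; larger = eliminated earlier
```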

V. EVALUATION OF RESULTS
The proposed LSTM, BiLSTM, and GRU models and the convolutional neural network-based LSTM, BiLSTM, and GRU models are validated using accuracy, precision, recall, and F1 score, where accuracy is defined as

Accuracy = (TP + TN) / (TP + FP + TN + FN)    (11)

Seven datasets were used to conduct multiclass and binary classification experiments using the proposed anomaly detection models. All experiments were conducted with the Keras framework on the TensorFlow backend, using Google Colab. A neural network model assessment comprises three processes: training, validation, and testing. The classification method favors the majority class if unequally distributed datasets are employed, so we used class weights and SMOTE techniques to handle imbalanced classes in the datasets. First, we used class weights during the training phase. Class weights were calculated based on the number of class instances, so a class with a small number of instances receives a high weight.
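The inverse-frequency weighting described above can be computed as below; the exact formula is an assumption, chosen to mirror the common n_samples / (n_classes * class_count) heuristic.

```python
import numpy as np

# Inverse-frequency class weights: total / (n_classes * count) gives minority
# classes proportionally larger weights (assumed formula; it matches
# scikit-learn's "balanced" heuristic).
def class_weights(labels):
    classes, counts = np.unique(labels, return_counts=True)
    total = counts.sum()
    return {int(c): total / (len(classes) * n) for c, n in zip(classes, counts)}

y = np.array([0] * 900 + [1] * 90 + [2] * 10)
weights = class_weights(y)   # e.g. passed to Keras model.fit(class_weight=...)
```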
The Adam optimizer was used to train each RNN model for 100 epochs on the same batch size. The loss function is one of the most important elements of a neural network; the sparse categorical cross-entropy loss function is used in this paper. We utilized three techniques to reduce overfitting. First, kernel, bias, and activity regularizers with l1-l2 penalties are used at the LSTM, BiLSTM, GRU, or CNN layers. Second, we used the activity regularization layer, and third, we used the dropout layer. Together, these three techniques greatly reduce the chance of overfitting. Finally, an early stopping approach was used to terminate training if the validation loss did not decrease, which also minimizes the likelihood of overfitting from training over a large number of epochs. All RNN models were trained for up to 100 epochs with a batch size of 128 and a patience of 5 iterations. The batch size and the number of epochs were increased and decreased to check for improvement in model accuracy, but accuracy did not improve. At each epoch, the accuracy and loss of each model were computed for both the training and validation sets. Fig. 11(a) illustrates the loss and Fig. 11(b) the accuracy of the LSTM, BiLSTM, and GRU models trained and validated on the BoT-IoT dataset. The loss and accuracy of the CNN-based LSTM, BiLSTM, and GRU models on the BoT-IoT dataset during training and validation are shown in Figs. 12(a) and 12(b). The loss function measures the overall deviation across all samples in the training set. The early stopping strategy stops training if the validation loss does not decrease after a specified number of iterations, reducing the overfitting problem. As illustrated in Figs. 11 and 12, the loss and accuracy plots are inversely associated. With 200 and 500 epochs, as well as 10 iterations of patience, the accuracy did not increase; running a model over a significant number of iterations results in overfitting of the training data.
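The training setup (Adam, sparse categorical cross-entropy, batch size 128, early stopping on validation loss with patience 5) might look like the following sketch; the tiny model and random data are placeholders, and the epoch count is reduced from the paper's 100 to keep the example fast.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

# Tiny stand-in model; only the compile/fit configuration mirrors the paper.
inputs = layers.Input(shape=(8, 1))
x = layers.LSTM(16)(inputs)
outputs = layers.Dense(3, activation="softmax")(x)
model = models.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

early = callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                restore_best_weights=True)

X = np.random.rand(256, 8, 1).astype("float32")
y = np.random.randint(0, 3, size=256)
history = model.fit(X, y, validation_split=0.2, batch_size=128,
                    epochs=2,          # 100 in the paper
                    callbacks=[early], verbose=0)
```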
We performed multiclass and binary classification on seven datasets in this paper. Tables 8, 9, and 10 show the multiclass classification of the LSTM, BiLSTM, and GRU models using the NSLKDD, BoT-IoT, IoT-NI, MQTT, MQTTset, IoT-23, and IoT-DS2 datasets. The outcome of the LSTM, BiLSTM, and GRU models using the NSLKDD dataset is presented in Table 8(a). The accuracy on the NSLKDD dataset was 99.67% for the LSTM, 99.82% for the BiLSTM, and 99.78% for the GRU model. The Normal, DoS, and Probe classes achieved a high detection rate, while the R2L and U2R detection rates were low; these two attack categories are very rare in the dataset, which is the main reason for their low detection rates. The BiLSTM model achieved the highest detection rate among the three models on the NSLKDD dataset. The accuracy on the BoT-IoT dataset is higher than on the NSLKDD dataset. Table 8(b) shows the results of the LSTM, BiLSTM, and GRU models using the BoT-IoT dataset. The Normal, DoS, Scan, and Theft classes of the BoT-IoT dataset achieved at least a 99.50% detection rate. The weighted average FPR was 0.04% and FNR was 0.11% for the BiLSTM model. The Scan and Theft categories have a high rate of misclassification. The IoT-NI dataset multiclass classification outcomes are presented in Table 8(c). The accuracy of the LSTM, BiLSTM, and GRU models on the IoT-NI dataset is 98.14%, 98.89%, and 98.42%, respectively. The detection rate for the Normal class was relatively high compared to the other classes in the dataset. The IoT-23 dataset consists of ten classes, and the LSTM, BiLSTM, and GRU results for the IoT-23 dataset are presented in Table 8(d). The outcome of the LSTM, BiLSTM, and GRU models using the IoT-DS2 dataset is presented in Table 9. The IoT-DS2 dataset consists of 19 classes. The accuracy of the LSTM, BiLSTM, and GRU models on the IoT-DS2 dataset was 99.31%, 99.48%, and 99.32%, respectively.
The BiLSTM model achieved a detection rate of 99.40%, which is higher than the LSTM and GRU models. The FPR for the IoT-DS2 dataset using the BiLSTM model was 0.06%, while the FNR was 0.51%. The detection rate for the Normal class was at least 99% in all three models. The MITM, C&C, Heartbeat, and Malformed Data attack categories have less than a 98% detection rate. Sensitivity, specificity, PPV, and NPV results for multiclass classification of the NSLKDD, BoT-IoT, IoT-NI, IoT-23, MQTT, and MQTTset datasets using the LSTM, BiLSTM, and GRU models are presented in Table 10. A convolutional neural network requires significantly less preprocessing than other deep learning classification techniques, making it a more efficient classification approach. We provide detailed performance metrics for each kind of network anomaly and comparisons to other current deep learning-based systems to assess the CNN model's ability to identify different network attack characteristics [83]. The advantages of a convolutional neural network for anomaly detection are fully realized when the problem is treated as an image recognition task. Using appropriate filters, a convolutional neural network can effectively capture the spatial and temporal connectivity of anomaly detection problems. A convolutional neural network is good for prediction because it learns input characteristics without missing important information. We used a 1D convolutional neural network as the first hidden layer to learn the network features, followed by two LSTM, BiLSTM, or GRU hidden layers. The proposed convolutional neural network-based LSTM, BiLSTM, and GRU models are evaluated using the NSLKDD, BoT-IoT, IoT-NI, IoT-23, MQTT, MQTTset, and IoT-DS2 datasets, and the evaluation results are presented in Tables 11, 12, and 13. A lightweight binary classification model is also proposed and developed based on a single recurrent neural network-based hidden layer.
Three binary classification models were created using a single hidden layer from the LSTM, BiLSTM, and GRU networks. They were tested using the NSLKDD, BoT-IoT, IoT-NI, IoT-23, MQTT, MQTTset, and IoT-DS2 datasets. First, we converted each dataset to binary labels. Table 14 summarizes the assessment results for binary classification using the LSTM model. The MQTTset dataset achieved a 99.96% detection rate. IoT-DS2, which combines all IoT datasets, reached a 99.42% detection rate using the LSTM model. The number of instances of the normal class was very small in the IoT-NI dataset compared to the anomaly class; as a result, the normal class achieved a low detection rate compared to the anomaly class. The findings of the binary classification evaluation using the BiLSTM model are summarized in Table 15. A detection rate of 99.98% was achieved on the MQTTset dataset using the BiLSTM model. IoT-DS2 has an average detection rate of 99.81%, much higher than the LSTM model's detection rate. Table 16 summarizes the outcomes of the binary classification assessment using the GRU model. The GRU model performs better than the LSTM model but worse than the BiLSTM model. The GRU model attained a detection rate of 99.91% on the MQTTset dataset. IoT-DS2 has an average detection rate of 99.70%, greater than that of the LSTM model but lower than that of the BiLSTM model. The lightweight binary classification model based on a single recurrent neural network hidden layer accurately detected normal and anomalous occurrences in the NSLKDD, BoT-IoT, IoT-NI, IoT-23, MQTT, MQTTset, and IoT-DS2 datasets.
Additionally, we compare the performance of binary classification using the receiver operating characteristic area under the curve (ROC AUC). We plot the ROC curves for the validation and testing sets of the NSLKDD, BoT-IoT, IoT-NI, IoT-23, MQTT, and MQTTset datasets. ROC curves for the validation sets of these datasets for binary classification using the BiLSTM model are presented in Fig. 13, and for the testing sets in Fig. 14. Next, we extended the binary classification model to classify normal traffic against individual anomalous classes of the IoT-DS2 dataset. The IoT-DS2 dataset was divided into eighteen subsets, with each subset consisting of the normal data class and one anomalous data class. Each subset was used to evaluate the LSTM, BiLSTM, and GRU models. The accuracy, precision, recall, and F1 score for the IoT-DS2 dataset's normal and individual anomalous classes using the LSTM, BiLSTM, and GRU models are presented in Fig. 15. The BiLSTM model performs better than the LSTM and GRU models. The binary classification model correctly identified normal and anomalous events in the IoT-DS2 dataset using a single-hidden-layer recurrent neural network. These evaluation results confirm that a single-layer recurrent neural network-based model can effectively detect anomalies in different IoT networks.
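ROC curves and AUC for a binary model's scores can be computed with scikit-learn as sketched below, using synthetic scores in place of the models' predicted anomaly probabilities.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Synthetic labels/scores standing in for a model's predicted anomaly
# probabilities on a held-out set.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_score = np.clip(0.6 * y_true + rng.normal(0.2, 0.2, size=1000), 0.0, 1.0)

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points of the ROC curve
auc = roc_auc_score(y_true, y_score)               # area under that curve
```

Plotting `tpr` against `fpr` for the validation and testing splits reproduces figures in the style of Figs. 13 and 14.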
To address imbalanced classes in the datasets, we first employed class weighting during the training phase; the evaluation results reported in the preceding section are based on class weighting. Class weights were calculated based on the number of instances of each class, so a minority class with a small number of instances receives a high weight. Next, we implemented SMOTE for minority classes in the NSLKDD, BoT-IoT, IoT-NI, IoT-23, MQTT, and MQTTset datasets. The Borderline SMOTE algorithm was used to generate new synthetic samples. The random state controls the algorithm's randomization, and K is the number of neighbors used in the Borderline SMOTE algorithm to calculate the average distance to minority samples. We tested a range of K values and determined K = 6 to 10 to be optimal. New synthetic samples were created using the borderline-1 variant. We used the CNN-based LSTM, BiLSTM, and GRU models to evaluate the performance of Borderline SMOTE, which was applied exclusively to balance the training set. The multiclass classification results using SMOTE for class balancing and the CNNLSTM, CNNBiLSTM, and CNNGRU models are presented in Table 17. The detection rates of the NSLKDD dataset's Probe, U2R, and R2L classes have been improved, as shown in Table 17(a). We randomly selected one million instances from the BoT-IoT dataset's DoS and Scan classes and then used Borderline SMOTE to balance the other classes; Table 17(b) shows that the detection rates for the Normal and Theft classes have been enhanced. Four classes were balanced in the IoT-NI dataset; as presented in Table 17(c), the detection rates for the Normal, DoS, MITM, and Scan classes have been improved. We randomly selected one million instances from the IoT-23 dataset's Normal, Attack, Port Scan, DDoS, and Okiru classes, and the Mirai, File Download, Heartbeat, C&C, and Torii classes were balanced. The IoT-23 dataset's minority-class detection rates improved, as seen in Table 17(d). We randomly selected one million MQTT-BF instances and one million Sparta instances from the MQTT dataset. The MQTT dataset already achieved a high detection rate when class weighting was used, and each model further enhanced the overall detection rate with SMOTE. All anomalous classes were in the minority in the MQTTset dataset; class weighting also resulted in a high detection rate there, but when Borderline SMOTE was employed, each model improved the overall detection rate. The Borderline SMOTE technique performs better than class weighting but requires more computing resources. Overfitting of the LSTM, BiLSTM, and GRU models was mitigated using dropout layers and early stopping. Furthermore, we performed 5-fold cross-validation on the LSTM, BiLSTM, and GRU models to further evaluate overfitting; identical results were obtained, demonstrating the consistency of the proposed models.

VI. DISCUSSION AND COMPARISON OF RESULTS
The proposed models' results are compared to previous research in this section. We propose and implement binary and multiclass classification models based on LSTM, BiLSTM, and GRU, as well as multiclass classification models based on CNNLSTM, CNNBiLSTM, and CNNGRU. These models were evaluated using the NSLKDD, BoT-IoT, IoT-NI, IoT-23, MQTT, MQTTset, and IoT-DS2 datasets. Initially, we implemented LSTM-, BiLSTM-, and GRU-based models consisting of four hidden layers. Our proposed models performed much better at detecting anomalies in IoT networks in both binary and multiclass classification. We investigated the prospect of solving anomaly detection in IoT networks by deploying a recurrent neural network and integrating it with a convolutional neural network. The proposed model also uses kernel, bias, and activity regularizers at the recurrent layers; these regularizers apply penalties on the kernel, the bias, and the layer output. The activation layer uses the LeakyReLU activation function, which considerably improves the pace of learning. A normalization layer often helps accelerate and stabilize learning by decreasing the model's error rate, and layer normalization in particular can stabilize the hidden state dynamics of a recurrent network. We employed the activity regularization layer and the dropout layer to decrease the possibility of overfitting. We also propose and build a binary classification model based on a single recurrent neural network hidden layer. Table 18 compares deep learning binary classification techniques with the proposed anomaly detection models. The detection rate was very high for the IoT-23, MQTT, and MQTTset datasets. Given that these are recently released IoT network datasets, few research papers are available in which the authors constructed deep learning models using them.
Since the BoT-IoT dataset is the most referenced IoT dataset, we also used the BoT-IoT dataset to compare our proposed models to previously published IoT models. We also used the NSLKDD dataset to compare the evaluation results of our proposed binary classification model for anomaly detection in a generic network setting.
Anomaly detection models based on RNNs were evaluated on the NSLKDD dataset by Yin et al. [38]. In their experiment, the model achieves greater accuracy with 80 hidden nodes and a learning rate of 0.1; with 100 epochs, RNN-IDS reaches a detection rate of 83.28%. Liu et al. [37] exploited the features of long and short sessions and developed a neural network based on CNN and LSTM models to extract the variations between normal and abnormal behavior. Their results demonstrate that the proposed quantitative model and enhanced algorithm can not only effectively prevent camouflaged identity information but also improve computing efficiency and the accuracy of small-subset anomaly detection, and their model achieved a high detection rate. Advances in communication and information technology have made it possible to exchange an ever-increasing quantity of data over the Internet, resulting in new uses for Internet services. Biswas et al. [57] suggested a method for distinguishing malicious botnet traffic from legitimate traffic using deep learning techniques such as ANN, GRU, and LSTM models; the authors report a classification accuracy of 99.76%, better than all prior research. The proposed LSTM, BiLSTM, and GRU models outperformed previously published anomaly detection models in terms of accuracy, precision, recall, and F1 score on the BoT-IoT dataset. We used recurrent neural networks to construct and implement two anomaly detection models for multiclass classification in IoT networks.
The LSTM, BiLSTM, and GRU models were built using the same structure. Fig. 8 presents a layered view of the proposed LSTM, BiLSTM, or GRU models. The first implementation is made up of four hidden layers, each of which is an LSTM, BiLSTM, or GRU layer. The advantages of a convolutional neural network become more apparent when anomaly detection problems are turned into image recognition problems, as a convolutional neural network is very good at capturing the spatial and temporal correlations important for intrusion detection. We therefore used a combined convolutional and recurrent neural network to construct the second model. In this second approach, three hidden layers are used. The first hidden layer uses a convolutional neural network to extract feature information and correlations from the input features, and an average pooling layer then halves the number of features. Next, there are two recurrent hidden layers composed of either LSTM, BiLSTM, or GRU layers.
The convolutional neural network layer replaces two recurrent neural network layers and reduces the number of features by half. Multiclass classification was performed in this paper on the NSLKDD, BoT-IoT, IoT-NI, MQTT, MQTTset, IoT-23, and IoT-DS2 datasets. The LSTM, BiLSTM, and GRU multiclass results on these datasets are presented in Tables 8, 9, and 10, while the CNN-based LSTM, BiLSTM, and GRU multiclass results are shown in Tables 11, 12, and 13. Table 19 compares the assessment results of our proposed multiclass classification models for anomaly detection in generic network and IoT network scenarios using the NSLKDD and BoT-IoT datasets. Yin et al. [19], Wu et al. [25], Naseer et al. [26], Ding et al. [27], Chouhan et al. [34], Imrana et al. [9], Liu et al. [51], Sethi et al. [61], and Vinayakumar [35] used the NSLKDD dataset to evaluate their models, but these techniques achieved comparatively low accuracy, as shown in Table 19. The models proposed by Sahu et al. [100] and Moizuddin et al. [99] achieved reasonably high accuracy and detection rates compared to the other models. Xu et al. [24] investigated a multilayer perceptron and gated recurrent unit-based intrusion detection model utilizing the KDD99 and NSLKDD datasets; their model achieved a maximum accuracy of 99.24% on the NSLKDD dataset.
Numerous issues emerge as a result of the fact that malicious cyberattacks are constantly evolving and happening in extremely high numbers, demanding a scalable solution. Due to malware's dynamic nature and ever-changing attack methodologies, publicly accessible malware datasets must be regularly updated and benchmarked. DNNs and other conventional machine learning classifiers have been tested on various publicly accessible malware datasets by Vinayakumar et al. [35]. Their proposed model attained a maximum accuracy of 92.90% using four hidden layers for the KDD99 dataset and 78.50% using five hidden layers for the NSLKDD dataset. The security of an Internet of Things network is fundamentally linked to the security of the supporting computer and communication infrastructure.
Ge et al. [44] propose an intrusion detection technique for IoT networks that classifies traffic flows using deep learning. They use the BoT-IoT dataset to obtain generic features from packet data, and their model achieves a high accuracy of 98.09%. Recently, Ge et al. [47] also proposed a model based on a feed-forward neural network with embedding layers for encoding high-dimensional categorical characteristics for multiclass classification. They also used transfer learning of the encoded high-dimensional categorical characteristics to construct a binary classifier based on another feed-forward neural network model, reaching a 99.79% accuracy rate. However, they included the source port in the feature set, and all the attacks were generated from specific source ports; there is therefore a possibility that their model overfits the training data, resulting in artificially high accuracy and detection rates. The proposed LSTM, BiLSTM, and GRU anomaly detection models were developed to outperform existing anomaly detection models on the usual criteria of accuracy, precision, recall, and F1 score.
The classification algorithm is biased towards the majority class when an imbalanced dataset is used, so we applied oversampling and undersampling techniques to balance the datasets. When using SMOTE, synthetic samples are generated for the minority class; however, SMOTE relies on local information rather than generalized knowledge of the minority class. If certain classes dominate, classification results can be biased, so it is advisable to balance the dataset before fitting. The proposed models effectively capture the spatial and temporal connectivity of anomaly detection problems. IoT networks comprise a diverse range of applications and data types, and the same model can be applied across them. Because IoT networks run continuously, they can generate large volumes of data; the proposed model can handle such volumes and achieves better performance as the amount of data grows. One constraint is that it requires a large amount of data to outperform other strategies.

VII. CONCLUSION AND FUTURE WORK
The design, development, and evaluation of recurrent neural network-based anomaly detection models for IoT networks are presented in this paper. Two models for multiclass classification were proposed, and each was implemented using LSTM, BiLSTM, and GRU recurrent neural networks. Seven datasets were used to evaluate the proposed models. The adaptability of the multiclass classification models was demonstrated by their high detection rates across the LSTM, BiLSTM, and GRU implementations. The proposed multiclass classification models were compared to current deep learning-based techniques, and the proposed models outperformed them on essentially every assessment criterion. Finally, a lightweight anomaly detection model for binary classification was proposed and implemented utilizing a single LSTM, BiLSTM, or GRU hidden layer. The reliability of the proposed architecture for anomaly detection in IoT networks is shown by the consistent performance of the multiclass and binary classification models across many datasets.
In future work, we plan to investigate more deep learning approaches for anomaly detection in IoT networks, adopting various optimization techniques to boost the detection capability of these models on small datasets. We also plan to develop and evaluate ensemble-based techniques for LSTM, BiLSTM, and GRU models.