Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning

In this paper, we propose a new comprehensive realistic cyber security dataset of IoT and IIoT applications, called Edge-IIoTset, which can be used by machine learning-based intrusion detection systems in two different modes, namely, centralized and federated learning. Specifically, the dataset has been generated using a purpose-built IoT/IIoT testbed with a large representative set of devices, sensors, protocols and cloud/edge configurations. The IoT data are generated from various IoT devices (more than 10 types), such as low-cost digital sensors for sensing temperature and humidity, ultrasonic sensors, water level detection sensors, pH sensor meters, soil moisture sensors, heart rate sensors, and flame sensors. Furthermore, we identify and analyze fourteen attacks related to IoT and IIoT connectivity protocols, which are categorized into five threats: DoS/DDoS attacks, information gathering, man-in-the-middle attacks, injection attacks, and malware attacks. In addition, we extract features obtained from different sources, including alerts, system resources, logs, and network traffic, and propose 61 new features with high correlations from the 1176 features found. After processing and analyzing the proposed realistic cyber security dataset, we provide a primary exploratory data analysis and evaluate the performance of machine learning approaches (i.e., traditional machine learning as well as deep learning) in both centralized and federated learning modes. The Edge-IIoTset dataset can be publicly accessed from [1].


I. INTRODUCTION
The Internet of Things (IoT) is a connected network of equipment that has the ability to communicate with each other and provide data to users via the Internet. The explosive growth of IoT in recent years is due in part to its broad applicability, scalability, and support for smart applications. The majority of IoT applications perform tasks in an automated fashion, with little or no interaction with humans.
Industrial IoT (IIoT) is a subclass of IoT, where IoT devices are used in typically closed industrial environments. IIoT has been successful in producing significant resource savings, while increasing productivity [2]. IIoT represents a critical enabler of Industry 4.0, often referred to as the next industrial revolution [3]. Currently, there are more than 8 billion IoT-connected devices, and the number is expected to reach 41 billion by 2027 [4]. In 2021, the global IoT market size was estimated to be above $380 billion and is expected to reach over $1.8 trillion by 2028, growing at a CAGR of 25.4% from 2021 to 2028 [5], with sectors such as automotive, smart home, manufacturing, energy, healthcare, transportation, logistics, and media being at the forefront of IoT evolution.
The associate editor coordinating the review of this manuscript and approving it for publication was Shafiullah Khan.
The enormous increase in IoT calls for appropriate security and privacy policies to prevent potential vulnerabilities and threats introduced by the implementation of this technology. Furthermore, other key considerations in IIoT, including trustworthiness, expandability, and energy usage, must be addressed, given that legacy security fixes are falling short in many cases [6]. According to Kaspersky researchers, the number of cyberattacks against IoT devices jumped to 1.5 billion, up from 639 million, in a one-year period (2020-2021), an increase of more than 100%, as cybercriminals have turned their attention to this space, seeking to steal data, mine cryptocurrencies, and create botnets [7]. Another favorite weapon of hackers lately is the ransomware attack, as the average ransom amount paid by organizations jumped by 311% in 2020 and hit about $350 million in crypto-currency, according to a report released by the Ransomware Task Force [8]. For instance, in the first half of 2021, DarkSide (a Russian-based hacker group) claimed responsibility for a ransomware attack on Colonial Pipeline, one of the largest fuel pipelines in the U.S., and forced it to take its SCADA systems down and pay nearly $5 million in a hard-to-trace crypto-currency.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Considerable work has been done by the cybersecurity community in creating sophisticated security tools and techniques for protecting users and data in traditional IT systems. Yet, these measures cannot be immediately deployed for IoT/IIoT-based systems. Many existing techniques are insufficient to address novel threats that can breach IoT networks, making it necessary to delve deeper into advanced forensic approaches to detect and investigate malicious behavior [10]. Purpose-built cybersecurity solutions that are tailored to IoT and IIoT systems are needed to manage limitations such as constrained functionality, limited power, and lightweight network protocols [14], [15]. One such solution is the Intrusion Detection System (IDS), which can detect and monitor attacks throughout their lifecycle, enabling a response to advanced persistent threats that can evade existing security measures [12], [16].
Intrusion detection techniques that are based on machine learning require training and ongoing calibration using centralized or federated learning approaches [17]-[21]. A key success factor in training IDS is choosing the right dataset. For IoT/IIoT systems security, it is critical to use datasets that closely mirror real-world IoT/IIoT applications. The scarcity of available IoT/IIoT datasets presents a significant barrier to the evaluation of IDS solutions tailored for IoT/IIoT systems. This scarcity of data is mainly caused by privacy concerns. As a result, many of the major corporations that are building such datasets are discouraged from sharing them publicly [22].
The goal of our work presented in this paper is to provide a comprehensive dataset that can be used for developing and accurately validating IoT/IIoT security solutions. We propose a new IoT and IIoT dataset collected from a sophisticated seven-layer testbed including more than 10 IoT devices, IIoT-based Modbus flows, and 14 IoT and IIoT protocol-related attacks. In addition, a detailed description of the dataset and its features is given in this paper. Furthermore, using the dataset, we have evaluated the performance of intrusion detection through several supervised machine learning methods using two different learning approaches, namely centralized and federated learning. The Edge-IIoTset dataset can be publicly accessed from [1].
Our research contributions are as follows:
• We present a new platform for creating a new comprehensive realistic cybersecurity dataset of IoT and IIoT applications. The testbed is organized into seven layers: cloud computing layer, NFV layer, blockchain layer, fog layer, SDN layer, edge layer, and IoT/IIoT perception layer.
• We extract features obtained from different sources, including alerts, system resources, logs, and network traffic, using two network protocol analyzers, namely, the Zeek tool and the TShark tool. Then, we propose 61 new features with high correlations from the 1176 features found.
• We propose a new processing and analysis framework for our realistic cyber security dataset of IoT and IIoT applications, which is based on ten steps: 1) labeling for binary classification models, 2) labeling for multiclass classification models, 3) merging all CSV files, 4) detecting and correcting (or removing) corrupt or inaccurate records, 5) dropping unnecessary flow features, 6) converting categorical variables, 7) splitting arrays or matrices into random train and test subsets, 8) encoding categorical features, 9) standardizing features, and 10) implementing the synthetic minority over-sampling technique.
• We provide a primary exploratory data analysis and evaluate the performance of machine learning approaches in both centralized and federated learning modes.
• We provide a complete review and analysis of the available existing datasets and compare them with Edge-IIoTset. The findings demonstrate the performance of our proposed platform in creating a new comprehensive realistic cyber security dataset of IoT and IIoT applications and the superiority of the Edge-IIoTset dataset in comparison to existing ones.
The remainder of the paper is organized as follows. In Section II, we provide a complete review and analysis of the available existing datasets and compare them with Edge-IIoTset. Section III presents our proposed IoT and IIoT testbed architecture. Section IV provides the description of the Edge-IIoTset dataset. In Section V, we provide the extrapolated features and describe their different types. Section VI presents the experimental results of the proposed Edge-IIoTset dataset. Finally, Section VII concludes this paper.

II. RELATED AVAILABLE IoT AND IIoT DATASETS FOR CYBER SECURITY
Various datasets have been proposed by the community for IoT/IIoT cybersecurity in recent years [14]. This section presents a discussion about some of the most popular datasets that have been recently used for IoT/IIoT-based IDS developments. Tab. 1 provides a brief comparison between these datasets and ours.
A. MQTTset DATASET
Created by Vaccari et al. [11] as a way to train ML-based IDSs in the IoT context. The specific objective of the MQTTset is the focus on the MQTT protocol and the threats associated with IoT devices that use it. The lab environment established by the authors for generating the dataset consists of eight sensors and an MQTT broker. The sensor types deployed in two rooms are temperature, humidity, motion, CO-Gas, door lock, fan, smoke, and light sensors. The collection period corresponds to a time window of one week, generating more than 11 million network packets, with a data size of more than 1 GB. The MQTTset is comprised of both legitimate and malicious traffic. The version of MQTT is 3.1.1 with the authentication disabled. The dataset is composed of 33 features, including three related to TCP and 30 related to MQTT. The malicious traffic was generated by launching attacks against the MQTT broker. The attack vectors used in the dataset include flooding DoS using the MQTT-malaria tool, MQTT publish flood using the IoT-Flock tool, Slow DoS against Internet of Things Environments (SlowITe), malformed data attack using the MQTTSA tool, and brute force authentication, also using MQTTSA. For the validation of the proposed dataset in terms of intrusion detection, the authors considered the following algorithms: neural network, Decision Tree (DT), Random Forest (RF), Naïve Bayes (NB), Multilayer Perceptron (MP), and Gradient Boost (GB). For the multi-classification approach, RF showed the best performance results with 99% accuracy and 99% F1-score, while MP showed the worst with 94% accuracy and 96% F1-score. However, the dataset only contains MQTT traffic, which means that there is no IIoT traffic such as the Modbus protocol, which in turn makes this dataset unsuitable for IIoT security applications.

B. N-BaIoT DATASET
Created by Meidan et al. [9] as a way to evaluate a proposed network-based anomaly detection scheme that retrieves behavioral snapshots out of the network, and leverages deep autoencoders for detecting abnormal network traffic originating from exploited IoT devices. The constructed lab environment is composed of nine IoT devices of the following types: doorbells, thermostats, baby monitors, security cameras, and a webcam, in addition to an access point, a sniffer host, and a C&C server. The total number of instances reported in the dataset is 7,062,606 from the nine IoT devices. The dataset contains a set of 23 features extracted over five time windows, consisting of stream statistics: weight, mean, std, radius, magnitude, cov, and pcc (approximated covariance between two streams). The normal traffic was captured right after the new installation, to ensure that no infected streams were injected. The malicious traffic consists of 10 attack types carried out by 2 botnets, namely BASHLITE and Mirai. The authors implemented an optimized deep autoencoder for the validation of the proposed method and dataset and conducted a comparison with three models, namely: Local Outlier Factor (LOF), One-Class SVM, and Isolation Forest (IF). For most devices, deep autoencoders have shown superiority in terms of TPR, FPR, and detection time, with a TPR of 100%, a mean FPR of 0.007 ± 0.01, and a time of 174 ± 212 ms. However, the dataset includes only malicious attacks from two botnets, with no IIoT traffic involved, making it unsuitable for detecting other types of IoT attacks, such as MiTM, and not relevant to IIoT security applications.

C. BOT-IoT DATASET
Created by Koroniotis et al. [10] at the Research Cyber Range lab of UNSW Canberra using both real and simulated IoT network traffic, with the goal of detecting and identifying botnets on IoT-specific networks. The lab environment is constituted by three elements: a) Network services and platforms, including legitimate and malicious virtual machines (VMs), b) Simulated IoT-based smart-home, using the Node-red tool, and including traffic of simulated IoT devices, including thermostat, garage door, refrigerator, weather monitoring system, and lights, and c) Forensics analytics, using the Argus tool. The dataset consists of over 72 million records, with a size of 69.3 GB for the captured PCAP files, and 16.7 GB for the CSV files of the extracted flow traffic. The protocols used in the dataset include TCP, UDP, ARP, ICMP, IGMP, and RARP. The reported features are of two types: real protocol parameters and generated flow features. The malicious traffic was generated using cyber-attacks originating from Kali Linux VMs, including probing (port scanning and OS fingerprinting), DoS/DDoS, and information theft (data theft and keylogging). The dataset was tested under three ML and DL models, namely Recurrent Neural Network (RNN), Support Vector Machine (SVM), and Long Short-Term Memory (LSTM). SVM showed the best accuracy performance with 99%. However, there is only IoT data in the dataset, so there is no IIoT traffic, making it unsuitable for IIoT security.

D. FEDERATED TON_IoT DATASET
Created by Moustafa et al. [3] at the IoT lab of UNSW Canberra, by including federated data sources collected from three dataset types: a) IoT services telemetry, b) Operating systems, and c) Network traffic. The testbed is layered into three levels: a) Edge Layer: houses IoT and networking appliances, b) Fog Layer: houses VMs and gateways, and c) Cloud Layer: consists of services, like data analytics. The dataset is composed of both normal and attack traffic. The Windows 7 dataset contains 10000/5980 normal/attack records, while the Windows 10 dataset contains 10000/11104 normal/attack records. The dataset includes nine attack categories, namely DoS/DDoS, scanning, ransomware, backdoor, injection, XSS, password, and Man-In-The-Middle (MITM) attacks. The authors reported the correlation analysis of the selected features. The correlation matrix was adjusted to pick the most correlated features with a threshold value greater than or equal to 0.85. However, the dataset does not contain IIoT traffic, nor does it provide intrusion assessment using different machine learning techniques with the proposed dataset to validate it. The dataset was evaluated using the following models: DT, NB, SVM, K-nearest Neighbor (KNN), Logistic Regression (LR), Deep Neural Network (DNN), and Gated Recurrent Unit (GRU). DT obtains the highest performance of all algorithms, with 99.54% accuracy for binary classification, and 99.49% for multi-classification. However, the dataset only utilizes centralized learning approaches for providing intrusion detection evaluations with the proposed dataset to validate it. Using federated learning is essential in different situations within IoT/IIoT environments to address privacy, network, and storage issues [17].

F. WUSTL-IIOT-2021 DATASET
Created by Zolanvari et al. [2], as a cybersecurity-targeted network-driven dataset of IIoT applications, by modeling and emulating actual industrial systems in the real world. The architecture implemented includes various IIoT sensors and actuators, HMI, PLC, logger, and alarming device, for simulating real-life industrial applications. The dataset consists of 2.7 GB of data, collected in approximately 53 hours, with a total of 1,194,464 observations, including 1,107,448 benign samples and 87,016 malicious samples. The dataset contains 41 features selected based on the variation of their values during the attack phases. The attacks used in the testbed include command injection, DoS, reconnaissance, and backdoor. The models used to evaluate the generated dataset are LR, KNN, SVM, NB, RF, DT, and ANN. The RF model scored the best accuracy with 99.99%, and NB showed the lowest accuracy with 97.48% for binary classification. However, the dataset only contains data from an IIoT architected environment, with no traffic, data, or attacks from IoT-based devices. Therefore, this dataset is not suitable for evaluating IoT-based IDSs.

G. OTHER DATASETS
The EMNIST dataset [23] is considered a standard benchmark for AI-based computer vision systems. The dataset consists of handwritten character digits derived from the NIST Special Database 19. The Federated EMNIST dataset (FEMNIST) is the federated version of EMNIST, which partitions the dataset into individual clients with each client being assigned a corresponding number/character set of records in EMNIST. However, these datasets do not contain IoT or IIoT network traffic, so IoT/IIoT-based IDS cannot be trained on them.

III. PROPOSED IoT AND IIoT TESTBED ARCHITECTURE
Given the limited number of IoT/IIoT datasets available for the cyber security sector, researchers typically rely on proprietary or open-source datasets that are not field-specific. In our work, we designed a realistic testbed that closely mirrors a real-world IoT/IIoT environment, and conducted realistic cyberattacks against it, to acquire real-world datasets with both legitimate and malicious traffic. The testbed consists of seven interconnected layers, namely: cloud computing layer, NFV layer, Blockchain layer, fog layer, SDN layer, edge layer, and IoT/IIoT perception layer, as shown in Fig. 1. Tab. 2 provides a list of the equipment and associated operating systems used for creating our dataset. We used open-source software to build our testbed, as presented in Tab. 3, so that it can be easily re-used and validated by the research community. This section provides a detailed description of each layer.

A. CLOUD COMPUTING LAYER
This layer is not physically deployed in the lab; however, it acts as a provider of various services and resources such as IoT platforms, data storage, and computing power over the Internet. Cloud-based data storage, processing, visualization, and device management are mandatory operations for almost all IoT/IIoT-based applications. We have used the ThingsBoard IoT platform [24], since it supports a variety of IoT protocols, including MQTT, CoAP, and HTTP, for device connectivity. The platform also supports the creation of rich custom dashboards for real-time data visualization and remote device control, which is relevant to most IoT use cases. Every access to this layer is via the Internet, as opposed to the other layers, where access is done locally through wireless routers.

B. NETWORK FUNCTIONS VIRTUALIZATION LAYER
NFV reduces overall costs and speeds up service deployment by decoupling network functions from their dedicated equipment and deploying them on virtual servers. This brings significant advantages, including savings in power usage, lower equipment and maintenance expenses, smoother upgrades, and better asset lifecycles. OPNFV is an industry-supported open-source NFV Infrastructure (NFVI) platform [25] that allows builds to be rolled out and tested on a range of different hardware settings. OPNFV combines various components, such as OpenStack, Kubernetes, and OpenDaylight, to create an end-to-end platform for computing, storage, and networking virtualization. Vulnerable services and applications are deployed in this layer, including the Damn Vulnerable Web Application (DVWA). Attacks against these services and applications are discussed in detail in the following sections.

C. BLOCKCHAIN NETWORK LAYER
The applications of blockchain extend significantly beyond the realm of crypto-currencies, through its potential to create more transparency and equity while saving companies time and money. In an effort to build a sophisticated real-world testbed, we have included an enterprise-level blockchain platform called Hyperledger Sawtooth [26], which enables both distributed applications and ledger networks. In addition, Sawtooth offers a high degree of modularity, allowing companies to make the most appropriate strategic decisions and letting applications choose the consensus, access, and transaction protocols that suit the customer's particular needs. The framework supports making design decisions within the transaction processor, permitting several types of applications (IoT and IIoT applications) to operate within a single blockchain network instance. Individual applications can set up custom transaction processors tailored to their specific business requirements.
TABLE 4. IoT sensors and actuators adopted in the creation of Edge-IIoTset dataset.

D. FOG COMPUTING LAYER
This layer acts as a mediator between the edge and the cloud layers for various purposes, including determining the relevance of data from the edge and relieving pressure on the network and the cloud by selecting the most important data. ThingsBoard is used as an IoT fog platform since it supports fog deployment, and it is also responsible for synchronizing the data with the cloud instance. We have also deployed a digital twin for our testbed using Eclipse Ditto [27], in order to create a virtual model designed to accurately reflect the implementation of our cyber-physical testbed in the real world. Since our testbed is equipped with various sensors and actuators, it produces data related to many aspects of real-world physical object performance, such as temperature, pH, light, etc. Once the data is generated, it is transmitted to a virtual model of the physical object, which is then fed into a processing system that updates the digital copy.

E. SOFTWARE-DEFINED NETWORKING LAYER
This layer employs SDN technology, a network management concept that enables dynamic, programmatic configuration across the network to improve network performance and control. SDN is designed to overcome the limitations of conventional networks by concentrating the network logic in a single component and separating the transmission of packets from the routing operation. We used the ONOS SDN controller [28] for this layer. ONOS is a flexible, scalable, distributed SDN controller that makes it easy to administer, deploy, and set up new network components, such as network applications. It also supports real-time network control and configuration with user-friendly programmatic interfaces.

F. EDGE COMPUTING LAYER
Instead of having everything exported to the cloud for processing and analysis, this layer is positioned much closer to the data sources, bringing the computation to the edge of the network and handling IoT/IIoT data near the edge rather than on distant cloud nodes. This allows data to be properly prioritized locally, thereby minimizing traffic flow on its way back to the cloud, making the Fog layer less complex with fewer possible points of failure, reducing bandwidth and cloud resource usage, and optimizing network latency. To accomplish this, specifically for IoT data, we installed various Mosquitto MQTT brokers [29], an open-source message broker that implements the MQTT protocol, on several Raspberry Pi boards. In the case of IIoT data, we used Node-RED Modbus TCP [30], a Modbus master/slave creation tool intended to assist Modbus slave device builders in testing and simulating the Modbus protocol.
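As an illustration of the Modbus TCP traffic carried at this layer, the following minimal Python sketch builds a "Read Holding Registers" request (function code 0x03) as laid out in the public Modbus specification: a 7-byte MBAP header followed by the PDU. The transaction identifier, unit identifier, and register addresses are hypothetical values chosen for the example and are not taken from the testbed configuration.

```python
import struct


def build_read_holding_registers(transaction_id, unit_id, start_address, quantity):
    """Build a Modbus TCP 'Read Holding Registers' request frame.

    MBAP header: transaction id (2 bytes), protocol id (2 bytes, always 0),
    length (2 bytes, counts the unit id plus the PDU), unit id (1 byte).
    PDU: function code 0x03, start address (2 bytes), quantity (2 bytes).
    """
    function_code = 0x03
    pdu = struct.pack(">BHH", function_code, start_address, quantity)
    length = 1 + len(pdu)  # unit id byte + PDU bytes
    mbap = struct.pack(">HHHB", transaction_id, 0, length, unit_id)
    return mbap + pdu


# Hypothetical request: master asks slave 1 for two registers starting at 0.
frame = build_read_holding_registers(
    transaction_id=1, unit_id=1, start_address=0, quantity=2
)
print(frame.hex())
```

A master would send such a frame over a TCP connection to the slave, which replies with a frame carrying the same transaction identifier.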

G. IoT AND IIoT PERCEPTION LAYER
The perception layer or physical layer is equipped with a range of sensors that detect and gather environmental information, including the detection of specific physical parameters and/or the recognition of other types of information in the environment. It also includes actuators that act on the environment when certain conditions are met. Modbus slaves also belong to this layer; they receive requests from the master and send back replies. Tab. 4 provides a detailed description of each IoT sensor and actuator used in the testbed. The table includes the type of devices used, a brief description of the operation the device is to perform, the different application modes in which the device can be deployed, the product reference number, the features and specifications of the device such as the voltage, and the pin configuration used with the Arduino Uno board.

H. EXTERNAL ENTITIES
While the layers discussed above represent a sophisticated IoT/IIoT-based system, in this part the components represent the entities that interact with the system either with good or bad intentions.Specifically, we consider two entities: the service subscribers and the attacking machines.

1) SERVICES SUBSCRIBERS
These are the devices that subscribe to telemetry (IoT) and IIoT data from the various services deployed in the system.We consider smart TV, smartphone, and desktop computer usage.When such a device has made a subscription to a specific type of data, say for example the ultrasonic sensor located in room 1, whenever a change occurs, the IoT platform receives the change and notifies the subscriber in question with the change in real-time.
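The change-notification behavior described above can be sketched with a small in-memory publish/subscribe model. This is a toy illustration only: the class name `TinyBroker`, the topic string, and the callback signature are hypothetical and do not correspond to the ThingsBoard or Mosquitto APIs.

```python
from collections import defaultdict


class TinyBroker:
    """Toy in-memory broker that notifies subscribers when a value changes."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks
        self._last_value = {}                  # topic -> last published value

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, value):
        """Store the value and notify subscribers only if it changed.

        Returns the number of subscribers that were notified.
        """
        if topic in self._last_value and self._last_value[topic] == value:
            return 0
        self._last_value[topic] = value
        for callback in self._subscribers[topic]:
            callback(topic, value)
        return len(self._subscribers[topic])


# Hypothetical usage: a subscriber follows the ultrasonic sensor in room 1.
received = []
broker = TinyBroker()
broker.subscribe("room1/ultrasonic", lambda t, v: received.append((t, v)))
broker.publish("room1/ultrasonic", 42)  # first reading: subscriber notified
broker.publish("room1/ultrasonic", 42)  # unchanged: no notification
broker.publish("room1/ultrasonic", 40)  # changed: subscriber notified
```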

2) ATTACKING MACHINES
These entities are the malicious traffic generators for our dataset, as they use various attack software, tools, and scripts that are installed on these entities.A complete list of the attacks and their techniques is presented in the next section.

IV. DESCRIPTION OF EDGE-IIoTset DATASET
In this section, we thoroughly explore the various steps we took to generate our dataset [1].We initially provide a discussion of the proposed generation framework, followed by a description of malicious traffic management using multiple attacks, approaches, and tools.

A. METHODOLOGY OF CREATING THE EDGE-IIoTset DATASET
As presented in Fig. 2, the methodology of creating the Edge-IIoTset dataset is organized in the following seven steps:

1) SETUP AND CONFIGURATION OF NETWORK EQUIPMENT
We started with the installation of the software and hardware equipment, which are presented in Tabs. 2, 3, and 4. More specifically, we configured these tools for each corresponding layer, including the cloud computing layer, NFV layer, Blockchain layer, fog layer, SDN layer, edge layer, and IoT/IIoT perception layer.

2) THREAT AND ATTACK MODELING
This step consists of modeling the attacks and threats against the IoT and IIoT applications. More precisely, we identified and analyzed fourteen attacks, as presented in Tab. 5, which are categorized into five threats: DoS/DDoS attacks, information gathering, man-in-the-middle attacks, injection attacks, and malware attacks. The DoS/DDoS attacks make the victim's IoT edge server unavailable to legitimate requests by sending manipulated packets, and include four attacks, namely, TCP SYN Flood DDoS attack, UDP flood DDoS attack, HTTP flood DDoS attack, and ICMP flood DDoS attack. Information gathering consists of analyzing IoT data packets to spot the weaknesses of IoT devices as well as Edge servers, and includes three attacks, namely, Port Scanning, OS Fingerprinting, and Vulnerability scanning attack. The man-in-the-middle attacks consist of the interception of communications between IoT devices and edge servers, and include two attacks, namely, ARP Spoofing attack and DNS Spoofing attack. The injection attacks consist of sending a malicious script to an unsuspecting user, which can access sensitive information, session tokens, cookies, etc. Finally, the malware attacks consist of installing backdoors to take control of vulnerable IoT network components, and include three attacks, namely, Backdoor attack, Password cracking attack, and Ransomware attack. Tab. 5 provides the list of attack scenarios included in the Edge-IIoTset dataset.

3) NORMAL AND ATTACK IoT DATA GENERATION
In this phase, we generated IoT data from different components (i.e., IoT devices, Edge servers, SDN controller, Mosquitto MQTT brokers, etc.) and launched the attacks against these components. Normal data generation started on November 21, 2021, ran for several hours each day (not continuously), and ended on January 10, 2022. Moreover, the attack data experiments were performed at different hours and days from November 21, 2021, to January 10, 2022, where each attack experiment was conducted multiple times to generate more records.

4) NORMAL AND ATTACK IoT DATA COLLECTION
This phase consists of capturing packet data from the IoT network using the Wireshark tool and storing it in a PCAP file format. Tab. 6 presents statistics of normal instances included in the Edge-IIoTset dataset with PCAP file sizes. Traffic capture is done over short periods, with a maximum of 3 hours of continuous collection. We configured the tool to collect all PCAP files from all the edge server interfaces (i.e., Raspberry Pi 4 Model B).

5) FEATURE EXTRACTION
This phase is focused on extracting the features from the PCAP files using two network protocol analyzers, namely, the Zeek tool and the TShark tool, and then storing them in a CSV file format for further processing. We identified and selected 61 features with a high correlation from the 1176 features found.
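As a hedged illustration of the PCAP-to-CSV step, the sketch below assembles a TShark command line that exports selected protocol fields as comma-separated values. The helper name `tshark_csv_command`, the file name `capture.pcap`, and the short field list are assumptions made for the example; they are not the authors' actual extraction script or the 61 selected features. The options used (`-r`, `-T fields`, `-e`, `-E`) are standard TShark options.

```python
def tshark_csv_command(pcap_path, fields):
    """Return a TShark argument list that dumps the given fields as CSV."""
    cmd = ["tshark", "-r", pcap_path, "-T", "fields"]
    for field in fields:
        cmd += ["-e", field]  # one -e option per exported field
    # Print a header row, use commas as separators, double-quote values.
    cmd += ["-E", "header=y", "-E", "separator=,", "-E", "quote=d"]
    return cmd


# Illustrative field names only (hypothetical selection).
example_fields = ["frame.time", "ip.src", "tcp.flags", "mqtt.msgtype", "mbtcp.len"]
command = tshark_csv_command("capture.pcap", example_fields)
print(" ".join(command))
```

With TShark installed, the list could be passed to `subprocess.run` with standard output redirected to a CSV file.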

6) DATASET PROCESSING AND ANALYZING
This phase is focused on processing and analyzing the Edge-IIoTset dataset. Specifically, we applied the following steps:
• Step 1: We added a new label, named Attack_label, in order to label all records, whether normal or attack. The Attack_label contains 0 or 1, which is used for the binary classification model (i.e., 0 indicates normal and 1 indicates attacks).
• Step 2: We added a new label, named Attack_type, which presents the attack categories, for the multiclass classification model.
• Step 3: We merged all CSV files into one CSV file.
• Step 4: We applied the process of detecting and correcting (or removing) corrupt or inaccurate records from the Edge-IIoTset dataset. Specifically, we removed duplicates and missing values such as NaN (Not a Number) or 'INF' (Infinite Value).
• Step 5: We removed unnecessary flow features such as IP addresses, ports, timestamp and payload information.
• Step 6: We applied the pandas.get_dummies function for converting categorical variables into dummy/indicator variables.
• Step 7: We used train_test_split from the sklearn.model_selection package to split arrays or matrices into random train and test subsets.
• Step 8: We used OneHotEncoder from the sklearn.preprocessing package for encoding categorical features as a one-hot numeric array.
• Step 9: We applied StandardScaler from the sklearn.preprocessing package for standardizing features by removing the mean and scaling to unit variance.
• Step 10: We applied the SMOTE class from the imblearn.over_sampling package for the implementation of SMOTE (Synthetic Minority Over-sampling Technique).
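The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' script: the column names in `DROP_COLS` and `CATEGORICAL_COLS`, the helper names, the 80/20 split, and the stratification are hypothetical choices. Step 8 (OneHotEncoder) is folded into the get_dummies call and Step 10 (SMOTE) is omitted to keep the example short; the scaler is fitted on the training split only, which is a design choice of this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical column names, used only for this illustration.
DROP_COLS = ["ip.src_host", "ip.dst_host", "tcp.srcport", "tcp.dstport", "frame.time"]
CATEGORICAL_COLS = ["http.request.method"]


def label_frame(df, attack_type):
    """Steps 1-2: add the binary and multiclass labels to one capture."""
    out = df.copy()
    out["Attack_type"] = attack_type
    out["Attack_label"] = 0 if attack_type == "Normal" else 1
    return out


def preprocess(frames, test_size=0.2, seed=0):
    # Step 3: merge all per-capture tables into one.
    df = pd.concat(frames, ignore_index=True)
    # Step 4: treat infinities as missing, drop missing rows and duplicates.
    df = df.replace([np.inf, -np.inf], np.nan).dropna().drop_duplicates()
    # Step 5: drop identifying flow features when present.
    df = df.drop(columns=DROP_COLS, errors="ignore")
    # Step 6: convert categorical variables into indicator columns.
    present = [c for c in CATEGORICAL_COLS if c in df.columns]
    if present:
        df = pd.get_dummies(df, columns=present)
    # Step 7: random train/test split, stratified on the multiclass label.
    y = df["Attack_type"]
    X = df.drop(columns=["Attack_type", "Attack_label"]).astype(float)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    # Step 9: standardize with statistics learned from the training split.
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test
```

Oversampling with SMOTE, when used, would typically be applied to the training split only, so that synthetic records never appear in the test set.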

7) DATASET PERFORMANCE EVALUATING
This phase is focused on evaluating the performance of machine learning approaches in both centralized and federated learning modes. More particularly, we used the following machine learning approaches: Random Forest (RF), Support Vector Machine (SVM), Decision Tree (DT), XGBoost, as well as the most popular Deep Neural Network (DNN).
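To make the difference between the two modes concrete, the sketch below shows one aggregation round of federated learning, in which a server combines model parameters trained locally by clients instead of collecting their raw data. The text above does not state which aggregation rule is used; the sketch assumes federated averaging (FedAvg), a common choice, and the function name and sample values are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes: number of training records held by each client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical round with two clients holding 1 and 3 records.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)  # [2.5, 3.5]
```

In centralized learning, by contrast, all records are pooled and a single model is trained on the merged dataset.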

B. ATTACKS IN EDGE-IIoTset DATASET
The quality and diversity of legitimate entries in a dataset are critical for building the normal behavioral profile of a system. Additionally, malicious entries are essential for security solutions to recognize not only the precise attack patterns but also to identify new ones.

1) DoS/DDoS ATTACKS
In these attack categories, the attackers aim to deny service to legitimate users, either from a single source or in a distributed fashion. We consider four of the most commonly used techniques, namely: TCP SYN Flood, UDP flood, HTTP flood, and ICMP flood.
• TCP SYN Flood DDoS attack: This is a type of distributed denial of service (DDoS) attack that exploits the normal TCP three-way handshake to consume resources on the targeted server and render it unavailable. With a SYN flood DDoS attack, the attacker sends TCP connection requests faster than the targeted machine can process them, which saturates the IoT network. In a regular TCP ''three-way handshake'' between an IoT device and an Edge server, the IoT device initiates the connection by sending a SYN (synchronization) message to the Edge server. The Edge server then acknowledges by returning a SYN-ACK (synchronization-acknowledgment) message to the IoT device.
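To make the handshake imbalance concrete, the sketch below counts connection requests that were never completed. In the standard handshake the client answers the SYN-ACK with a final ACK; a SYN flood leaves that last step missing. The packet tuple format, the host names, and the helper `half_open_counts` are hypothetical simplifications and are not part of the dataset tooling.

```python
from collections import defaultdict


def half_open_counts(packets):
    """Count, per client, SYNs that were never followed by a completing ACK.

    `packets` is an iterable of (src, dst, flags) tuples in capture order,
    with flags in {"SYN", "SYN-ACK", "ACK"} (a simplified, hypothetical format).
    """
    pending = defaultdict(int)  # client -> handshakes still waiting for the final ACK
    for src, dst, flags in packets:
        if flags == "SYN":
            pending[src] += 1
        elif flags == "ACK" and pending[src] > 0:
            pending[src] -= 1
        # SYN-ACK packets come from the server and do not change the client's count.
    return dict(pending)


# One legitimate handshake followed by five unanswered connection requests.
packets = [
    ("iot-1", "edge", "SYN"),
    ("edge", "iot-1", "SYN-ACK"),
    ("iot-1", "edge", "ACK"),
] + [("attacker", "edge", "SYN")] * 5

counts = half_open_counts(packets)
```

A per-client counter like this only works when source addresses are genuine; with spoofed sources, counting pending handshakes per server would be the more useful view.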

2) INFORMATION GATHERING
Obtaining intelligence about the targeted victim is always the first step in any successful attack. In our work, we consider three important steps that malicious actors generally perform as part of the information gathering stage, namely port scanning, OS fingerprinting, and vulnerability scanning.
• Port Scanning: The ports of IoT devices connected to a network are automatically scanned. The purpose is to discover which ports are open, closed, or protected by a security protocol. From this analysis, intruders can learn the composition of a network's architecture, the operating system, active security devices like firewalls, etc. This attack provides an easy access point for cyber-attackers. Once they manage to penetrate a network via port scanning, they will be able to extract sensitive information such as personal data, passwords, etc. The offensive system with the IP address 192.168.0.170 was used to discover active hosts using the Nmap and Netcat tools.
• OS Fingerprinting: Once an attacker can identify the operating system (OS) type of a targeted device, he can then attack the vulnerabilities contained in that operating platform. Operating system fingerprinting is used by both attackers and security professionals to effectively and efficiently map remote networks, and to identify exploitable vulnerabilities. In addition, this attack operates only on packets of a TCP connection that carry an ACK, SYN/ACK, or SYN flag. The offensive system with the IP address 192.168.0.170 was used to apply an active operating system fingerprinting tool, named xprobe2.
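One simple way to see why port scanning stands out in traffic is to count how many distinct destination ports a source touches on a single host. The sketch below is an illustrative heuristic only: the function name, the threshold of 100 ports, and the victim and benign addresses are assumptions, while 192.168.0.170 is the attacker address reported in the text.

```python
from collections import defaultdict


def suspected_scanners(connections, port_threshold=100):
    """Return sources that contacted at least `port_threshold` distinct ports on one host."""
    ports_seen = defaultdict(set)  # (src, dst) -> set of destination ports
    for src, dst, dport in connections:
        ports_seen[(src, dst)].add(dport)
    return sorted(
        {src for (src, _), ports in ports_seen.items() if len(ports) >= port_threshold}
    )


# The attacker sweeps ports 1..1024; a benign device repeatedly uses one MQTT port.
connections = [("192.168.0.170", "192.168.0.128", p) for p in range(1, 1025)]
connections += [("192.168.0.101", "192.168.0.128", 1883)] * 50

scanners = suspected_scanners(connections)
```

The benign device generates many connections but only one distinct port, so volume alone does not trigger the heuristic.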

3) MAN IN THE MIDDLE ATTACKS
This attack is intended to compromise and alter the flow of communication between two parties who believe they are in direct communication with each other. We focus on this attack by targeting two of the most commonly used protocols in almost every system today, DNS and ARP.
• DNS Spoofing attack: The attacker exploits the weaknesses of the DNS (Domain Name System) protocol and/or its implementation in the domain name servers. There are two main DNS Spoofing attacks: DNS ID Spoofing and DNS Cache Poisoning. Specifically, the attacker's objective is to associate the IP address of a machine under his control with a real and valid name of a public machine. When an IoT device wants to communicate with the edge server, the IoT device needs the IP address of the edge server. However, the IoT device may only have the name of the edge server. In this case, the IoT device will use the DNS protocol to obtain the IP address of the edge server from its name. The DNS ID Spoofing attack consists of capturing the ID number of a DNS request sent to a DNS server in order to send a forged response before the DNS server does; the ID can be obtained by sniffing when the attack is performed on the same physical network. Since the DNS servers have a cache that keeps the correspondence between an IoT device name and its IP address for a certain time, the DNS Cache Poisoning attack consists of corrupting this cache with false information.
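Because DNS ID spoofing races a forged response against the genuine one, a capture of the attack will often contain two different answers for the same transaction ID. The sketch below flags such conflicts. It is a hedged illustration: the tuple layout, the host names, the edge server address 192.168.0.128, and the helper name are hypothetical, and a real resolver may legitimately return several addresses for one name.

```python
from collections import defaultdict


def conflicting_dns_answers(responses):
    """Return {(txid, name): sorted answers} for queries answered with more than one IP."""
    answers = defaultdict(set)  # (transaction ID, queried name) -> set of answered IPs
    for txid, name, ip in responses:
        answers[(txid, name)].add(ip)
    return {key: sorted(ips) for key, ips in answers.items() if len(ips) > 1}


responses = [
    (0x1A2B, "edge.local", "192.168.0.170"),    # forged reply that arrives first
    (0x1A2B, "edge.local", "192.168.0.128"),    # genuine reply from the DNS server
    (0x3C4D, "broker.local", "192.168.0.128"),  # unaffected query
]

conflicts = conflicting_dns_answers(responses)
```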

4) INJECTION ATTACKS
These attacks aim at compromising the integrity and confidentiality of the targeted system. We used three different approaches, namely XSS, SQL injection, and uploading attacks.
• Cross-site Scripting (XSS) attack: This is a type of security vulnerability of websites. Specifically, malicious scripts are injected into websites in order to attack users' systems. These scripts are written in scripting languages (e.g., JavaScript) that run in the web browser. The threat arises when user-supplied data is delivered to the browser without any verification.

5) MALWARE ATTACKS
These kinds of attacks are the ones that have gone publicly viral in the last few years, not just because of the extensive damage they have caused, but also because of the reported losses involved. We used three types of such attacks, namely backdoor, password cracking, and ransomware attacks.
• Backdoor attack: This malicious software is used to provide attackers with unauthorized remote access to an infected IoT device by exploiting vulnerabilities in the system. An attacker can use the backdoor to spy on a user, manage his or her files, attack other hosts, install additional software or malware, as well as monitor the whole system. The offensive system with the IP address 192.168.0.170 was used to create a Python script backdoor using the Metasploit framework and then transfer it using the curl tool.
• Password cracking attack: This attack consists of trying to find a password or a key through successive attempts. This means that the password is broken by trying successive combinations until the right one is found. The attempts can range from alphanumeric sequences (a, aa, aaa, ab, abb, abbb, etc.) to a dictionary of the most commonly used passwords. The offensive system with the IP address 192.168.0.170 was used to launch this attack. The CeWL tool, a Ruby app, is used for creating a list of words (candidate passwords) and email addresses (usernames).
• Ransomware attack: This is a type of malware that holds files or IoT devices hostage. Specifically, the attacker demands a ransom in exchange for restoring access or decrypting files. The cybercriminals behind this attack will contact the victim with their demands, promising to unlock the IoT device or decrypt the files once a ransom is paid, usually in Bitcoin. The offensive system with the IP address 192.168.0.170 was used to launch this attack. After applying the Backdoor attack, the OpenSSL cryptography toolkit is used for creating RSA public/private keys and encrypting and decrypting victim files.

C. THE DIRECTORIES OF THE EDGE-IIoTset DATASET
As published in [1], the directories of the Edge-IIoTset datasets contain 49 files, which are organized into three sub-directories as follows:

1) NORMAL TRAFFIC OF IoT AND IIoT APPLICATIONS
This subdirectory is named Normal traffic, which contains the following 10 files.

2) ATTACK TRAFFIC OF IoT AND IIoT APPLICATIONS
This subdirectory is named Attack traffic, which contains the following 28 files, including 14 CSV files and 14 PCAP files.
This directory contains two CSV files, namely DNN-EdgeIIoT-dataset.csv and ML-EdgeIIoT-dataset.csv. The DNN-EdgeIIoT-dataset.csv contains a selected dataset for evaluating deep learning-based intrusion detection systems. The ML-EdgeIIoT-dataset.csv contains a selected dataset for evaluating traditional machine learning-based intrusion detection systems.

V. EXTRAPOLATED FEATURES
To extract flow features from the network packets (i.e., PCAP files), we have analyzed different sources, including alerts, system resources, logs, and network traffic. We indicate that an authentic labeling operation was performed to label all records, whether normal or attack. Specifically, we have added two new attributes, namely, Attack_label and Attack_type. The Attack_label contains 0 or 1, which is used for the binary classification model (i.e., 0 indicates normal and 1 indicates attacks). The Attack_type presents the attack categories, which are used for the multiclass classification model (i.e., a classification task with more than two classes).

A. INTERNET PROTOCOL VERSION 4 (IP)
The Internet Protocol delivers the transport feature of the network layer (layer 3), which is deployed to transmit data packets from one IP address to other addresses. The end-user of the network layer will provide a remote IP address with a packet, which IP is required to forward the packet to that particular host. Based on the network protocol analyzer tool, we have found 138 attributes (e.g., Source or Destination Address, Destination Address, Destination Host, Timestamp, Transmission Control Code, IPv4 Fragment, Version, etc.). We have identified and selected two features with high correlation, namely ip.src_host and ip.dst_host.

B. ADDRESS RESOLUTION PROTOCOL (ARP)
The address resolution protocol is employed to determine the address assignment between a Layer 3 (protocol) address and a Layer 2 (hardware) address in a dynamic manner.
Based on the network protocol analyzer tool, we have found 51 attributes (e.g., Target hardware address, Target protocol address, Target IP address, Hardware size, Hardware type, Sender MAC address, Sender protocol size, Sender protocol address, Sender IP address, etc.). We have identified and selected four features with high correlation, namely arp.dst.proto_ipv4, arp.opcode, arp.hw.size, and arp.src.proto_ipv4.

C. INTERNET CONTROL MESSAGE PROTOCOL (ICMP)
This protocol is used by IP to transfer control messages between IP hosts. Based on the network protocol analyzer tool, we have found 107 attributes (e.g., Address entry size, Address Mask, Checksum, Timestamp from ICMP data, ICMP Extensions, Sequence Number, Address Family Identifier, Interface Index, Name Length, Length of the original datagram, UDP tunneling, Gateway Address, Request frame, Response time, etc.). We have identified and selected four features with high correlation, namely icmp.checksum, icmp.seq_le, icmp.transmit_timestamp, and icmp.unused.

E. TRANSMISSION CONTROL PROTOCOL (TCP)
The TCP protocol offers a connection-oriented data transfer based on the flow of data. Based on the network protocol analyzer tool, we have found 267 attributes (e.g., Acknowledgment Number, SEQ/ACK analysis, TCP Analysis Flags, TCP window update, Checksum, Proxy-Authenticate, Conversation completeness, Connection finish (FIN), TCP segment data, TCP Flags, MD5 digest, Multipath TCP Data ACK, etc.). We have identified fifteen features with high correlation, namely tcp.ack, tcp.ack_raw, tcp.checksum, tcp.connection.fin, tcp.connection.rst, tcp.connection.syn, tcp.connection.synack, tcp.dstport, tcp.flags, tcp.flags.ack, tcp.len, tcp.options, tcp.payload, tcp.seq, and tcp.srcport.

The UDP layer offers transport layer (layer 4) functionality based on connectionless datagrams. Based on the network protocol analyzer tool, we have found 30 attributes (e.g., Checksum, Bad checksum, Destination Port, Length, Payload, Source or Destination Port, Destination process ID, Source process ID, Source Port, Stream index, Location, PDU Size, etc.). We have identified three features with high correlation, namely udp.port, udp.stream, and udp.time_delta.

I. MODBUS/TCP (MBTCP)
The Modbus/TCP protocol is commonly adopted in IIoT as a local interface to manage IIoT devices, which is the Modbus RTU protocol with a TCP interface. This protocol uses a client/server architecture (i.e., runs on Ethernet). Based on the network protocol analyzer tool, we have found 65 attributes (e.g., Length, Data, diagnostic code, Broadcast Received, Character Overrun, Communication Error, Slave Abort Exception Sent, status, Number of Objects, Read Device ID, Protocol Identifier, Transaction Identifier, function code, etc.). We have identified three features with high correlation, namely mbtcp.len, mbtcp.trans_id, and mbtcp.unit_id.

VI. THE PERFORMANCE EVALUATION
This section discusses the experimental results of the proposed Edge-IIoTset dataset, using centralized and federated deep learning-based intrusion detection with common evaluation metrics. Fig. 3 illustrates the main difference between these two learning approaches. Tab. 8 presents the list of notations used within the proposed algorithms.
Firstly, we have used four conventional machine learning algorithms, namely Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN), as well as the most popular Deep Neural Network (DNN) for cyber-attack detection, to evaluate model accuracy using both centralized and federated learning against the proposed large-scale and heterogeneous dataset. Tab. 9 shows the different parameters applied for the implemented deep learning classifiers. Following the usual machine learning workflow, we start by preparing the data and cleaning it of duplicates and missing values such as NaN (Not a Number) or 'INF' (Infinite Value). Then, we drop unnecessary flow features such as IP addresses, ports, timestamp, and payload information (i.e., frame.time, ip.src_host, ip.dst_host, arp.src.proto_ipv4, arp.dst.proto_ipv4, http.file_data, http.request.full_uri, icmp.transmit_timestamp, http.request.uri.query, tcp.options, tcp.payload, tcp.srcport, tcp.dstport, udp.port, mqtt.msg). After that, we perform label-encoding by mapping the remaining categorical (non-numeric) features to numeric values. We apply feature scaling using a standardization algorithm. We split the data to produce Train sets for training and validation and Test sets for the final model evaluation. The statistics of normal and attack records in the dataset are described in Tab. 10.
Tab. 11 illustrates the randomly selected subsets data for ML algorithms and the resulting Train and Test sets after data cleaning and splitting.For the DNN we have selected a greater portion of data for more accuracy.We also used SMOTE for oversampling minority classes (MITM, Fingerprinting) to enhance the overall model efficiency.
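The interpolation idea behind SMOTE can be illustrated with a short hand-written sketch. This is not the imblearn implementation used in the paper; it is a simplified version under stated assumptions: the function name `smote_sketch`, the parameter defaults, and the four toy minority points are hypothetical.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating toward nearest neighbors."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)  # cannot use more neighbors than other samples exist
    # Pairwise Euclidean distances between minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # Pick a random base sample and one of its k nearest neighbors for each new point.
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Four toy minority samples at the corners of the unit square.
X_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_sketch(X_minority, n_new=6, k=2)
```

Each synthetic point lies on the segment between a real minority sample and one of its neighbors, so the new records stay inside the region already occupied by the minority class.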
We conducted three experiments using Binary, 6-class, and 15-class classification to better study both traffic predictability and detection efficiency of various cyber-attacks and threat models.Furthermore, we studied centralized and federated learning to evaluate detection accuracy when considering privacy, heterogeneity, and the availability of data issues.
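The three label granularities used in these experiments can be derived from one another. The sketch below groups the 15 fine-grained classes into the six coarse classes and the binary label. The grouping follows the five threat categories named in the paper; the exact label strings and helper names are hypothetical and may differ from those in the published CSV files.

```python
# Hypothetical label strings; the grouping follows the paper's five threat categories.
CATEGORY_OF = {
    "Normal": "Normal",
    "DDoS_TCP": "DDoS",
    "DDoS_UDP": "DDoS",
    "DDoS_HTTP": "DDoS",
    "DDoS_ICMP": "DDoS",
    "Port_Scanning": "Scanning",
    "Fingerprinting": "Scanning",
    "Vulnerability_scanner": "Scanning",
    "MITM": "MITM",
    "XSS": "Injection",
    "SQL_injection": "Injection",
    "Uploading": "Injection",
    "Backdoor": "Malware",
    "Password": "Malware",
    "Ransomware": "Malware",
}


def to_six_class(attack_type):
    """Map a 15-class label to its coarse category (6-class experiment)."""
    return CATEGORY_OF[attack_type]


def to_binary(attack_type):
    """Map a 15-class label to the binary Attack_label (0 = normal, 1 = attack)."""
    return 0 if attack_type == "Normal" else 1


six_class_labels = sorted(set(CATEGORY_OF.values()))
```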
With cloud solutions available to overcome resource limitations, centralized learning, characterized by the availability of rich data, promotes higher detection capabilities against complicated and large-scale attack patterns. Thus, we have studied various centralized detection approaches using Google Colab resources.
For machine learning algorithms, we implement a workflow pipeline composed of feature selection using Random Forest, model initialization, and hyper-parameter tuning using Grid Search with a stratified cross-validation technique, to finally obtain a generalized and more efficient model. Algorithms 1, 2, 3, and 4 are used in the performance evaluation of the Edge-IIoTset dataset for 1) dataset processing and analysis, 2) the centralized learning approach, 3) the federated learning (FedAvg) approach [31], and 4) the feature selection method, respectively. The resulting models were then evaluated using the Test data, considering the following detection metrics, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively:
• Accuracy: the proportion of correct classifications to the total number of entries, which is given by:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
• Precision: the proportion of correctly predicted attacks to the total number of predicted attacks, which is given by:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
• Recall: the proportion of correctly predicted attacks to the total number of samples that ought to have been identified as attacks, which is given by:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
• F1-Score: the harmonic mean of Precision and Recall, which is given by:
$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

For binary classification (2-Class), the highest accuracy was obtained by four classifiers, namely RF, SVM, KNN, and DNN, which achieved 99.99%, while the DT classifier obtained an accuracy of 99.98%. These results indicate that the deep learning approach is efficient for intrusion detection compared to traditional machine learning techniques (DT, RF, SVM, KNN) in centralized mode, especially with big data availability.
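The four detection metrics listed above follow directly from confusion-matrix counts. The sketch below computes them for one class treated as the positive (attack) class; the function name and the example counts are hypothetical and are not results from the paper.

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators, e.g. when no sample is predicted as attack.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 90 attacks caught, 10 missed, 5 false alarms, 95 normal kept.
metrics = detection_metrics(tp=90, tn=95, fp=5, fn=10)
```

With these counts the accuracy is 185/200, the precision 90/95, and the recall 90/100, which shows how a model can score well on accuracy while still missing a tenth of the attacks.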
Tab. 12 provides a summary of DNN model learning for attack detection (binary classification). We implemented a shallow DNN with only 47 trainable parameters. The model converges quickly, achieving a high detection accuracy of 99.99%. As depicted in the classification report (Fig. 4(a)), the normal class pattern was well discriminated from all the attack patterns, owing to the nature of IIoT physical objects, which are typically task-oriented and maintain a relatively stable data distribution; this enhances both the effectiveness and the efficiency of attack detection with real-time capabilities.

Tab. 13 provides the centralized model results of the machine learning techniques (DT, RF, SVM, KNN, DNN) in terms of F1-score, Recall, and Precision under multi-class classification (6-class). It can be seen that the DNN classifier gives the highest precision rate for Normal traffic and three types of attacks, namely, MITM attacks 100%, Malware attacks 97%, and Scanning attacks 94%, while for DDoS attacks and Injection attacks, the highest precision rates are given by the RF classifier with 98% and 67%, respectively. We note that all machine learning algorithms produce no false positives for the Normal class, which means that the precision rate is 100%. We also observe that the DNN classifier gives the highest recall for two types of attacks, namely, DDoS attacks with 99% and MITM attacks with 100%. For the Injection attacks and Scanning attacks, the SVM classifier gives the highest recall, with 91% for both. We note also that all machine learning algorithms produce no false positives for the Normal class, which means that the recall rate is

Fig. 6 illustrates the five most important features for each class based on the interpretation of the Random Forest prediction, which is helpful for further forensic analysis. We can see that information from different protocols contributed well to identifying a variety of attacks.
Regarding recall in the 15-class setting (Tab. 14), we observe that the KNN classifier gives the highest recall for three types of attacks, namely, Backdoor attack 94%, OS Fingerprinting 70%, and Ransomware attack 94%. The DNN classifier gives the highest recall for five types of attacks, namely, HTTP flood DDoS attack 92%, TCP SYN Flood DDoS attack 100%, UDP flood DDoS attack 100%, MITM attack 100%, and Password cracking attack 91%. The DT classifier gives the highest recall for two types of attacks, namely, ICMP flood DDoS attack 100% and SQL Injection 96%. The SVM classifier gives the highest recall for three types of attacks, namely, Port Scanning 100%, Vulnerability scanning attack 86%, and Cross-site Scripting (XSS) attack 88%. Finally, the RF classifier gives the highest recall only for the Upload attack, with 51%.

B. FEDERATED MACHINE LEARNING
The evaluation results of the federated deep learning approach for three types of classification, namely, 2-class (binary classification), 6-class (multi-classification), and 15-class (multi-classification), are presented in Tab. 15. In particular, the results present three types of accuracy metric, namely, global model accuracy, worst client accuracy, and best client accuracy. All of these accuracies are obtained for the first and the 10th round of the deep learning network under the federated learning mode. In addition, the results are obtained for two modes, namely, 1) non-independent and identically distributed (Non-IID) and 2) independent and identically distributed (IID). From these results, we first observe that with federated deep learning, all global models are able to approximate the centralized model's performance. The second finding is that under the IID data distribution strategy, the Best, Worst, and Global models are strongly matched to each other in a consistent manner throughout all parameters and datasets (i.e., 2-class, 6-class, 15-class). The third observation is that in the Non-IID case, the clients are able to benefit from the federated learning strategy. A clear example is the 15-class dataset with K = 15, where the best client accuracy was initially 71.42%, but by the 10th federated learning round the client achieved an accuracy of 91.74%.
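The aggregation step of FedAvg, which the paper cites as its federated approach [31], can be sketched as a weighted average of client parameters, with weights proportional to each client's number of training samples. The flat parameter lists, the client sizes, and the helper name `fed_avg` are hypothetical simplifications; a real model would average each layer's tensors in the same way.

```python
def fed_avg(client_weights, client_sizes):
    """Weighted average of client parameter vectors (the FedAvg aggregation step).

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of local training samples held by each client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical clients; the second holds three times as much data.
global_weights = fed_avg(
    client_weights=[[1.0, 2.0], [3.0, 4.0]],
    client_sizes=[100, 300],
)
```

In each round the server would send `global_weights` back to the clients, which continue training locally before the next aggregation; only parameters, not raw traffic records, leave the clients.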

VII. CONCLUSION
In this paper, we proposed a new comprehensive realistic cyber security dataset of IoT and IIoT applications, called Edge-IIoTset, that cyber security researchers can use to evaluate their proposed machine learning-based intrusion detection systems in two different modes, namely, centralized and federated learning. The proposed testbed is organized into seven layers, including, Cloud Computing Layer, Network Functions Virtualization Layer, Blockchain Network Layer, Fog Computing Layer, Software-Defined Networking Layer, Edge Computing Layer, and IoT and IIoT Perception Layer. It addresses the limitations of the current datasets and is appropriate for the key requirements of IoT and IIoT applications, where we provided new emerging technologies in each layer, such as the ThingsBoard IoT platform, OPNFV platform, Hyperledger Sawtooth, Digital twin, ONOS SDN controller, Mosquitto MQTT brokers, Modbus TCP/IP, etc. The IoT data are generated from various IoT devices (more than 10 types). This dataset is analyzed using a primary exploratory data analysis together with the performance of machine learning approaches in both centralized and federated learning modes.

FIGURE 2. The proposed dataset generation framework.

FIGURE 6. Features importance for each class by random forest.

For 2-class (i.e., binary classification), the best results in the first round of the deep learning network under the federated learning mode are obtained when the number of clients is k = 5 and k = 10 in the non-independent and identically distributed (Non-IID) mode, where the best client accuracy achieves 99.98%, the worst client accuracy achieves 99.97%, and the global model accuracy achieves 99.97%. However, in the 10th round, the best results are obtained when the number of clients is k = 5, k = 10, and k = 15 in the independent and identically distributed (IID) mode, where the best client accuracy achieves 100%, the worst client accuracy achieves 99.99%, and the global model accuracy achieves 100%. For 6-class (i.e., multi-classification), the best results in the first round are obtained when the number of clients is k = 10 in the IID mode, where the best client accuracy achieves 95.33%, the worst client accuracy achieves 95.26%, and the global model accuracy achieves 95.34%. In the 10th round, the best results are obtained when the number of clients is k = 10 in the IID mode, where the best client accuracy achieves 96.00%, the worst client accuracy achieves 95.89%, and the global model accuracy achieves 95.99%. For 15-class (i.e., multi-classification), the best results in the first round are obtained when the number of clients is k = 15 in the IID mode, where the best client accuracy achieves 93.00%, the worst client accuracy achieves 93.02%, and the global model accuracy achieves 93.22%. In the 10th round, the best results are obtained when the number of clients is k = 15 in the IID mode, where the best client accuracy achieves 93.38%, the worst client accuracy achieves 92.91%, and the global model accuracy achieves 93.37%.

TABLE 1. Available IoT and IIoT datasets for cyber security.

TABLE 2. Hardware and Operating systems used in the creation of Edge-IIoTset dataset.

TABLE 3. Open source tools used in the creation of Edge-IIoTset dataset.

TABLE 5. The list of attack scenarios included in Edge-IIoTset dataset.

TABLE 6. Statistics of normal instances included in Edge-IIoTset dataset.

the IP address of the source device. During this kind of DDoS attack, an attacker will typically not use his real IP address but will spoof the source IP address of the UDP packets instead, preventing the attacker's real location from being revealed and possibly flooded with the target server's response packets. The offensive systems with the following IP addresses: 190.123.219.128, 16.226.184.201, 153.125.214.15, and 91.184.12.91 were used to send manipulated UDP packets using an hping3-based Python script.
• HTTP flood DDoS attack: This is a type of distributed denial of service (DDoS) attack that is intended to flood a particular target server with HTTP requests. After the target has been flooded with requests and is incapable
• ICMP flood DDoS attack: An Internet Control Message Protocol (ICMP) flood DDoS attack is a popular denial of service (DoS) attack where an attacker tries to flood a targeted device with ICMP echo requests (pings). Technically, ICMP echo request and echo reply packets are employed to ping a network device to help diagnose the health of the device and the connection between the source and the destination device. When the destination is flooded with request packets, it is forced to reply with an equal number of reply packets. This makes the destination unavailable to regular network traffic. The offensive systems with the following IP addresses: 213.117.18.213, 183.223.100.122, 166.153.227.121, 49.81.59.152, and 227.117.33.125 were used to send manipulated ICMP packets using an hping3-based Python script.

TABLE 7. The list of extrapolated features obtained from different sources, including alerts, system resources, logs, IoT and IIoT network traffic.

File 1.5 (Modbus): This file includes two documents, namely, Modbus.csv and Modbus.pcap. The IoT sensor (Modbus Sensor) is used to capture the IoT data.
File 1.6 (phValue): This file includes two documents, namely, phValue.csv and phValue.pcap. The IoT sensor (pH-sensor PH-4502C) is used to capture the IoT data.

TABLE 8. Notation for the discussion of algorithms.

TABLE 9. Settings for deep learning classifier.

TABLE 10. Statistics of normal and attacks in Edge-IIoTset.

TABLE 11. Statistics of total selected observation for training and testing.

TABLE 12. Classification report for 2-class of deep learning (Centralized model performance).

TABLE 13. Classification report for 6-class of traditional machine learning as well as deep learning (Centralized model performance).

TABLE 14. Classification report for 15-classes of traditional machine learning as well as deep learning (Centralized model performance).

TABLE 15. The evaluation results of the federated deep learning approach.
Tab. 14 provides the centralized model results of the machine learning techniques (DT, RF, SVM, KNN, DNN) in terms of F1-score, Recall, and Precision under multi-class classification (15-class). It can be seen that the DNN classifier gives the highest precision rate for Normal traffic and six types of attacks, namely, Backdoor attack 99%, ICMP flood DDoS attack 100%, UDP flood DDoS attack 100%, MITM attack 100%, Port Scanning attack 100%, and SQL Injection 91%. The SVM classifier gives the highest precision rate for Normal traffic and three types of attacks, namely, HTTP flood DDoS attack 86%, OS Fingerprinting attack 80%, and Password cracking attack 61%. The RF classifier gives the highest precision rate for Normal traffic and three types of attacks, namely, TCP SYN Flood DDoS attack 100%, Ransomware attack 96%, and Cross-site Scripting (XSS) attack 65%. The DT classifier gives the highest precision rate for Normal traffic and two types of attacks, namely, Upload attack 100% and Cross-site Scripting (XSS) attack 65%.