Machine Learning and Deep Learning Approaches for CyberSecurity: A Review

The rapid evolution and growth of the internet over the last decades has raised concern about cyber-attacks, which are continuously increasing and changing. As a result, an effective intrusion detection system is required to protect data, and the sub-fields of artificial intelligence, machine learning and deep learning, have proved to be among the most successful ways to address this problem. This paper reviews intrusion detection systems and discusses the types of learning algorithms that machine learning and deep learning use to protect data from malicious behavior. It discusses recent machine learning and deep learning work with various network implementations, applications, algorithms, learning approaches, and datasets used to develop an operational intrusion detection system.


I. INTRODUCTION
The internet is transforming people's jobs, learning, and lifestyles, and today's integration of social life with the internet increases security threats in various ways. What counts now is learning how to identify network threats and cyberattacks, particularly those not previously seen. Cybersecurity is defined as the process of implementing protective measures and policies to protect data, programs, servers, and network infrastructures from unauthorized access or modification. The internet connects the majority of our computer systems and network infrastructure. As a result, cybersecurity has emerged as the backbone for practically all types of corporations, governments, and even individuals to secure data, grow their businesses, and maintain privacy.
People send and receive data across network infrastructure, such as routers, that can be hacked and manipulated by outsiders. The increased use of the internet has increased the amount and complexity of data, resulting in the emergence of big data. The constant growth of the internet and of data volumes has necessitated the creation of a reliable intrusion detection system. Network security is a subset of cybersecurity that safeguards systems connected to a network against malicious activity. The goal is to ensure the security, integrity, and accessibility of data on networked computers. Current cybersecurity research focuses on creating an effective intrusion detection system that can identify both known and new attacks and threats with high accuracy and a low false alarm rate [1].
The associate editor coordinating the review of this manuscript and approving it for publication was Shunfeng Cheng.
As shown in Figure 1, the terms Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) are frequently used interchangeably in software development, although they are not identical. All three refer to a machine programmed to learn and find the best solution to a problem, but they are nested: DL is a subfield of ML, and ML is a subfield of AI. Both ML and DL are employed to create efficient and effective intrusion detection systems. This paper provides an overview of machine learning and deep learning applications and approaches in intrusion detection systems by concentrating on network security technologies, methodologies, and implementation.
Alan Turing suggested that general-purpose computers could learn and exhibit originality, which raised the question of whether computers should derive rules from data rather than have humans write them. Machine learning algorithms are algorithms that can learn and adapt based on data; they are designed to generate output based on what is learned from data and examples. For example, such algorithms allow a computer to perform a task such as detecting novel traffic without explicit instructions [2].
Automatic analyses of attacks and security events, such as spam mail, user identification, social media analytics, and attack detection, may be performed efficiently using machine learning [1]. As indicated in Figure 2, there are four main approaches to machine learning: supervised, unsupervised, semi-supervised, and reinforcement learning. Supervised learning is based on labeled data, unsupervised learning is based on unlabeled data, and semi-supervised learning is based on both. Deep learning (DL) is a newer subfield of machine learning. Traditional machine learning techniques are limited in their ability to process raw data: they rely on adequate feature extraction, and the raw data must be transformed into an appropriate format before a classifier can classify it or find patterns, which is where deep learning comes in. Deep learning is a machine learning approach that can learn representations from unstructured or unlabeled data and is inspired by knowledge of the human brain [3].
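The difference between learning with and without labels can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed works: the single feature (connection attempts per second), the toy values, and the function names `fit_centroids`, `predict`, and `two_means` are all assumptions chosen for illustration. The supervised part uses the labels to build one centroid per class, while the unsupervised part groups the same values into two clusters without ever seeing a label.

```python
from statistics import mean

# Hypothetical toy feature: connection attempts per second for six flows.
rates = [2.0, 3.0, 4.0, 40.0, 45.0, 50.0]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]


def fit_centroids(xs, ys):
    """Supervised: use the labels to compute one centroid per class."""
    classes = sorted(set(ys))
    return {c: mean([x for x, y in zip(xs, ys) if y == c]) for c in classes}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda c: abs(centroids[c] - x))


def two_means(xs, iterations=10):
    """Unsupervised: 1-D k-means with k=2; no labels are used."""
    low, high = min(xs), max(xs)  # initial cluster centres
    for _ in range(iterations):
        group_low = [x for x in xs if abs(x - low) <= abs(x - high)]
        group_high = [x for x in xs if abs(x - low) > abs(x - high)]
        low, high = mean(group_low), mean(group_high)
    return low, high


centroids = fit_centroids(rates, labels)
print(predict(centroids, 38.0))  # class of the nearest labelled centroid
print(two_means(rates))          # two cluster centres, without class names
```

Note that the unsupervised result only separates the traffic into two groups; deciding which group is malicious still requires outside knowledge, which is the gap that semi-supervised approaches try to close with a small amount of labeled data.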
Deep learning is motivated by neural networks (NN), which can mimic the human brain and perform analytical learning by analyzing data like text, images, and audio [4]. In contrast to deep learning models, which feature multiple connected layers, shallow learning models are built up of only a few hidden layers. By stacking layers on top of layers, DL is able to express functions of increasing complexity more effectively. DL is used to learn representations with many abstraction levels [5]. Deep neural networks are capable of finding and learning representations from raw data and performing feature learning and classification [6]. The learning approaches of machine learning are also used in deep learning, along with additional ones such as Transfer Learning, as shown in Figure 3.
The remainder of the paper is organized as follows: Section II discusses the intrusion detection system concept. Section III summarises the most frequently utilized datasets for the intrusion detection system. Section IV discusses recent advances in machine learning and deep learning-based intrusion detection systems, while Section V concludes this paper.

II. INTRUSION DETECTION SYSTEMS
Intrusion detection is the process of monitoring network traffic and events in computers in order to detect unexpected events; a software application that performs this task is called an Intrusion Detection System (IDS) [7]. An IDS is a network security mechanism that can identify and sense risks before services are lost, illegal access is granted, or data is lost [6]. An IDS can also provide a graphical user interface that gives users access to various features during the IDS testing and training process [4]. Figure 4 depicts the deployment of two IDS types, distinguished by the activities they monitor: a Network-Based Intrusion Detection System (NIDS) and a Host-Based Intrusion Detection System (HIDS). A NIDS examines packets gathered by network devices such as routers, while a HIDS examines events on a host computer. A hybrid detection system combines both [1], [8].

1) ANOMALY DETECTION
This model assumes that abnormal traffic occurs with low probability and can be distinguished from regular traffic with high probability [9]. Anomaly detection algorithms based on unsupervised learning and statistical learning can detect novel and previously unseen attacks.

2) MISUSE DETECTION
This approach is a signature-based technique: while an IDS monitors for threats, detection is based on known attack signatures [1]. This strategy relies on supervised learning and can detect illegal or suspicious behaviors, which can then be used to defend against similar attacks.
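The contrast between the two detection methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signature strings, the packets-per-second baseline, the three-standard-deviation threshold, and the function names are hypothetical and are not drawn from any cited system. Misuse detection matches against a list of known signatures, whereas anomaly detection flags values that deviate strongly from a statistical profile of normal traffic.

```python
from statistics import mean, stdev

# Hypothetical signature database for misuse (signature-based) detection.
KNOWN_SIGNATURES = {"' OR 1=1 --", "../../etc/passwd"}


def misuse_detect(payload):
    """Flag a payload only if it contains a known attack signature."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def fit_baseline(normal_values):
    """Learn the mean and standard deviation of a feature from normal traffic."""
    return mean(normal_values), stdev(normal_values)


def anomaly_detect(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    return abs(value - mu) > threshold * sigma


# Hypothetical packets-per-second readings observed during normal operation.
baseline = fit_baseline([98, 102, 100, 97, 103, 100])

print(misuse_detect("GET /index.php?id=' OR 1=1 --"))    # matches a signature
print(misuse_detect("GET /index.php?id=novel-exploit"))  # unknown attack is missed
print(anomaly_detect(500, baseline))  # far from the learned baseline
print(anomaly_detect(101, baseline))  # within normal variation
```

The sketch also shows the usual trade-off: the signature matcher cannot flag an attack it has no signature for, while the statistical detector can flag unseen behavior but will raise false alarms whenever legitimate traffic departs from the baseline.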

B. ATTACK CLASSIFICATION
As the network's diversity increased, attacks and threats evolved, becoming more sophisticated and non-repetitive. As a result, numerous attack types have been identified, including DoS, Probe, U2R, Worm, Backdoor, R2L, and Trojan [9]. Denial of service (DoS) attacks are among the most common attacks on network resources, as they render network services unavailable to legitimate users. They employ a variety of behaviors and methods to consume network resources. In a Probe attack, the intruder scans the devices connected to the network and notes open ports in order to exploit them and gain network access. In a Remote to Local (R2L) attack, an attacker sends packets to devices across a network to gain access as a local user [10]. A worm is a malicious application capable of self-replication from one device to another [9]. Finally, in a User to Root (U2R) attack, an intruder who already has local user access attempts, often after numerous trials, to gain root privileges [11].
The performance of an IDS model is commonly assessed with the following evaluation indicators:
• Accuracy - The ratio of correct predictions to the total number of records; a higher accuracy indicates a more accurate prediction by the learning model.
• Recall - The model's capacity to locate all positive records, also called the detection rate; it is the fraction of actual positive records that are correctly predicted.
• Precision - The capacity to avoid mislabeling negative records as positive; a high precision equates to a low rate of false positives.
• F1-Score (F1) - The harmonic mean of Precision and Recall; a higher F1 indicates a more effective learning model.
For decades, scientists and researchers have been attempting to develop and build an intrusion detection system that is both effective and efficient. With the advent of artificial intelligence, IDS models increasingly utilized machine learning methodologies and approaches. However, after years of research, deep learning began to perform better for IDS, as seen in the evaluation indicator outcomes. Section IV will explore machine learning and deep learning in IDS.
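The evaluation indicators listed above can be computed from the four counts of a binary confusion matrix. The following Python sketch shows one way to do so; the function names and the toy label vectors are assumptions for illustration. The false alarm rate, which is referred to throughout this review but not defined in the list, is computed here with its common definition FP / (FP + TN), which is also an assumption about the intended meaning.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def evaluate(y_true, y_pred):
    """Return the evaluation indicators as a dictionary."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # detection rate
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Assumed definition: share of benign records wrongly flagged as attacks.
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_alarm_rate": false_alarm_rate,
    }


# Hypothetical labels: 1 = attack, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

On imbalanced traffic, accuracy alone can be misleading, since a model that labels everything as benign still scores highly; this is why recall, precision, and the false alarm rate are usually reported alongside it.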

III. DATASETS
When building an intrusion detection system, the dataset employed must be considered carefully to ensure the system's accuracy. Applications and networks are growing exponentially, necessitating resilient network security, which can be supported by selecting proper datasets for training and testing. The most frequently used datasets in intrusion detection systems are summarized below.

A. KDD CUP 1999
This dataset is the most widely used dataset for intrusion detection, based on the DARPA dataset. This dataset includes basic and high-level TCP connection information such as the connection window but no IP addresses. In addition, this dataset contains over 20 different types of attacks and a record for the test subset [10].

B. UNSW-NB15
Created in 2015 by the Australian Centre for Cyber Security (ACCS), this dataset contains samples of normal and malicious traffic [12]. Attack information was collected from three real-world websites, BID (Symantec Corporation), CVE (Common Vulnerabilities and Exposures), and MSD (Microsoft Security Bulletin), and the traffic was then emulated in a laboratory environment to generate the dataset. This dataset has nine attack families, such as worms, DoS, and fuzzers [9].

C. CIC-IDS2017
The dataset was generated in 2017 by the Canadian Institute for Cybersecurity. This dataset contains normal and attack scenarios and includes an abstract behavior for 25 users based on SSH, HTTPS, HTTP, FTP, and email protocols [8], [13].

D. NSL-KDD
It is an improved version of the KDD dataset in which a large amount of redundancy has been removed and an advanced sub-dataset has been created [10]. This dataset uses the same attributes as KDD99, and its attacks belong to four categories: DoS, U2R, R2L, and Probe [8].
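Removing redundant records matters because a learner trained on many identical copies of one record tends to be biased toward that record. As a minimal illustration of the idea, and not the actual procedure used to build NSL-KDD, the following Python sketch drops exact duplicates while keeping the first occurrence. The record layout (duration, protocol, service, label) and the values are hypothetical.

```python
def remove_redundant(records):
    """Keep the first occurrence of each record, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record)  # records must be hashable to be compared
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical connection records: (duration, protocol, service, label).
raw = [
    (0, "tcp", "http", "normal"),
    (0, "tcp", "http", "normal"),
    (0, "icmp", "ecr_i", "smurf"),
    (0, "icmp", "ecr_i", "smurf"),
    (0, "icmp", "ecr_i", "smurf"),
    (2, "tcp", "ftp", "normal"),
]

clean = remove_redundant(raw)
print(len(raw), "->", len(clean))
```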

E. PU-IDS
This dataset is derived from NSL-KDD: statistics are extracted from the input data and then used to create new synthetic instances. The traffic generator of this dataset uses the same format and attributes as the NSL-KDD dataset [8].
Table 6 shows a comparison of the datasets in terms of the year each dataset was created, whether it is publicly available, the number of features used for analysis, and the amount of traffic the data covers.

IV. INTRUSION DETECTION SYSTEMS IN RECENT WORKS USING MACHINE LEARNING AND DEEP LEARNING
Methodologies and algorithms have undergone significant change and evolution to produce the most suitable intrusion detection system for the many applications that attempt to identify constantly changing threats and attacks. Initially, classification was based on machine learning, but as performance needed to be further improved, deep learning was utilized to produce higher accuracy and a lower false alarm rate. The primary distinction between machine learning and deep learning is illustrated in Figure 5 and lies in how the system learns from its input. Machine learning depends on how the data is prepared and the model is trained, whereas deep learning relies on the connections within artificial neural networks to learn from data without requiring much human interaction. Additional differences between machine learning and deep learning are summarised here and in Table 7.
• Data dependencies - This refers to the volume of data required. Traditional machine learning, which is often rule-based, performs comparatively well when the dataset is limited. In comparison, deep learning performs better with a vast amount of data, since a significant amount is required for accurate interpretation and understanding.
• Feature processing - This is a method of extracting features to generate patterns that contribute to the implementation of learning algorithms and reduce the complexity of the data. In other words, feature processing is used to perform categorization and feature detection on raw data. In machine learning, an expert must determine the necessary representations, whereas in deep learning the representations are identified automatically by the deep learning algorithms.
• Interpretability - This is the degree to which a human can understand how a model reaches its output. An interpretable model can be understood without extra tools or procedures. In deep learning, on the other hand, it is difficult to specify how neurons should be modeled and how the layers should interact, making it difficult to explain how the result was obtained.
• Problem-solving - In conventional machine learning, the problem is divided into sub-problems, each of which is solved independently, and the results are then combined into the final answer. Deep learning, on the other hand, solves the problem end to end [4].
The following subsections describe how researchers employed machine learning and deep learning to create an intrusion detection system.

A. MACHINE LEARNING IDS ALGORITHM
This subsection discusses recent research into IDS implementations that utilize a variety of machine learning algorithms. Machine learning algorithms, such as support vector machine (SVM) and random forest (RF), have been used to investigate the binary classification of IDS using a supervised learning approach [14]. SVM outperformed RF during the training process, whereas RF outperformed SVM during the test procedure. Additionally, the authors concluded that a classifier's performance varies with the dataset and attributes. An IDS model based on a decision tree, naïve Bayes, and random forest was proposed by [15] to classify DoS, Probe, R2L, and U2R attacks on the NSL-KDD dataset. The highest accuracy was achieved in detecting DoS attacks using the RF algorithm. Additionally, when the authors compared their hybrid model with its 14 features to other hybrid models with varying features, their model had a greater accuracy for DoS, Probe, and U2R and a nearly identical accuracy for R2L.
In order to increase the performance of the attack detection model, an intrusion detection strategy utilizing SVM ensemble with the feature was presented in [16]. The authors examined validated training data and found that it could be used to improve the detection process, resulting in fast training time, high accuracy, and a low false alarm rate. However, because this strategy trains classifiers independently on separate feature spaces and then combines their judgments via an ensemble, some correlations across feature spaces are missed during classifier learning, lowering the model's accuracy. Three datasets comprising high-level network features were created specifically for non-payload-based network intrusion detection systems in [17], enabling machine learning classifiers to use Advanced Security Network Metrics (ASNM) features. It was the first dataset to include adversarial obfuscation techniques, applied to the execution of malicious traffic over TCP network connections, together with benign traffic samples. While such classifiers can detect a sizable percentage of unknown threats, some unknown attacks may remain undetected, as illustrated in Figure 6.
The requirement for a horizontal platform for IoT applications/M2M resulted in the creation of the worldwide standard oneM2M [18], which aims to provide an M2M service layer that enables communication across heterogeneous apps and devices, as shown in Figure 7. Additionally, the authors investigated a second line of defense for oneM2M IoT networks, a Machine Learning-based Intrusion Detection and Prevention System, which can detect and prevent not only known but also unexpected attacks.
They developed their dataset from real-world IoT networks and implemented a detection model with three machine learning levels to identify and detect attacks and threats. They obtained 99.93 % accuracy for the second detection level when using a decision tree-based machine learning algorithm and 99.34 % accuracy when using an encoder-based machine learning strategy. This model thus obtained a high degree of accuracy and can detect and respond to risks associated with the oneM2M service layer. The use of Artificial Neural Networks (ANNs) was proposed by [18] to detect malicious traffic by training them on a large variety of benign and malicious traffic data. ANNs contain weights that are adaptively tuned during the training phase by a learning rule. Their methodology outperformed signature-based detection, with an accuracy of 98 %. Table 8 analyses the learning method, performance metric, dataset, attack type, strengths, and limits of machine learning techniques based on intrusion detection systems.
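To illustrate what it means for weights to be adaptively tuned by a learning rule, the following Python sketch trains a single artificial neuron with the classic perceptron rule, in which each weight is adjusted in proportion to the prediction error. This is a deliberately small stand-in and not the network used in the cited work; the two features (normalized packet rate and failed-login ratio), the toy samples, the learning rate, and the function names are all assumptions.

```python
def train_perceptron(samples, targets, lr=0.1, epochs=20):
    """Tune weights and bias with the perceptron learning rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            y = 1 if activation > 0 else 0
            error = t - y  # 0 if correct, +1 or -1 if wrong
            # Move each weight in the direction that reduces the error.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def classify(weights, bias, x):
    """Return 1 (malicious) or 0 (benign) for a feature vector."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Hypothetical normalized features: (packet rate, failed-login ratio).
X = [(0.1, 0.0), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
T = [0, 0, 1, 1]  # 1 = malicious, 0 = benign

weights, bias = train_perceptron(X, T)
print([classify(weights, bias, x) for x in X])
print(classify(weights, bias, (0.85, 0.9)))
```

A multi-layer network replaces this single rule with gradient-based updates propagated through many layers, but the principle is the same: the weights are not set by hand, they are driven by the errors observed on the training traffic.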

B. DEEP LEARNING IDS ALGORITHM
This subsection discusses recent implementations of DL-IDS using a variety of deep learning methods. A model was introduced by [24] to collect and label real network traffic using their dataset in order to investigate mobile application identification and connect it to a cloud server. The classification was learned using deep learning methods such as AE, CNN, and RNN. The best performance was obtained when utilizing CNN and LSTM, with an accuracy of 91.8 % and an F-measure of 90.1 % for the 1D CNN classifier. However, their analysis was limited to a particular application, and because all features are treated as equally important, CNN and RNN lack a function for evaluating feature importance, even though they extract features adequately.
An intelligent intrusion detection system was developed by [25] that combines deep learning algorithms with network virtualization to detect malicious behavior on IoT networks. Their technique enables efficient anomaly detection in IoT networks regarding scalability and interoperability by simulating and tracing five different attacks. Their model achieved a precision rate of 95% and a recall rate of 97% for various threat scenarios. However, as with many other IDS models, they emphasize detection rather than prevention techniques. Figure 8 illustrates the implementation of the deep learning model for IDS.
A deep learning classification model using NSL-KDD and KDD CUP99 was proposed in [26] to address increased human engagement and decreasing accuracy. The model was constructed using an unsupervised learning technique known as Non-symmetric Deep Autoencoder (NDAE). Their model required less training time than DBN and improved accuracy by 5% compared to a pure Autoencoder. As depicted in Figure 9, it consists of two NDAEs with three hidden layers each, and the two NDAEs are joined using an RF method. Their methodology, however, is ineffective in detecting complex attacks due to its high false alarm rate.
Convolutional neural networks with the NSL-KDD dataset were investigated in [28] and are depicted in Figure 10. In addition, the authors investigated a method for detecting threats in a vast real-time network by converting the raw data to an image data format, which aids in resolving the unbalanced dataset issue by computing the cost function for each class from the training sample. As a result, they were able to reduce the number of computing parameters in their model, but their model's accuracy was low compared to other machine learning and neural network models. Table 9 summarizes various deep learning algorithms for IDS.
In [27], a combination of the CIC-IDS 2017, NSL-KDD, Kyoto, UNSW-NB15, and WSN-DS datasets was used to categorize and detect unplanned and unexpected cyberattacks using a deep neural network. The performance of this model was evaluated by comparing it to other machine learning classifiers, and the model outperformed the others. Similarly, in [2], the author suggested a deep neural network approach for classifying network data as harmful or benign, supplementing the analysis with two more datasets: UNB-ISCX 2012 and CIC-IDS 2017. First, a feedforward Deep Neural Network was utilized for training the model, and then an Autoencoder was employed to categorize attacks and threats in the absence of labeled malicious data. The model was 99.96% accurate for UNB-ISCX 2012 and 99.96% accurate for CIC-IDS 2017. Additionally, this research established the critical nature of the datasets needed to construct an IDS and the efficacy of the Autoencoder for anomaly detection.
To enhance detection accuracy in IDS, the author incorporated big data, deep learning approaches, and natural language processing in [28]. They worked with KDD CUP99 and achieved an accuracy of 94.32 % with their model. In addition, another deep neural network method was introduced in [29] to detect risks and attacks in the cloud environment. Their approach used Simulated Annealing and Improved Genetic Algorithms to create the hybrid optimization framework IGASAA using the datasets NSL-KDD2015, CIC-IDS2017, and CIDDS-001. Compared to the Simulated Annealing Algorithm (SAA), their model demonstrated a higher detection rate, increased accuracy, and a lower false alarm rate.
Web application security is highly reliant on detecting malicious HTTP traffic, which requires a significant investment in gathering training data and a large dataset. To detect malicious HTTP traffic, the authors in [29] introduced the DeepPTSD method, based on a deep transfer semi-supervised learning methodology. The construction of their model is given in Figure 11. They used two raw public datasets from FSecurify and another from their lab via a honeypot server. When only a small training dataset is available, their model exceeds other existing baselines, with a precision of 93.33 % compared to 86.67 % and 86.61 % for CNN and RNN, respectively.
An intrusion detection model based on a convolutional neural network was presented in [30] to extract structural information. The authors performed multiclass classification for NIDS using the NSL-KDD and KDD-CUP99 datasets. Their model's accuracy increased compared to other classifiers, resulting in enhanced detection of unknown threats and a decrease in false alarm rates. A feedforward deep neural network was proposed by [1] for an intrusion detection system to perform binary classification on the NSL-KDD dataset. Due to the dense structure of this model, it beat the usual machine learning technique in terms of scalability with big datasets and training time. As a result, there was a high proportion of true positives and correctly classified records, with this model achieving an accuracy of 89%. In [31], an RNN-based IDS technique for binary and multiclass classification was investigated. This model outperformed conventional machine learning algorithms and demonstrated that it is suited for classification with high accuracy. The authors trained and tested their model on the NSL-KDD dataset. Figure 12 illustrates the RNN structure and the proposed RNN-IDS model.
Deep neural networks were used in [32] to investigate the applicability of anomaly-based intrusion detection systems. Based on the NSL-KDD dataset, the authors studied a variety of machine learning and deep learning frameworks. According to the comparison, deep learning outperformed machine learning in the accuracy test. The RNN achieved the best performance, followed by the CNN and then the Autoencoder. A comparison of deep learning methods based on intrusion detection systems is presented in Table 9, which compares the learning algorithm, performance metric, dataset, attack targeted, strengths, and limits of the algorithms.

C. HYBRID LEARNING IDS ALGORITHM
This section discusses works that combine machine learning and deep learning or use several algorithms of the same learning type. First, a deep learning-based intrusion detection system for an IoT network was developed in [39]. The model is based on gated recurrent neural networks (GRU and LSTM) and was evaluated on the KDD Cup 99 dataset. The authors proposed adding deep learning classifiers to each layer of the TCP/IP architecture, which increases its complexity. The model's accuracy was 98.91 %, and the false alarm rate was 0.76 %. However, one may argue that the model's robustness was low.
A Hierarchical Intrusion Detection System (HAST-IDS) was developed in [40] to improve anomaly detection. As illustrated in Figure 13, the authors began by extracting spatial features using CNN and then temporal features using LSTM. Finally, they evaluated the performance of their proposed model using the ISCX2012 and DARPA datasets. Although the hierarchical CNN-LSTM model beats pure CNN or LSTM models and gives higher accuracy for IDS, it is computationally expensive because of its complicated architecture.
D2H-IDS [41] is an intrusion detection system that was developed to secure the connections between smart connected vehicles. The model is built on a framework for continuous automated secure service availability and uses a decision tree and a deep belief network to reduce the dimensionality of the data and classify attacks.
A model for enhancing IDS performance was provided by [42] by integrating three classifiers with big data. The methods utilized were a combination of machine learning and deep learning techniques, including Random Forest (RF), Deep Neural Network (DNN), and Gradient Boosting Tree (GBT). The authors evaluated their strategy using the CIC-IDS2017 and UNSW-NB15 datasets. DNN had the highest accuracy, at 99.19 % on UNSW-NB15 and 99.99 % on CIC-IDS2017. Although all three classifiers achieved good accuracy, training the model was difficult due to the wide range of numerical values among the features.
In wireless sensor networks, intrusion detection was performed using a combination of machine learning and deep learning [43]. The authors proposed the Restricted Boltzmann machine-based clustered IDS (RBC-IDS) as a deep learning technique. They used the KDD Cup99 dataset and Network Simulator-3 (NS-3) to compare their model against an adaptive machine learning-based IDS. While RBC-IDS has high accuracy, the detection time was comparable to that of the adaptive machine learning model, resulting in overhead expenses. A hybrid network IDS using the CNN-LSTM algorithm was applied to the UNSW-NB15 dataset in [6]. To optimise the IDS model's efficiency when applied to real-world devices, the authors employed a transfer learning approach. Their model was 98.43 % accurate.
CBR-CNN (Channel Boosted and Residual Learning) was created in [44], employing deep Convolutional Neural Networks for intrusion detection using the NSL-KDD dataset. Training is carried out using an unsupervised learning approach, and normal traffic is modeled using stacked autoencoders (SAE). Their model had an accuracy of 89.41 % for KDD-Test+ and 80.36 % for KDD-Test-21, respectively. Table 10 analyses the learning method, performance metric, dataset, attack type, strengths, and limits of hybrid learning algorithms based on intrusion detection systems.

D. DISCUSSION AND OPEN CHALLENGES
Intrusion detection systems are now considered a necessary component of our daily lives. However, developing an intrusion detection system capable of detecting and responding to a wide range of attacks and threats is a difficult task. As a result, hundreds of studies in the field of intrusion detection systems have been carried out for various applications by academic researchers. Some academics believe that deep learning, through a neural network, will enable greater flexibility in IDS, allowing it to detect and classify harmful threats more effectively. This flexibility is because its algorithms have hidden layers with a high-dimensional feature representation of the data.
VOLUME 10, 2022
A comprehensive assessment of network-based intrusion detection systems was offered in [10], in which the authors stressed the need for labeled data when evaluating and training anomaly-based intrusion detection systems. Moreover, in [45], the author investigated the possibility of improving model optimization and concluded that the supervised learning approach is more successful than the unsupervised one, since training the models on labeled data allows the algorithms to achieve higher performance. NADS implementation with various applications, data centers, fog, cloud computing, and the Internet of Things (IoT) was a priority in [13]. The authors asserted that datasets not based on reality might lead to mistaken conclusions. In [45], a model based on ESR-NID (Evolving Statistical Rulesets for Network Intrusion Detection) was presented, which uses computation techniques to automatically generate rulesets for network intrusion detection. The model outperforms other existing models and is capable of dealing with a variety of attack types.
To summarize, some researchers concentrated on which algorithm would provide the best performance, such as [14], [15], [21]-[23], [33], [39]. A comparison between different types of algorithms used for IDS is presented in Table 11, in terms of the learning approach, advantages, and disadvantages.
To increase accuracy and improve model implementation, some researchers investigated combining algorithms in order to achieve higher accuracy or a lower false alarm rate, as in [40], [41], while others combined methods from machine learning and deep learning, as in [43], [44], [46]. Because each dataset contains a different range of threats and attacks, some researchers experimented to see which dataset could provide a more stable model, as in [15], [21], [25], [35], [38], [43], while others created their own dataset to use in IDS development, as in [17], [24], [47]. The intrusion detection system field faces many challenges, including the following:

1) UNAVAILABILITY OF UP-TO-DATE DATASET
A highly effective IDS must be trained and tested against a dataset of new and old threats and attacks. When more patterns and types of attacks are discovered in a dataset, the model becomes more resistant to various attack types. Thus, one of the challenges for IDS is to maintain an up-to-date dataset with sufficient records to cover the majority of attack types.

2) HYPERPARAMETER TUNING
The deep structure of an IDS model requires that the hyperparameters be specified. The activation function and optimization method, the number of nodes per layer, and the total number of layers in a network are all hyperparameters. Hyperparameters affect training and model building, with the ability to increase or decrease the IDS model's accuracy and detection rate. Hyperparameters can be tuned manually, which will take a significant amount of time, or automated to improve the performance of the IDS model.
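Automated tuning can be as simple as an exhaustive grid search over candidate values. The Python sketch below enumerates every combination in a small search space and keeps the best-scoring one. The search space, the function names, and in particular `validation_score` are assumptions: in practice that function would train an IDS model with the given configuration and return its accuracy or detection rate on a validation set, whereas here it is a made-up formula that exists only so the sketch runs.

```python
from itertools import product

# Hypothetical search space covering hyperparameters named in the text.
SEARCH_SPACE = {
    "hidden_layers": [1, 2, 3],
    "nodes_per_layer": [16, 32, 64],
    "activation": ["relu", "tanh"],
}


def validation_score(config):
    """Placeholder for 'train the model, return validation accuracy'.

    The formula below is invented purely for illustration and does not
    reflect how any real model behaves.
    """
    score = 0.80
    score += 0.03 * config["hidden_layers"]
    score -= 0.0005 * abs(config["nodes_per_layer"] - 32)
    score += 0.01 if config["activation"] == "relu" else 0.0
    return score


def grid_search(space, score_fn):
    """Evaluate every combination and return the best one with its score."""
    names = list(space)
    best_config, best_score = None, float("-inf")
    for values in product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = grid_search(SEARCH_SPACE, validation_score)
print(best_config, round(best_score, 4))
```

The number of combinations grows multiplicatively with each added hyperparameter (18 here), which is why random search or more sample-efficient strategies are often preferred once every evaluation involves training a deep model.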

3) IMBALANCED DATASET
Existing datasets contain varying numbers of records for various types of attacks. These differences affect the accuracy and detection rate for each type of attack: a low-record attack will have a lower detection rate than a high-record attack. This issue can be resolved either by balancing the dataset or by increasing the number of minority attack records.
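One simple way to increase the number of minority attack records is random oversampling, sketched below under stated assumptions. The function `oversample`, the class names, and the record counts are illustrative and are not taken from any of the reviewed datasets.

```python
import random
from collections import Counter


def oversample(records, labels, seed=0):
    """Duplicate randomly chosen minority-class records until every class
    has as many records as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for record, label in zip(records, labels):
        by_class.setdefault(label, []).append(record)
    target = max(len(group) for group in by_class.values())

    out_records, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing records with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for record in group + extra:
            out_records.append(record)
            out_labels.append(label)
    return out_records, out_labels


# Toy data: class names and counts are invented for illustration.
labels = ["normal"] * 6 + ["dos"] * 3 + ["u2r"] * 1
records = [[i] for i in range(len(labels))]

balanced_records, balanced_labels = oversample(records, labels)
counts = Counter(balanced_labels)
print(counts)
```

In practice, oversampling is usually applied to the training split only, so that duplicated records do not leak into the test data.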

4) PERFORMANCE IN REAL-WORLD ENVIRONMENTS
When researchers develop an intrusion detection system, they train and test the model in laboratories, with the majority of the data coming from public sources. An IDS model therefore faces a challenge when it is deployed: the model developed in the lab should also be validated in a real-world environment to ensure its efficiency.
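The gap between laboratory and real-world results can be made concrete by computing the same metrics on both kinds of data. The sketch below assumes binary labels (1 = attack, 0 = normal) and the usual confusion-matrix definitions of detection rate, TP / (TP + FN), and false alarm rate, FP / (FP + TN). The function `ids_metrics` and all label vectors are hypothetical and serve only as an example.

```python
def ids_metrics(y_true, y_pred):
    """Return accuracy, detection rate, and false alarm rate for binary
    labels (1 = attack, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Invented labels: the same hypothetical model scored on a lab test set
# and on a small sample of real-world traffic.
lab_true = [1, 1, 1, 1, 0, 0, 0, 0]
lab_pred = [1, 1, 1, 1, 0, 0, 0, 0]
field_true = [1, 1, 1, 1, 0, 0, 0, 0]
field_pred = [1, 1, 0, 0, 1, 0, 0, 0]

lab = ids_metrics(lab_true, lab_pred)
field = ids_metrics(field_true, field_pred)
print("lab:  ", lab)
print("field:", field)
```

Reporting both sets of numbers side by side shows how much of the lab performance carries over after deployment.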

V. CONCLUSION
Intrusion detection systems are one of the essential subjects in the cybersecurity area. Many researchers are developing systems that secure data against malicious conduct, and research into other applications of learning algorithms, such as establishing a new dataset or merging algorithms, is ongoing. In this work, we explained the concept of an intrusion detection system, the types of attacks, and how to determine whether a system is effective.
Selecting a good dataset to train and test an intrusion detection system is crucial, and it was clear that datasets have an impact on research in this sector, as some are deemed out of date or contain redundant information. Accordingly, this review compared the datasets most frequently used to detect threats over the last decade.
The final step in this work was to examine how other researchers protected data. Recent research has revealed numerous data protection implementations. Researchers first employed machine learning for several purposes, and many studies were conducted to determine which algorithm would provide higher accuracy or which datasets would produce a lower false alarm rate. They then turned to deep learning after extensive investigation and testing. Many studies and experiments have reported that deep learning is superior to machine learning because it can handle more complicated problems with greater accuracy and lower false alarm rates. Previous work has been applied in a variety of applications, each time employing different datasets, architectures, learning methodologies, and learning algorithms to secure data from attacks and threats.
MOHAMED HADI HABAEBI (Senior Member, IEEE) is currently a Professor with the Department of Electrical and Computer Engineering, International Islamic University Malaysia (IIUM). His research interests include the IoT, mobile app development, networking, blockchain, AI applications in image processing, cyber-physical security, wireless communications, small antennas, and channel propagation modeling.
MURAD HALBOUNI received the bachelor's degree in telecommunication engineering from Palestine Technical University, Kadoorie, Palestine. He is currently pursuing the M.S. degree in cyber crime with Arab American University, Palestine. His research interests include cybercrime and digital evidence analysis, metro networks, network security, and machine learning. He also works at Paltel, a Palestinian communication business, as a Network Engineer.
MIRA KARTIWI (Member, IEEE) is currently a Professor with the Department of Information Systems, Kulliyyah of Information and Communication Technology, and the Deputy Director of E-learning with the Centre for Professional Development, International Islamic University Malaysia (IIUM). She was one of the recipients of the Australia Postgraduate Award (APA), in 2004. For her achievement in research, she was awarded the Higher Degree Research Award for Excellence, in 2007. She has also been appointed as an Editorial Board Member in local and international journals to acknowledge her expertise. She is also an experienced consultant specializing in the health, financial, and manufacturing sectors. Her research interests include health informatics, e-commerce, data mining, information systems strategy, business process improvement, product development, marketing, delivery strategy, workshop facilitation, training, and communications.
ROBIAH AHMAD (Senior Member, IEEE) received the B.Sc. degree in electrical engineering from the University of Evansville, Evansville, IN, USA, the M.Sc. degree in information technology for manufacturer from the Warwick Manufacturing Group, University of Warwick, U.K., and the Ph.D. degree in mechanical engineering from Universiti Teknologi Malaysia, Malaysia. She is currently an Associate Professor with the Razak Faculty of Technology and Informatics, UTM, Kuala Lumpur, Malaysia. She has more than 20 years of experience as a Research Scientist. She has published more than 100 peer-reviewed international journal articles/proceedings in areas of instrumentation and control, system modeling and identification, and evolutionary computation. She currently serves as an executive committee member for Humanitarian Activities for IEEE Malaysia Section and is the Past Chair for IEEE Instrumentation and Measurement Society Malaysia Chapter.
VOLUME 10, 2022