An Efficient Support Vector Machine Algorithm Based Network Outlier Detection System

With the increase of cyber-attacks and security threats in the recent decade, it is necessary to safeguard sensitive data and provide robust protection to information systems and computer networks. In this paper, an anomaly-based network outlier detection system (NODS) is proposed and optimized to check and classify the behaviours of the incoming network traffic stream that affect computer networks. The proposed NODS has high classification efficiency. Network connection events classified as outliers are reported to the network admin so that their packets can be dropped and blocked. The NSL-KDD and CICIDS2017 intrusion datasets were employed to build the proposed system and test its detection capabilities. Sequential scenarios were implemented to optimize the system’s effectiveness. Network features were normalized by the min-max and Z-score approaches, while the relevant features were selected individually by the principal component analysis (PCA) and correlated features selection (CFS) techniques. Support vector machine (SVM) and Gaussian Naive Bayes (GNB) algorithms were used to build the detection model, while the genetic algorithm (GA) was employed to tune their control parameters. The obtained evaluation results showed that the proposed SVM-based NODS is characterized by low false alarms and detection time as well as high classification accuracy. Furthermore, a comparative analysis was conducted with other existing techniques, and the results obtained demonstrate the effectiveness of the proposed SVM-based NODS.


I. INTRODUCTION
Internet technologies and communication networks are evolving daily. In parallel, cyber-attacks are advancing and novel security vulnerabilities are appearing quickly too [1]. Attempts that breach the availability, security, and privacy of computer networks are known as network intrusions, anomalies, or outliers [2]. Outlier detection is mainly employed for recognizing anomalous activities in many fields, such as network attack detection. It is the process of identifying data points that differ from the majority of other data [3]. These abnormal data points represent unusual behaviours and are denoted as outliers. A network outlier detection system (NODS) provides the mechanism to inspect network activities for detecting any possible intrusive actions [4]. NODS can be installed in a host such as a computer to audit its activities, including system calls and log files, for detecting intrusive events [5]. Also, NODS can be deployed in a network to monitor and analyze its traffic stream behaviours to identify anomalous network connections [5]. Furthermore, NODS can identify intrusion attempts using the signature, anomaly, or hybrid-based detection approaches [6]. The signature approach looks for the intrusion occurrence based on gathered knowledge about previous well-known intrusion signatures; therefore, it cannot identify novel attacks [6]. The anomaly approach looks for any deviation from the regular behaviour of a system or a network; therefore, it can recognize novel attacks. The hybrid approach integrates anomaly and signature-based detection methods to deliver a robust detection capability embedded in a single approach [6]. Approaches used for detecting outliers are categorized as density-based, distance-based, and machine learning or soft-computing based [7]. In this paper, SVM and GNB are implemented individually to develop the anomaly-based detection model of NODS, which is built and evaluated on the labelled network traffic stream of the benchmark NSL-KDD and CICIDS2017 datasets [8], [9]. Efficient preprocessing of the network traffic data, such as feature engineering, is crucial in mitigating model overfitting and boosting generalization. Consequently, the outlier detection model performs better and converges faster. The remainder of this paper is structured as follows: Section II reviews related work, Section III discusses the proposed NODS, and Section IV highlights the implementation and results of the experiment. Finally, Section V presents the research conclusion along with future interests.
The associate editor coordinating the review of this manuscript and approving it for publication was Yudong Zhang.

II. RELATED WORKS
Network outliers are observations that are distinctly different from other observations, making them appear to be generated by a different process [10]. Unlike noise, network outliers carry important information, which can inform proactive network threat management. For example, an unusually large number of requests coming from one computer could be an outlier generated by a different process, which could indicate a malicious attack or some other type of unusual activity [11]. Thus, network outliers can help detect malicious behavior or provide insight into abnormal traffic patterns.
By detecting unusual activity in the network, organizations can identify malicious activities and reduce the risk of security breaches. Network anomaly detection can also be used to improve network performance by identifying and addressing network congestion, latency issues, and slow response times [11], [12]. Li et al. [13] developed an optimized resource allocation and communication technique for the fault detection system. This method is vital considering the limited edge device computation capabilities, minimal communication resources, and varying monitoring accuracies. The proposed approach maximizes the system's processing performance, optimizes resource use, and meets all data transmission and analysis latency needs.
From an organization's perspective, verifying the integrity of the network ensures that legitimate traffic is not blocked or rerouted to unknown sources, leading to a more secure and reliable network [14]. Pour et al. noted that by detecting anomalous activity, organizations can also ensure compliance with regulatory requirements and improve the overall security posture of their network [15]. Furthermore, network anomaly detection can be used to monitor suspicious activity and detect potential malicious actors who may be attempting to gain access to the organization's network or data [14]. Thus, proactive network monitoring helps organizations to detect and respond to threats quickly, ensuring confidentiality, integrity, and availability of computing resources as well as preventing technical and business losses.
Lu et al. [16] address the detection of internal defects in magnetic tiles by leveraging acoustic sound. The non-stationary and non-Gaussian properties of acoustic sound limit the accuracy of using a single data modality for detecting internal defects. Another study presents a novel and efficacious ensemble anomaly detection approach that relies on a collaborative representation-based detector, in which background data is predicted using randomly chosen focused image pixels [17].
Connected and Autonomous Vehicles (CAVs) are becoming increasingly common due to the current rate of technological development. However, the networks of these cars are highly susceptible to illegal eavesdropping. Therefore, the authors propose using Deep Reinforcement Learning (DRL) and Distributed Kalman Filtering (DKF) methods to mitigate jamming interference and increase communication robustness to eavesdropping. The overarching aim is to optimize security performance against smart jammers and eavesdroppers. They formulate a DKF algorithm that accurately tracks the attacker by sharing state estimates between nodes, and then pose a design problem for managing transmission power and picking communication channels. These provisions are made while ensuring that the authorized vehicle user's quality needs are not compromised. A hierarchical Deep Q-Network (DQN)-based architecture is selected since the jamming and eavesdropping model is dynamic and uncertain. The DQN architecture is employed for designing channel selection policies and anti-eavesdropping power control. The optimal power control model is performed first, without prior data or insights on eavesdropping behaviors. The channel selection process, which is founded on the system secrecy rate analysis, then proceeds when necessary. Simulations of the proposed system show that it increases the secrecy and attainable communication rates [18], [19].
Several practical challenges constrain the conventional "forecast-response" paradigm. For instance, the method's applicability is poor when different situations need dissimilar reaction processes. This deficiency originates from the paradigm's macro-perspective description of crises, which overlooks the micro-perspective evaluation of emergency response. Therefore, the authors of [20] recommend employing the "scenario-response" paradigm, which leverages a microscopic approach to frame the implications of conforming measures on events. Zhengzhou, China, experienced unexpected torrential rains in 2021 that resulted in 398 fatalities and approximately 120.6 billion RMB of economic losses. Consequently, an empirical assessment of the disaster based on Bayesian networks was done to analyze the evolution of the emergency response. The scenario Bayesian network was built by amalgamating Dempster's combination rule, scenario evolution, and knowledge meta-theory with 362 appropriate historical representative events. The network could also identify the progression of the respective emergency events and combine different experts' analyses. An event-driven Bayesian network was also employed to evaluate the impact of individual actions on the odds of the response outcomes. The counterfactual outcomes of the interventions were also checked using causal inference to highlight the urgent and vital responses. The similarity between each source and target scenario exceeded 0.7, with the highest value at 0.78. Furthermore, the evolutionary precision of the incident response was examined by contrasting scenario parallels. Thus, the proposed approach can offer a theoretical foundation for deploying a "scenario-response" paradigm [20].
The number of multi-objective large-scale optimization problems (MOLSOPs) has increased in recent years. MOLSOPs can be addressed using cooperative coevolution and variable grouping optimization. However, few researchers have attempted to decompose MOLSOP variables. Therefore, the authors of [21] present a multi-objective graph-based differential grouping with shift (mogDG-shift) for decomposing the MOLSOP variables. They begin by assessing variable attributes and then detect the variable interactions. The variables are then categorized according to their interactions and features [21].
Asif et al. [22] developed an Intrusion Detection System (IDS), where the KDD 99 intrusion dataset was used as the network traffic source. The detection system was designed to identify anomalous activities and network outliers early. The Apache Storm framework was used to handle the big data characteristics of the network stream. Assessment results confirmed the feasibility of the detection system. Besides, the system performance can be improved by solving the class imbalance problem. In [23], Kurniabudi et al. utilized Information Gain to rank and group features based on minimum weight values, enabling the selection of relevant and significant features [27]. Subsequently, they employed five classifier algorithms, namely Random Forest (RF), Bayes Net (BN), Random Tree (RT), Naive Bayes (NB), and J48, to conduct experiments on the CICIDS2017 dataset. The experimental results demonstrate that the number of relevant and significant features determined by Information Gain significantly impacts detection accuracy and execution time. Specifically, the Random Forest algorithm achieves an accuracy of 99.86% when using 22 relevant selected features, whereas the J48 classifier attains an accuracy of 99.87% with 52 relevant selected features, albeit requiring a longer execution time.
Pankaj Jairu et al. focused on building an anomaly-based IDS to detect a variety of network attacks by using many supervised learning algorithms, such as Logistic Regression, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Naïve Bayes, Decision Tree, and Random Forest, on multiple datasets, including the realistic evaluation dataset CICIDS-2017 [28]. Results demonstrated that Random Forest outperformed the other supervised algorithms and achieved an accuracy of 99.93% by using only 14 features selected via Pearson's correlation coefficient method.
Shruti et al. introduced a novel intrusion detection system that employs ensemble techniques of machine learning algorithms [29]. The objective is to enhance classification accuracy and reduce false positives, utilizing features sourced from the CICIDS-2017 dataset. The proposed work presents an IDS implemented through machine learning algorithms, including decision trees, random forests, and SVM. Additionally, the work incorporates LIME, an explainable framework, to understand the model's predictions. The ensemble of ML models showed an improved accuracy of 96.25 for the IDS prediction, and the LIME explanation graphs showcased the prediction performance of the decision tree, random forest, and SVM algorithms. This integration aims to enhance comprehensibility and insight into the previously opaque black-box methodology for reliable intrusion detection.
Omar et al. implemented five distinct deep learning models for the identification and categorization of suspicious activities within network flows in an IoT environment [30]. These models are initially trained on a cloud server and subsequently deployed to a gateway node, where the pivotal network traffic classification is executed. The entire process of model training and assessment is conducted using the CICIDS2017 dataset. The evaluation of the five models' accuracy revealed that the proposed model, named EIDM, surpassed the other four models with an accuracy rate of 99.48%. This performance was achieved while also taking into consideration the time resources expended. Furthermore, the EIDM model categorized the full spectrum of 15 traffic behaviors, which encompassed 14 diverse attack types within the CICIDS2017 dataset, with an accuracy level of 95%.

III. NETWORK OUTLIER DETECTION SYSTEM (NODS)
There are two main categories of NODS: supervised and unsupervised. If a system utilizes both supervised and unsupervised features, it is classified as semi-supervised [31]. Supervised NODSs use labeled data to train a model that can then be used to detect outliers in new, unlabeled data sets. These systems are based on supervised learning techniques, such as decision trees, neural networks, and support vector machines (SVMs) [32]. These techniques are used to identify patterns in the data that indicate the presence of outliers. In decision tree-based NODSs, the data is split into multiple nodes based on the value of a certain feature [31], [33]. The nodes are then classified as outliers or inliers. Then, the system uses the decision tree to evaluate the data points and identify outliers.
Unsupervised NODSs use only unlabeled data to identify outliers. In this case, the dataset is first divided into two or more clusters, where each cluster represents a set of data points that share similar characteristics [32], [34]. The clusters are then evaluated to determine whether any data points are significantly different from the rest of the data. The evaluation is done using a variety of methods, such as density-based clustering, distance-based clustering, and cluster-based outlier detection algorithms. Once clusters are created, the next step is to identify anomalies in the data, which is achieved by calculating a score for each data point [22]. The score is calculated based on a variety of factors, such as distance from the cluster's central point, variance from the cluster's mean, and correlation with other data points in the cluster.
Smiti noted that if a data point has a significantly higher score than the rest of the data, it is considered an outlier [20]. Once outliers are identified, they can be further analyzed to determine what type of malicious activity is taking place [22]. The analysis can be done manually or by using automated tools such as machine learning algorithms.
NODS is deployed and attached to the entry point device of a computer network, as shown in Figure 1. Its goal is to capture and analyze the incoming network flow of this network. NODS starts with capturing the network traffic stream data by a packet sniffer. Then the related network packets are gathered to form a number of network connections, which are written into a dataset file to be analyzed [35], [36], [37]. Each connection is described as a vector of many network features. Therefore, any network connection behaviour can be analyzed and classified as either normal or an outlier. Once NODS detects any abnormal network flow, an alarm is raised to the network admin to take suitable countermeasures regarding this outlier traffic, such as dropping the anomalous traffic by blocking its IPs. However, processing these data directly takes a long analysis time and leads to imprecise detection results. Therefore, the data should be preprocessed well by data mining techniques before being analyzed, to ease the classification process and achieve efficient classification results.

A. NETWORK TRAFFIC DATA PRE-PROCESSING
1) NETWORK FEATURES ENCODING
The network feature values are heterogeneous in type: they can be found either in nominal form, like the protocol type (e.g. TCP or UDP), or in numeric form, like a port number. Many outlier detection models cannot work with nominal data. Such data should be encoded into numeric form, and each connection's class/target feature is encoded as 0 for normal and 1 for outlier/anomalous behaviour.
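The encoding step described above can be illustrated with a minimal sketch. The three sample records, the column names, and the helper functions `build_codebook` and `encode` are hypothetical and chosen only for illustration; the integer codes assigned here (sorted order) are one possible convention, not necessarily the one used in the experiments.

```python
# Hypothetical mini-sample of connection records (real rows have many more fields).
records = [
    {"protocol_type": "tcp", "service": "http", "src_bytes": 181, "label": "normal"},
    {"protocol_type": "udp", "service": "domain_u", "src_bytes": 45, "label": "normal"},
    {"protocol_type": "tcp", "service": "private", "src_bytes": 0, "label": "neptune"},
]


def build_codebook(rows, column):
    """Map each distinct nominal value of a column to an integer (sorted order)."""
    values = sorted({row[column] for row in rows})
    return {value: index for index, value in enumerate(values)}


def encode(rows, nominal_columns, label_column="label", normal_value="normal"):
    """Return numerically encoded copies of the rows plus the codebooks used."""
    codebooks = {col: build_codebook(rows, col) for col in nominal_columns}
    encoded = []
    for row in rows:
        new_row = dict(row)
        for col in nominal_columns:
            new_row[col] = codebooks[col][row[col]]
        # Class/target feature: 0 for normal, 1 for any outlier/attack label.
        new_row[label_column] = 0 if row[label_column] == normal_value else 1
        encoded.append(new_row)
    return encoded, codebooks


encoded_records, codebooks = encode(records, ["protocol_type", "service"])
```

Keeping the codebooks makes it possible to apply the same mapping to the testing set, so that a nominal value receives the same code in both files.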
where x is the original feature value, and N(x) denotes its normalized value.
b) The Z-score method scales each feature according to its mean and standard deviation, as in the following formula:
$$Z(x) = \frac{x - \mu}{s}$$
where $\mu$ and $s$ are the mean and standard deviation of the feature, respectively.
3) NETWORK FEATURES SELECTION
The network connection is described as a vector of network features representing the connection behaviour. The information that these features contribute about the connection behaviour label varies [38]. Many features hold little information about the connection behaviour and are denoted as irrelevant features, while others contain redundant information and are denoted as redundant features. Building the detection model on either irrelevant or redundant features causes overfitting and increases the model complexity [39]. Discarding those features during the model building process improves the model's classification capabilities [39]. Two feature selection techniques, PCA [40] and CFS [41], are adopted to select the dominant features from the whole network feature set as the basis for building the detection model. PCA keeps the feature components that have the highest eigenvalues. In contrast, CFS selects features that have a high correlation with the class/label of the network connection behaviour and low or no correlation with each other.
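The normalization and CFS ideas in this subsection can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: the min-max target range of [-1, +1] is taken from the scenarios described later in the paper, the function names are hypothetical, and `cfs_merit` is a simplified CFS-style subset score that uses Pearson correlation, namely k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation.

```python
import math
import statistics


def min_max_scale(values, low=-1.0, high=1.0):
    """Rescale values linearly into [low, high]; a constant column maps to low."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [low for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


def z_score_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def cfs_merit(features, labels):
    """Simplified CFS-style merit of a feature subset (list of feature columns)."""
    k = len(features)
    r_cf = statistics.fmean([abs(pearson(f, labels)) for f in features])
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = 0.0
    if pairs:
        r_ff = statistics.fmean(
            [abs(pearson(features[i], features[j])) for i, j in pairs]
        )
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Hypothetical feature column and 0/1 class labels.
duration = [0, 2, 4, 6, 8]
labels = [0, 0, 0, 1, 1]
scaled = min_max_scale(duration)
standardized = z_score_scale(duration)
merit_single = cfs_merit([duration], labels)
merit_with_copy = cfs_merit([duration, duration], labels)
```

In this sketch, adding an exact copy of a feature leaves the merit unchanged, which reflects the intent of CFS: redundant features bring no extra value to the selected subset.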

B. NETWORK OUTLIER DETECTION
1) SVM MODELS FOR NODS
Support Vector Machines (SVMs) are a type of supervised learning algorithm that has been successfully applied to a variety of classification and regression problems. The SVM algorithm is based on the idea of finding a hyperplane that best separates the data points into two distinct classes. The SVM algorithm seeks to maximize the margin between the two classes, thereby obtaining a "maximum-margin hyperplane" [42]. This hyperplane is determined through a process of optimization which minimizes the overall classification error. In SVM models, support vectors form the basis of the decision boundary which separates the two classes, and they have the maximum influence on the position of the hyperplane [42], [43]. SVM models are applied in NODS because, in these systems, the goal is to identify "outliers", that is, data points that are significantly different from the data points in the same class or cluster. Outliers can indicate malicious behavior, faulty or malfunctioning nodes, or other anomalies [44]. To detect these outliers, it is necessary to use an algorithm that can distinguish between normal and abnormal data points. SVM models are well-suited for this task because they are capable of finding non-linear boundaries between data points.
SVM is considered a good candidate for building the anomaly-based outlier classification model. It begins with learning the normal/usual/inlier behaviour of the network traffic obtained from the previous preprocessing stage. Afterwards, it builds a model which can recognize both normal and abnormal behaviours of unseen network traffic. Each network connection that differs from the usual behaviour/pattern is treated as an outlier connection.
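As a rough illustration of training such a classifier, the following sketch fits scikit-learn's `SVC` with an RBF kernel on synthetic data. The two Gaussian blobs are an assumption standing in for preprocessed "normal" and "outlier" connection vectors; they are not NSL-KDD or CICIDS2017 data, and the parameter values `C=1.0` and `gamma="scale"` are library defaults rather than tuned values from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for preprocessed connection vectors (4 features each).
normal = rng.normal(loc=0.0, scale=0.5, size=(200, 4))
outlier = rng.normal(loc=3.0, scale=0.5, size=(200, 4))
X = np.vstack([normal, outlier])
y = np.array([0] * 200 + [1] * 200)  # 0 = normal, 1 = outlier

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# RBF-kernel SVM with default (untuned) penalty and kernel width.
model = SVC(kernel="rbf", C=1.0, gamma="scale")
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
n_support = int(model.n_support_.sum())  # support vectors across both classes
```

On real traffic the classes overlap far more than in this toy setting, which is why the choice of the penalty and kernel parameters matters.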

2) GAUSSIAN NAIVE BAYES (GNB) MODEL FOR NODS
GNB is a popular supervised probabilistic algorithm based on Bayes' theorem. It is commonly used for text classification and is widely used in various machine-learning tasks, including spam filtering, intrusion detection, and sentiment analysis [37].
The key assumption in GNB is that all features are conditionally independent given the class label. In other words, it assumes that the presence or absence of a particular feature does not affect the presence or absence of other features in the same class. This is a strong and often unrealistic assumption, but it allows the algorithm to be computationally efficient and work well with high-dimensional data [45].
GNB is an effective choice for identifying anomalous network activities and potential security threats. By considering the statistical distribution of features related to network traffic, such as packet sizes, response times, and connection duration, the model can learn patterns of normal behavior. During the testing phase, it can efficiently classify incoming data as either normal or malicious based on the learned probability distributions [46].
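The conditional independence assumption means that the class score is simply the log prior plus a sum of per-feature Gaussian log-densities. The sketch below shows this in plain Python; it is a minimal illustration rather than the model used in the experiments, and the function names, the toy data, and the small `smoothing` constant added to each variance for numerical stability are all assumptions.

```python
import math
from collections import defaultdict


def fit_gnb(X, y, smoothing=1e-9):
    """Estimate, per class, the log prior and per-feature means and variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + smoothing
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gnb(model, row):
    """Return the class with the highest log-posterior for a single sample."""

    def log_posterior(params):
        log_prior, means, variances = params
        # Independence assumption: per-feature log-likelihoods are summed.
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances)
        )
        return log_prior + log_likelihood

    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical toy data: [connection duration, packet size]; 0 = normal, 1 = outlier.
X = [[0.10, 20], [0.20, 22], [0.15, 19], [5.0, 300], [4.5, 280], [5.5, 310]]
y = [0, 0, 0, 1, 1, 1]
gnb = fit_gnb(X, y)
prediction_normal = predict_gnb(gnb, [0.12, 21])
prediction_outlier = predict_gnb(gnb, [5.2, 295])
```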

3) TUNING SVM AND GNB CONTROL PARAMETERS BY USING GA
Radial Basis Function (RBF) SVMs are becoming increasingly popular for classification, regression, and clustering tasks such as network outlier detection. Wainer et al. noted that the RBF technique is preferred due to its capability to map non-linear data, which allows it to capture complex patterns in the data [41].
SVM uses the RBF as a kernel function during the classification process. The RBF-based SVM has two parameters: the penalty (c) and the kernel parameter (σ). The former controls the flexibility of the SVM's hyperplane, while the latter controls the correlation among support vectors of the same hyperplane. These parameters have an observable impact on the SVM classification effectiveness. Thus, it is necessary to properly tune the values of these parameters, which is considered an optimization problem.
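To make the role of σ concrete, the following sketch evaluates a Gaussian RBF kernel, assuming the common form K(x, z) = exp(−‖x − z‖² / (2σ²)); the paper does not spell out its exact parameterization, so this form is an assumption. Libraries such as scikit-learn express the same kernel as exp(−γ‖x − z‖²), so under this assumption γ = 1 / (2σ²). The function names are hypothetical.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel: exp(-||x - z||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def sigma_to_gamma(sigma):
    """Equivalent gamma for the exp(-gamma * ||x - z||^2) parameterization."""
    return 1.0 / (2.0 * sigma ** 2)


same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0], sigma=0.5)
narrow = rbf_kernel([0.0, 0.0], [3.0, 4.0], sigma=1.0)
wide = rbf_kernel([0.0, 0.0], [3.0, 4.0], sigma=10.0)
gamma = sigma_to_gamma(1.0)
```

A small σ makes the similarity between two points decay quickly with distance, giving a more flexible decision boundary, while a large σ makes distant points look similar and smooths the boundary.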
For the GNB, the primary parameter that can be adjusted is the smoothing parameter, which is used to prevent zero probabilities when a particular feature value is not observed in the training data for a given class. The smoothing parameter is a positive value added to all feature occurrences, which helps in handling unseen feature combinations and avoids division by zero in probability calculations [47].
Genetic Algorithms (GAs) have become an increasingly popular tool for optimizing complex systems, including NODS. GAs have been shown to outperform traditional optimization techniques in a variety of applications, from distributed systems to clustering algorithms. GAs also provide efficient and robust solutions for outlier detection, with applications in network intrusion detection, fraud detection, and traffic anomaly detection [48]. Notably, traditional methods of NODS rely on static rules and thresholds, which can be difficult to maintain and may not always be accurate.
GAs offer an alternative approach to NODS, providing a more dynamic and adaptive solution. The basic idea behind GAs is to use evolutionary algorithms to search for the best solutions to a given problem. In the case of network outlier detection, this means using GAs to optimize the parameters and thresholds used to detect outliers [49]. GAs are able to search through a large and complex search space to identify the best parameters for a given problem. In this research, GA is employed to search for the best values of the SVM's RBF parameters and the GNB's smoothing parameter in a given search space, which consists of a number of candidates, each representing possible values for these parameters. Determining the appropriate candidate will boost SVM and GNB detection performance. Further theoretical and technical details on the SVM, GNB and GA techniques are discussed in [45], [50], [51], and [52].
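A GA of this kind can be sketched in a few lines of plain Python. Everything below is an illustrative assumption rather than the authors' implementation: the search bounds, tournament selection, blend crossover, Gaussian mutation, and elitism are one common set of operators, and `toy_fitness` is a stand-in for the model detection accuracy that the paper uses as the fitness function. Each candidate holds two genes, (c, σ), as for the SVM; a GNB candidate would hold a single gene.

```python
import random

# Assumed search ranges for the two genes (c, sigma); not taken from the paper.
BOUNDS = [(0.1, 100.0), (0.01, 10.0)]


def toy_fitness(candidate):
    """Stand-in for detection accuracy; peaks at c = 10, sigma = 0.5."""
    c, sigma = candidate
    return 1.0 / (1.0 + ((c - 10.0) / 10.0) ** 2 + (sigma - 0.5) ** 2)


def random_candidate(rng):
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]


def tournament(population, scores, rng, size=3):
    """Pick the fittest of `size` randomly drawn individuals."""
    picks = rng.sample(range(len(population)), size)
    return population[max(picks, key=lambda i: scores[i])]


def crossover(parent_a, parent_b, rng):
    """Blend crossover: a random convex combination of both parents."""
    w = rng.random()
    return [w * a + (1 - w) * b for a, b in zip(parent_a, parent_b)]


def mutate(candidate, rng, rate=0.2):
    """Perturb genes with Gaussian noise, then clamp every gene to its bounds."""
    child = []
    for value, (lo, hi) in zip(candidate, BOUNDS):
        if rng.random() < rate:
            value += rng.gauss(0.0, 0.1 * (hi - lo))
        child.append(min(hi, max(lo, value)))
    return child


def run_ga(fitness, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        elite = population[max(range(pop_size), key=lambda i: scores[i])]
        children = [elite]  # elitism: the best candidate survives unchanged
        while len(children) < pop_size:
            a = tournament(population, scores, rng)
            b = tournament(population, scores, rng)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=fitness)


best = run_ga(toy_fitness)
```

In a real setting, `toy_fitness` would be replaced by a function that trains the model with the candidate's parameters and returns its validation accuracy, which makes each fitness evaluation far more expensive than in this toy example.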

A. NETWORK INTRUSION DATASET
1) NSL-KDD is a benchmark labelled network traffic dataset used globally by researchers in the intrusion detection field [53]. It consists of two files: the training set with 125973 network connection instances and the testing set with 22544. Each connection is described by a vector of 42 features, as listed in Table 1. All feature values are numeric except features 2, 3, 4, and 42, which are nominal, as shown in Table 1. The behaviour of each connection is classified as either normal or outlier.
It has 38 varied attack types, where the training set contains 22 types, and the testing set involves the other 16 [39]. Table 3 groups these attacks into four categories as follows:
1. Probe: The intruder aims to obtain varied information concerning the victim host or network by scanning its open and closed ports, as well as its IP ranges, to launch future attacks.
2. Denial of Service: By using zombies, intruders can flood the target system with huge numbers of network packets. As a consequence, the victim system's resources, e.g. network bandwidth and processing power, are exhausted, and the system becomes unreachable for its legitimate users.
3. User to Root: The intruder aims to acquire the root/admin privileges of the victim machine by exploring and exploiting its vulnerabilities.
4. Remote to Local: An intruder who has no account on the host aims to get unauthorized access to it.
2) CICIDS2017 is a benchmark dataset widely used in the field of intrusion detection research [54]. It was created to evaluate the performance of IDS in accurately identifying network attacks and distinguishing them from legitimate network activities. Most of the available network traffic datasets suffer from a lack of traffic diversity and volume, anonymized packet payload information, constraints on the range of attacks, and a lack of feature sets and metadata. This dataset was created to overcome these concerns. It comprises various types of network traffic, including benign/normal traffic and different categories of attacks, including Brute Force attack, Web attack, DoS, Infiltration, Botnet, PortScan and DDoS. It consists of 2830540 connection instances, where each is described by a vector of 79 features, as listed in Table 4. All network traffic flow classes of the CICIDS2017 dataset are listed in Table 5, and a detailed analysis of the CICIDS2017 dataset is available in [55].

B. EXPERIMENTAL SETUP
A personal laptop with 4 GB RAM, an Intel Core i7 CPU, and the Windows 10 OS is used to carry out the proposed research experiments. The setup of these experiments was as follows:
• Min-max and Z-score scaler/normalizer techniques are implemented in Python to normalize and rescale the input feature values of the network traffic data.
• The Java-based Weka platform is used to implement the feature selection process on the network traffic data by two filter techniques, PCA and CFS.
• The Python-based Scikit-learn machine learning library is employed for implementing and building the SVM and GNB detection models individually on the network traffic data of the NSL-KDD and CICIDS2017 datasets, and the better of the two is adopted as the detection model for the proposed NODS.
• GA is implemented in Python to tune the RBF control parameters of the SVM and the smoothing parameter of the GNB model. The model detection accuracy is used as the GA fitness function for evaluating the fitness of each candidate/individual/chromosome during the GA generation process.
• The number of GA iterations was 100, and the size of the GA population was 300 candidates. Each GA candidate consists of either two random values for the SVM RBF [penalty parameter (c), kernel parameter (σ)] or one random value for the GNB's smoothing parameter.
• For the NODS implementation, 125973 and 22543 instances from NSL-KDD are used for the training and testing steps, respectively, while 120023 and 30006 instances are used from CICIDS2017.
• The overall performance of the SVM and GNB detection models is evaluated individually on the NSL-KDD and CICIDS2017 datasets by many evaluation metrics, as discussed in the next subsection.

C. PERFORMANCE EVALUATION METRICS
Many metrics are calculated to evaluate the capabilities of the proposed NODS. These metrics are inferred from the following confusion matrix:
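As an illustration of how such metrics follow from the four confusion-matrix counts, the sketch below computes several measures that are commonly used for intrusion detection. The exact metric set used in the paper is not reproduced here; the metric names, the function `detection_metrics`, and the example counts are assumptions, with the outlier class treated as the positive class.

```python
def detection_metrics(tp, tn, fp, fn):
    """Common detection metrics from confusion-matrix counts.

    tp: outliers flagged as outliers     fn: outliers missed (seen as normal)
    tn: normal traffic seen as normal    fp: normal traffic flagged as outlier
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "detection_rate": tp / (tp + fn),       # recall on the outlier class
        "false_alarm_rate": fp / (fp + tn),     # false positive rate
        "false_negative_rate": fn / (tp + fn),
        "precision": tp / (tp + fp),
    }


# Hypothetical counts for 100 outlier and 100 normal connections.
metrics = detection_metrics(tp=90, tn=95, fp=5, fn=10)
```

With these hypothetical counts, 185 of 200 connections are classified correctly, 90 of the 100 outliers are detected, and 5 of the 100 normal connections raise a false alarm.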

D. EXPERIMENT SCENARIOS AND RESULTS DISCUSSION
The proposed experiments are conducted by carrying out four scenarios for developing and optimizing the proposed NODS. The first scenario builds the detection system on the original network traffic data of the aforementioned datasets without performing any data preprocessing stages. The second scenario performs only one data preprocessing stage, normalizing the network traffic data by the min-max [-1:+1] and z-score scaler methods before building the detection system. The third scenario applies two data preprocessing stages before building the detection system. After normalizing the input network traffic data by the best scaler approach determined from the previous scenario, we apply the dimensionality reduction process on the normalized input data by selecting the most informative and significant feature subset from the whole feature set. Two filter feature selection techniques, PCA and CFS, are applied individually on the normalized input network data before the learning process, to detect which selection technique positively affects the NODS detection performance. Finally, the fourth scenario employs GA to tune the SVM's RBF control parameters [c, σ] and the smoothing parameter of the GNB while building the detection model on the pre-selected network feature subset obtained from the previous scenario, and analyzes their impact on the final performance of the proposed NODS. For the GA setup, we noticed that using a large number of individuals/candidates in the GA population provided better genetic variability and faster adaptation. Based on many preliminary empirical tests and trials, we set the number of individuals in the GA population to 300 and the number of generations to 100.
Concerning the first scenario, the SVM and GNB detection models built on both the NSL-KDD and CICIDS2017 datasets are entirely ineffective according to the evaluation results shown in Tables 6 and 7. Because the input network data is of low quality and was not preprocessed, the detection models underfit heavily. Therefore, the accuracy and detection rates of both models in recognizing the network traffic were very low, and they required a long time to classify the traffic behaviour. As a result, the admin would be misled by the high false rates, since much intrusive traffic is recognized as normal.
In the second scenario, as shown in Tables 8 and 9 and Figures 2 and 3, both the SVM and GNB detection models performed better after applying the min-max [-1:+1] and z-score methods than the detection models of the first scenario. The results confirm the importance of normalization during the data preprocessing stage, before the learning process starts. Regarding the impact of the two normalization approaches on the SVM and GNB detection models, z-score outperformed the min-max [-1:+1] scaler on the models built on the network traffic data of the NSL-KDD dataset, whereas the reverse held on the CICIDS2017 dataset. Normalization therefore helps overcome the model bias and underfitting problems and makes the NODS more effective and faster. In addition, the misclassification rates of the detection models, in terms of both false negative and false positive alarms, became much lower than in the first scenario.
In the third scenario, dimensionality reduction is applied to the best-normalized network traffic features of both NSL-KDD and CICIDS2017 obtained from the previous scenario. Two common feature selection techniques, PCA and CFS, are applied individually to the normalized data before the SVM and GNB detection models are trained, to assess their impact on the overall detection capabilities of the models.
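As an illustration of how a correlation-driven filter can discard redundant features, the sketch below implements a greedy forward search over the merit heuristic commonly associated with correlation-based feature selection, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. It assumes that the CFS technique used here follows this heuristic and that absolute Pearson correlation is an acceptable correlation measure. The feature names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)


def merit(subset, columns, label):
    """Merit of a feature subset: rewards class correlation, penalizes redundancy."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[f], label)) for f in subset) / k
    pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
    r_ff = sum(abs(pearson(columns[f], columns[g])) for f, g in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(columns, label):
    """Greedy forward selection: add the feature that most improves the merit."""
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        best_feat, best_score = None, best
        for f in remaining:
            s = merit(selected + [f], columns, label)
            if s > best_score:  # strict improvement only
                best_feat, best_score = f, s
        if best_feat is None:
            break
        selected.append(best_feat)
        remaining.remove(best_feat)
        best = best_score
    return selected, best


# Toy data: f1 and f2 are complementary, dup_of_f1 is an exact copy of f1.
columns = {
    "f1": [0, 0, 1, 1],
    "f2": [0, 1, 0, 1],
    "dup_of_f1": [0, 0, 1, 1],
}
label = [0, 1, 1, 1]
selected, best_merit = cfs_forward(columns, label)
print(selected, round(best_merit, 4))
```

In this toy case the duplicate feature is rejected because adding it raises the mean feature-feature correlation without adding new information about the class.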
The feature subsets selected from both the z-score-based NSL-KDD and the min-max [-1:+1] based CICIDS2017 data are tabulated with their indices in Tables 10 and 11. Both the SVM and GNB detection models are built on these selected feature subsets and their performance is evaluated. The results in Tables 12 and 13 show that the PCA technique outperformed CFS in selecting the most relevant and informative features from both datasets. Consequently, it contributed significantly to decreasing the learning time and complexity of the SVM and GNB detection models and to mitigating the risk of overfitting. It also reduced the detection time of the models and improved their effectiveness in analyzing the input network traffic behaviours compared with the second scenario results.
Regarding the fourth scenario, the GA is used to tune the RBF control parameters [c, σ] of the SVM and the smoothing parameter of the GNB during their learning process on the PCA-selected network features obtained from the previous scenario. The results in Tables 14 and 15 show that adjusting the hyperparameters of the two detection models boosted their generalization ability and convergence speed, which optimized the overall performance of the SVM and GNB models.
Comparing the four successive scenarios, we note that the NODS models of the fourth scenario (PCA-GA-SVM and PCA-GA-GNB) were the best among all NODS scenarios in detecting the normal and abnormal behaviours of the network traffic connections in the datasets used.
For a comparison with other related detection systems, as shown in Table 16, the evaluation results show the superiority of our proposed system, with lower false alarms and higher detection accuracy.

V. CONCLUSION
An outlier detection system is proposed to identify normal and abnormal network traffic. The SVM and GNB classification algorithms are employed to classify the behaviours of incoming network connections that affect a network of computers. They are built and evaluated on the NSL-KDD and CICIDS2017 network traffic datasets. The data mining preprocessing stages for network flow data, together with tuning the SVM's RBF control parameters and the GNB's smoothing parameter, were vital for improving the overall effectiveness of the proposed NODS. The performance of the proposed system is compared with other related IDSs, and the evaluation results show the superiority of the proposed SVM-NODS in detecting the different intrusions. In our future work, we will explore and implement other strategies for boosting the capabilities of the detection system and also investigate recent deep learning models for building the proposed detection model.
Han et al. developed an IDS to identify varied network attack types. Evolutionary neural networks (ENNs) were used to construct the detection model on the network traffic of the DARPA IDEVAL dataset. Evaluation results showed the system's ability to detect network intrusions with low false alarms and a high detection rate. In [24], Wang et al. developed an IDS to complement the firewall. It can identify network attacks that the firewall cannot detect. The IDS was built on K-means clustering-based density and the k-NN classifier using the KDD intrusion dataset. Results showed that the system is effective in detecting varied network attacks. In [25], Sanjay et al. presented a mechanism for improving the attack detection system based on streaming data mining approaches. The NSL-KDD intrusion dataset was used to assess four classification techniques, and their evaluation results were compared. Results showed that the Naïve Bayes classifier achieved the best accuracy and the Hoeffding tree achieved the lowest detection time. In [26], Zhang et al. developed an outlier detection technique for data streams. The detection model is trained and assessed on the KDD dataset. The performance evaluation showed the system's effectiveness in detecting network outliers at a lower false positive rate than the other compared systems.

2) NETWORK FEATURES NORMALIZATION
Naturally, the value ranges of network features vary, which biases the outlier detection model toward features with a large scale and leads it to ignore those with a smaller scale. This results in an inaccurate detection process and can cause the model to underfit. The problem is avoided by rescaling the feature values to a uniform scale. Two normalization methods are used, the min-max and the z-score.
a) The min-max method scales each feature's values into a specific range [a, b], such as [0, 1] or [-1, +1], by the following formula
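As a minimal sketch of the two rescaling methods named above, the code below assumes the standard definitions: min-max maps a value x to a + (x − min)(b − a)/(max − min), and z-score maps it to (x − mean)/std using the population standard deviation. The function names, the handling of constant features, and the sample `duration` column are illustrative assumptions.

```python
from statistics import mean, pstdev


def min_max_scale(values, a=-1.0, b=1.0):
    """Rescale one feature column linearly into the range [a, b]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: place every value at the middle of the range.
        return [(a + b) / 2.0 for _ in values]
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]


def z_score_scale(values):
    """Standardize one feature column to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical connection-duration feature with a wide value range.
duration = [0, 2, 5, 13, 20]
scaled = min_max_scale(duration)        # values now lie in [-1, +1]
standardized = z_score_scale(duration)  # mean 0, standard deviation 1
print(scaled)
print(standardized)
```

In practice the minimum, maximum, mean, and standard deviation would be computed on the training split only and then reused to transform the testing split.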

5. Detection time (DT): represents the time taken to classify the behaviours of all unseen network connections in the testing file of the dataset.
6. Area Under the Curve (AUC): measures the NODS performance in identifying the normal and outlier classes.
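The sketch below shows how metrics of this kind can be computed from the confusion-matrix counts and from classifier scores. The accuracy, detection rate, and false alarm rate formulas are the usual definitions and are assumed here, since the corresponding list items are not reproduced in this section. AUC is computed as the probability that a randomly chosen outlier receives a higher score than a randomly chosen normal connection. The labels, scores, and the 0.5 threshold are hypothetical.

```python
import time


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN), treating `positive` as the outlier class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn


def auc_score(y_true, scores, positive=1):
    """Rank-based AUC: fraction of (outlier, normal) pairs ordered correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth (1 = outlier) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

# Detection time: wall-clock time spent classifying the test connections.
start = time.perf_counter()
y_pred = [1 if s >= 0.5 else 0 for s in scores]
detection_time = time.perf_counter() - start

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)
detection_rate = tp / (tp + fn)    # share of outliers that were caught
false_alarm_rate = fp / (fp + tn)  # share of normal traffic flagged as outlier
auc = auc_score(y_true, scores)
print(accuracy, detection_rate, false_alarm_rate, auc)
```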

For the second scenario, the min-max [-1:+1] and z-score normalization techniques are applied as a data preprocessing task to rescale and normalize the input network feature values before the detection model is trained. This helps prevent the detection model from becoming biased toward network features with large-scale values, a problem that always degrades model performance. The results are reported in Tables 8 and 9 and Figures 2 and 3.

FIGURE 2. NODS performance on the second scenario for NSL-KDD using the min-max and z-score methods.

FIGURE 3. NODS performance on the second scenario for CICIDS2017 using the min-max and z-score methods.

TABLE 3. All 38 attack types with four classes of NSL-KDD dataset.

TABLE 5. Network traffic class composition of the CICIDS2017 dataset.

TABLE 6. The NODS performance evaluation of the first scenario on NSL-KDD dataset.

TABLE 7. The proposed NODS performance evaluation of the first scenario on CICIDS2017 dataset.

TABLE 8. The proposed NODS performance evaluation of the second scenario on NSL-KDD dataset.

TABLE 9. The proposed NODS performance evaluation of the second scenario on CICIDS2017 dataset.

TABLE 10. Selected features subset by the CFS and PCA techniques.

TABLE 11. Selected CICIDS2017's features subset by the CFS and PCA techniques.

TABLE 12. The proposed NODS performance evaluation of the third scenario on NSL-KDD dataset.

TABLE 13. The proposed NODS performance evaluation of the third scenario on CICIDS2017 dataset.

TABLE 14. The proposed NODS performance evaluation of the fourth scenario on NSL-KDD dataset.

TABLE 15. The proposed NODS performance evaluation of the fourth scenario on CICIDS2017 dataset.

TABLE 16. The proposed NODS evaluation performance comparison with other related work.