An Innovative Perceptual Pigeon Galvanized Optimization (PPGO) Based Likelihood Naïve Bayes (LNB) Classification Approach for Network Intrusion Detection System

Intrusion detection and classification have gained significant attention recently due to the increased utilization of networks. For this purpose, different types of Network Intrusion Detection System (NIDS) approaches have been developed in conventional works, which mainly focus on identifying intrusions from datasets with the help of classification techniques. Still, they are limited by inefficiency in handling large dimensional datasets, high computational complexity, false detection, and long training times. To solve these problems, this research intends to develop an innovative clustering-based classification methodology to precisely detect intrusions from different types of IDS datasets. Here, recent and extensively used IDS datasets such as NSL-KDD, CICIDS, and Bot-IoT have been employed for detecting intrusions. Data preprocessing is performed to normalize the dataset, eliminate irrelevant attributes, and organize the features. Then, the data is separated into clusters by using an intelligent Anticipated Distance-based Clustering (ADC) incorporated with the Density-Based Spatial Clustering of Applications with Noise (DBScan) algorithm. It computes the distance and density measures for grouping the attributes into clusters, which increases the efficiency of classification. Here, the most suitable optimal parameters are selected using the Perpetual Pigeon Galvanized Optimization (PPGO) technique. The extracted features are used for training and testing the dataset samples. Consequently, the Likelihood Naïve Bayes (LNB) classification approach is implemented to accurately predict whether the classified label is normal or attack. During the evaluation, the performance of the proposed IDS framework is validated and compared using various evaluation metrics. The results show that the proposed ADC-DBScan-LNB model outperforms the other techniques with improved performance outcomes.


I. INTRODUCTION
The Internet plays an essential role in daily life as a resource for learning information in different fields such as education, business, and others. Specifically, most organisations use the Internet as the technology for accomplishing their management activities [1,2]. They can utilise Internet applications for profit growth while keeping confidential or private information secret. It is also used to establish good communication between an organisation and its customers. In addition, it supports organisations in improving their operating efficiency against network vulnerabilities [3]. On this platform, data processing and execution are risky due to frequent attacks on the Internet. Hence, it is essential to ensure increased data anonymity and improve public interest in security [4,5]. Typically, attacks are unwanted or malicious actions performed by attackers [6] to degrade or affect the networking system, and they can be detected with the help of anti-attack systems. Many security approaches are used today to predict harmful attacks. These techniques detect attacking activities in the network based on signature patterns. In many conventional works, the Network Intrusion Detection System (NIDS) [7] framework has been deployed to detect intrusions based on certain dataset features, such as traffic patterns, data flow, and packet information. Still, it faces difficulties related to signature dependency, a single point of failure, long model training times, and complexity in algorithm design. Hence, various intrusion detection approaches such as clustering, optimization, and classification [5,8,9] have been utilized in existing works to identify and classify intrusions based on their attributes/features.
Typically, the main reason for using clustering approaches [10] is to group the attributes into clusters based on the distance value. Most clustering techniques are developed based on the parameters of density and distance, which help to separate the data. They include [11,12] hierarchical clustering, k-means clustering, partition-based clustering, spatial clustering, and centroid-based clustering. Consequently, the optimization methodologies select the most suitable attributes from the given datasets by computing the optimal fitness value. Generally, an increased number of attributes can degrade the classifier's performance in terms of misclassified labels, high false positives, and longer training times. Recently, meta-heuristic optimization techniques [13] have been widely applied to solve classification problems, identifying the best fitness value based on the weight value. They include [14,15] Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), Artificial Bee Colony (ABC), Whale Optimization (WO), and Firefly (FF) optimization. Moreover, machine learning and deep learning classification techniques are used to predict the classified labels for the given problems. The classifier's performance highly depends on the optimal number of features used for training the model. Different types of classification techniques [16] are used for detecting intrusions from the IDS datasets, including [13,17] Neural Network (NN), Naïve Bayes (NB), Decision Tree (DT), Support Vector Machine (SVM), Relevance Vector Machine (RVM), and Fuzzy Logic (FL). However, the significant limitations of the conventional approaches are as follows: single point of failure, high dependence on signatures, increased computational cost, high false alarm rate, and difficulty in detection.
Hence, this research intends to develop an intelligent IDS using enhanced clustering, optimization, and classification techniques. The main contribution of this paper is to precisely predict the intrusions from the given network IDS datasets by implementing an advanced optimization based classification methodology. In this system, the three different and popular IDS datasets, such as NSL-KDD, CICIDS, and Bot-IoT, have been used to validate the proposed system's performance.
Here, a group of methodologies such as clustering, optimization, and classification is applied to the datasets in order to predict the intrusions. Typically, data clustering is one of the most suitable techniques for simplifying the detection process because it groups the attributes into clusters. Here, the distance-based clustering mechanism is utilized for constructing the clusters with respect to the cutoff distance and the lower and higher density values. The clustering also helps to preprocess the datasets by filling in missing values and eliminating irrelevant and redundant attributes. Specifically, the classifier's performance highly depends on the quality of the data and features used for training the model. Hence, it is essential to preprocess the datasets before using them for further processing. After that, a novel Perceptual Pigeon Galvanized Optimization (PPGO) technique is employed for optimally selecting the features from the clustered data. The primary purpose of using this technique is to reduce the classifier's computational complexity and time consumption. Generally, classification techniques consume more time for training and testing the data attributes to predict the classified label, so an increased number of features can degrade the classifier's performance with high misclassification outputs, error rate, and increased false positives. Due to these factors, the proposed work aims to utilize an optimization technique for selecting the most suitable attributes based on the global best solution, which also helps to obtain an increased detection accuracy. Finally, the optimal set of features is fed to the classifier for training the models to predict whether the data is normal or intrusion.
For this purpose, an enhanced Likelihood Naïve Bayes (LNB) based machine learning classification approach is utilized in this system, accurately identifying the intrusions from the given datasets with increased accuracy and reduced complexity.
The primary objectives behind this work are as follows:
• Anticipated Distance-based Clustering (ADC) incorporated with the Density-Based Spatial Clustering of Applications with Noise (DBScan) technique is deployed to form the clusters for separating the dataset.
• An efficient Likelihood Naïve Bayes (LNB) classification technique is employed to improve the detection accuracy and predicted outcomes.
• The Perpetual Pigeon Galvanized Optimization (PPGO) technique is utilized to select the optimal features such as energy, probability based on density, likelihood, trust, weight function, IP, traffic stamp, port, and average total number of packets.
• The most widely used IDS datasets such as NSL-KDD, CICIDS, and Bot-IoT are utilized to test this system.
• To evaluate the performance of this system, various measures such as accuracy, precision, FPR, TPR, F1-score, and similarity coefficients are estimated.
The remaining portions of this paper are structured as follows: Section II reviews the existing NIDS approaches used to improve the security of networking systems, where each technique's advantages and disadvantages are discussed based on its key features and operating functions. The overall description of the proposed methodology is presented with its flow and algorithmic illustrations in Section III. The performance of both the existing and proposed IDS is validated and compared using different datasets in Section IV. Finally, the overall paper is summarized with its findings and future scope in Section V.

II. RELATED WORKS
This section reviews some of the conventional works related to NIDS along with their benefits and limitations. Machine learning classification techniques, including supervised and unsupervised learning models, are widely used in many security applications. Some of the recent methods are described below:
• Decision Tree: In machine learning, a classification tree is also known as a prediction model or decision tree [18]. It is a tree-like graph with internal nodes that indicate the test properties, branches, and terminal nodes or leaves that reflect the class to which an object belongs. The classification tree algorithms ID3 and C4.5 are the fundamental and most extensively utilized ones. The two techniques used in building the tree are top-down tree construction and bottom-up pruning. ID3 and C4.5 follow the top-down tree construction. More techniques to classify trees were found to be more exact than categorising ship bays than decision-making trees.
• Fuzzy Logic: It is based on fuzzy set theory [19], which deals with reasoning that is approximate rather than precisely inferred as in traditional predicate logic. Fuzzy set theory covers well-thought-out real-world expert values for a complex topic. The information in this approach is classified based on several statistical measures. These data parts are then classified as normal or malicious according to logical rules. Several intrusion data extraction techniques propose specific changes to current data mining algorithms to enhance efficiency and the precise extraction of intrusion detection patterns.
• Naïve Bayes Network: There are various instances in which statistical dependencies or causal interactions exist among system variables. The probabilistic interactions between these variables can be difficult to define accurately. In other words, the prior knowledge about the system is only that some variables may be influenced by others. A probabilistic graphical model known as the Naïve Bayesian Network (NB) [20] can be used to take advantage of this structural link between the problem's random variables. This model answers questions such as, "What is the chance of a certain sort of attack if a few observed incidents are presented?", for which the conditional probability formula can be utilized. The NB structure is often represented as a directed acyclic graph (DAG), with each node representing one system variable and each link encoding the influence of one node over another. When comparing the decision tree and Bayesian techniques, the accuracy of the decision tree is much superior, while the computational time of the Bayesian network is minimal. Therefore, it is efficient to utilize NB models if the data set is huge.
• Genetic Algorithm: It was first introduced in the field of computational biology. These algorithms are part of the broader class of evolutionary algorithms (EA). Evolution-inspired operations such as inheritance, selection, mutation, and crossover are used to find answers to optimization problems. They have been utilized in several fields with highly promising results. For intrusion detection, the genetic algorithm (GA) [21] derives a set of categorization rules from the audit data. The support-confidence framework is utilized as a fitness function to determine the quality of each rule. Significant features include GA's robustness to noise and its self-learning capability. The advantages of GA methods include high attack detection rates and few false positives.
• Neural Networks: It is a network of linked nodes that replicates the brain's functioning. Each node has extensive connections to several other nodes in neighbouring layers. Individual nodes use the weights and a simple function [22] to determine the output values from the linked nodes. Neural networks can be established for supervised or unsupervised learning. The user must specify the number of hidden layers and the number of nodes in each hidden layer. The neural network output layer may include one or more nodes depending on the application. Multilayer Perceptron (MLP) neural networks [23] have been successful in many applications and have produced more precise results. They can approximate any continuous function to arbitrary precision provided they contain sufficient hidden units. Therefore, such models can potentially establish every decision boundary for classification within the feature space and behave as a non-linear discriminant function.
• Support Vector Machine: It belongs to the family of supervised learning approaches for classification and regression. The Support Vector Machine (SVM) [24] is frequently employed in the pattern recognition field, and it is also used to recognize intrusions. The one-class SVM is trained on instances of a single class and does not use both negative and positive examples. On the KDD Cup data set, SVM exceeded neural networks in terms of false alarm rate and accuracy for most types of attacks. A novel tool for detecting regularities and irregularities in big datasets has already been implemented in the network security environment.
Hybrid learning approaches can attain the greatest possible accuracy and detection rate. A combination of clustering and classification techniques might be employed to develop a hybrid training method.
Clustering is an anomaly-based approach to detection that is able to identify novel threats without prior alerting and to recognize natural data groups based on common patterns. It can also be used to quickly uncover a set of similar traffic behaviors using cluster analyses such as k-means and DBScan. Because of its simple structure, Naïve Bayes classification is efficient and can produce very competitive results in anomaly-based network intrusion detection.
The effectiveness of the k-means and k-medoids clustering algorithms was examined on vast data sets under normal and uniform distribution circumstances. In both situations, the average time taken by k-means was higher than that taken by k-medoids. A methodology was suggested in [25] that includes a three-level decision tree classification for boosting the detection rate. This approach detects known attacks more effectively, but its poor detection rates for new attacks and its high false alarms constitute a serious shortcoming. The author in [26] has suggested an IDS model combining the decision tree with SVM (DT-SVM), which provides a high detection rate while distinguishing special attacks from standard behavior [27,28]. The system uses a hierarchical intelligent hybrid system incorporating the decision tree.
ADAM (Audit Data Analysis and Mining) is an intrusion detector designed to identify intrusions using data mining techniques. Intrusion Detection using Data Mining (IDDM) is a real-time NIDS for misuse and anomaly detection that relies on data mining techniques [29]. It applies association rules, meta-rules, and characteristic rules. Data mining is used to describe the network data and to analyse deviations using this information. The authors in [30] offer a method of detecting intruders with an evolving fuzzy neural network. This learning method integrates the artificial neural network (ANN) with fuzzy inference systems (FIS) and evolutionary algorithms. They develop an algorithm that uses fuzzy rules and allows the creation of new neurons. They employ Snort to collect the data for the algorithm and then compare their technique to an enlarged neural network.
The author in [31] develops statistical neural network classifiers for anomaly detection to recognize UDP flood attacks. Compared with various neural network classifiers, the backpropagation neural network (BPN) was demonstrated to be more efficient for developing IDS. It employs the backpropagation method for intrusion detection, using sample and attribute queries to analyze and determine the essential training data components. It can reduce the processing time, storage, etc. A well-known article applied the Bayesian rule of conditional probability to show that intrusion detection is subject to the base-rate fallacy.
Clustering is an anomaly-based detection approach that is capable of detecting new attacks without prior notification and of identifying natural data groups based on pattern similarities. K-Means, DBScan, and other methods use cluster analysis to identify sets of traffic behaviours. The Naïve Bayes classifier has a simple structure and yields very competitive results in experimental investigations. According to the author, the classification task is performed more effectively by Naïve Bayes, which classifies network intrusions more efficiently than neural network detection.
Genge et al. [32] suggested an innovative approach for resilient distributed intrusion detection systems. The framework leverages the outcomes of the risk assessment method to recognize and rank critical communication flows. These flows are incorporated into an optimization problem that minimizes the number of deployed detection devices while applying a shortest-path routing algorithm to reduce the communication delay. This work elaborated a resilient distributed intrusion detection design algorithm that accounts for detection devices that can fail or be compromised. This algorithm positions the detection devices so that the infrastructure remains robust against up to K communication path failures. The experimental outcomes demonstrated the effectiveness of the distributed intrusion detection design framework.
Ravale et al. [33] designed a hybrid method that combines data mining approaches, namely the K-Means clustering algorithm with the RBF kernel function of the Support Vector Machine as the classification module. The main motive of this work was to reduce the number of attributes associated with every data point. This work demonstrated superior performance in terms of accuracy and detection rate when carried out on the KDDCUP'99 data set. Gupta et al. [34] performed intrusion detection by ant colony optimization, which gives good classification on the NSL-KDD data set. This data set does not contain redundant records. The NSL-KDD dataset comprises two parts: 1) average establishment and 2) termination. The figure shows a flow chart of the intrusion detection technique by ant colony optimization. NSL-KDD comprises three kinds of features: 1) fundamental individual connection features, 2) connection content features, and 3) traffic features. Table 1 investigates some machine learning classification techniques used to develop the IDS framework.

Table 1. Machine learning classification techniques used to develop the IDS framework
Technique | Description | References
Decision Tree | A classification tree is also known as a prediction model or decision tree, which produces the classified results for the given problems by taking decisions using a tree structure. | [40,41]
Fuzzy Logic | Fuzzy set theory is used to cover well-thought-out real-world expert values for a difficult topic. In IDS frameworks, it is extensively used for anomaly prediction and classification based on rule formation. | [42][43][44]
Naïve Bayes | The NB is usually represented in a DAG format and has the ability to handle large dimensional datasets with reduced time consumption. | [45,46]
Genetic Algorithm | In many IDS applications, it is mainly used to derive a set of categorization rules from the audit data by computing the fitness function with the crossover, selection, and mutation operations. |
Neural Networks | It is a type of network constructed with a set of nodes that replicates the brain's functioning. Here, each node has extensive connections to several other nodes in neighboring layers. | [50][51][52]
Support Vector Machine | It is a kind of supervised learning technique that is extensively used in many multi-class prediction models due to its increased accuracy and efficiency. | [24,53]

In the proposed NIDS framework, distance-based clustering and classification methods are used for detecting and classifying the types of intrusions from the IDS datasets. For data normalization, each attribute is normalized based on its minimum and maximum values along with its real value. For clustering, the Gaussian parameter and the number of clusters are considered for grouping the data attributes into clusters based on the distance value. Moreover, the number of pigeons, the compass factor, and the probability function are computed to improve the classifier's detection accuracy.

III. PROPOSED WORK
This section presents a detailed description of the proposed Anticipated Distance-based Clustering (ADC) incorporated with the DBScan mechanism for accurately identifying intrusions from the given datasets. The main contribution of this work is to precisely predict the normal and attack labels from the given IDS datasets by using intelligent distance-based clustering and classification methodologies. The proposed attack detection framework aims to classify the types of attacks based on their attribute feature vectors with the likelihood function. Here, the ADC-DBScan clustering methodology is implemented to organize the attributes of the normalized dataset by computing the distance value. During data clustering, the preprocessed data is segregated into different chunks, and the attributes of each data unit are extracted based on the minimum distance value. Then, the novel Perpetual Pigeon Galvanized Optimization (PPGO) technique is employed for optimally selecting the attributes based on the global best fitness function. After that, the selected features are given to the classifier for training the models. The probability of the selected attributes is computed for predicting the normal and attack labels. For this purpose, the LNB classifier is employed in this work, which uses the training samples as the input and produces the classified label as the output according to the probability and likelihood functions. The novelty of this work is that it identifies intrusions based on the optimal features by partitioning the normalized dataset into different chunks with respect to the minimum distance value. Here, the deviation occurrence is computed with respect to IP, port, timestamp, and type of traffic.
The overall flow of the proposed system is shown in Figure 2, which includes the following working modules:
• Dataset obtainment
• Preprocessing
• Clustering
• Optimization
• Intrusion Detection and Classification
In this work, the recent IDS datasets such as NSL-KDD, CICIDS, and Bot-IoT have been utilized for intrusion detection and classification. Initially, data preprocessing is performed to obtain the normalized data by eliminating the special characters and blank spaces in the raw datasets. Typically, identifying intrusions or attacks from large dimensional datasets requires more processing time, leading to increased system complexity. Hence, the ADC incorporated with the DBScan data clustering technique is applied to group the data into clusters, where the distance is estimated for grouping the attributes. The PPGO technique is employed for selecting the most suitable features according to the fitness function, which is based on the likelihood function and the weight value of the particles. After that, the machine learning classifier named Likelihood Naïve Bayes (LNB) is employed for classifying whether the data is normal or intrusion. The proposed intrusion detection and classification system is implemented in two phases of work. The deviation is computed for both regular and unauthenticated users while accessing the cloud applications. Based on this value, the cloud administrator is alerted during unauthenticated data access from the cloud. The key benefits of using the proposed ADC-DBScan-LNB based intrusion detection and classification system are increased detection accuracy, ensured prediction outcomes, minimal time consumption for training and testing models, and reduced computational complexity.

A. Data Preprocessing
Typically, data preprocessing/normalization is one of the essential processes of attack detection and classification systems. The network intrusion datasets are generally large in dimension, and huge data with noisy contents can degrade the system performance with misclassification results, increased false positives, and high time consumption for training the models. Hence, it is essential to preprocess the dataset before using it for attack/intrusion detection. In this work, the recent IDS datasets such as NSL-KDD, CICIDS, and Bot-IoT have been utilized for intrusion detection and classification. At first, the raw datasets are preprocessed to eliminate unwanted attributes, which helps to improve the efficiency and accuracy of intrusion detection. It involves the processes of replacing missing values, removing irrelevant attribute information, and arranging attributes. Data preprocessing is defined as the extraction of the best records from many records under the criterion that all attacks are in equilibrium. Here, the normalization is performed by finding the minimum and maximum values of each data attribute and transforming it to the range of 0 to 1 as shown below:
$$x' = \frac{x - \min}{\max - \min}$$
where the data attribute is set to 0 when $\max = \min$, and $x$ indicates the real value of the attribute. Typically, data preprocessing or normalization is essential for improving the anomaly detection process. Also, the overall performance of the intrusion detection and classification system highly depends on a normalized dataset without noisy contents. Finally, the normalized data attributes are given to the clustering scheme for grouping similar data items based on the distance value.
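As a minimal sketch of the min-max normalization described above, the following Python snippet scales each attribute to the range 0 to 1 and maps a constant attribute (maximum equal to minimum) to 0. The function names `min_max_normalize` and `normalize_dataset` are hypothetical and chosen only for illustration; the handling of missing values and irrelevant attributes is not covered here.

```python
def min_max_normalize(column):
    """Scale a list of numeric values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Degenerate attribute (max = min): the value is set to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def normalize_dataset(rows):
    """Normalize every attribute (column) of a row-major dataset."""
    columns = [min_max_normalize(list(col)) for col in zip(*rows)]
    # Transpose back so that each inner list is again one record.
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    sample = [[0, 7], [5, 7], [10, 7]]
    print(normalize_dataset(sample))
```

Normalizing column by column keeps attributes with large numeric ranges, such as byte counts, from dominating the distance computations used in the later clustering stage.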

B. ADC based DBScan Clustering
Clustering is one of the essential processes and plays a vital role in accurately detecting intrusions in the dataset. Grouping the information helps to improve the overall system performance. For this purpose, different types of distance and density-based clustering techniques are employed in conventional works, which intend to group the attributes for identifying the intrusions that exist in the dataset. However, their significant limitations are the inability to handle large datasets, inefficient data separation, and high time complexity. To solve these problems, this work intends to develop a new clustering technique by incorporating the functionalities of both the ADC and DBScan clustering approaches. It helps to improve the overall efficiency and detection accuracy of intrusion detection and classification. Also, it reduces the computational complexity of classification by efficiently grouping the data based on a probability distance measure. Based on the feature attributes of the known attacks, the frequent attacks are identified and detected according to their probability values. The mean of the entire cluster is calculated and compared with the threshold value. If the mean value is beyond the threshold value, an automatic alert message is generated to the administrator, which helps to find illegal access to the network by IP address.
DBScan is a density-based clustering technique that uses the minimum number of points and the density of neighbourhood points. It forms a new cluster based on the neighbourhood points within the radius, and the main reason for using this technique is that it efficiently reduces the average time complexity. Also, it utilizes global density parameters for identifying clusters with different shapes and densities. The conventional density-based clustering techniques highly depend on single-value parameters, leading to reduced clustering efficiency. In contrast, the proposed DBScan technique can fine-tune different parameter values in every group of the cluster. Here, the local density function is computed at each point based on the approximation of the overall density function, which is estimated with respect to the sum of all functions. In this model, the neighbourhood information of the data points in a sparse region is dynamically captured for clustering.
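To make the roles of the radius and the minimum number of points concrete, the following is a minimal sketch of the conventional DBScan procedure in plain Python. It is not the tuned variant proposed in this work; the parameter names `eps` and `min_pts` and the Euclidean distance are assumptions made for illustration.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # expand only through core points
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

The brute-force neighbourhood search keeps the sketch short; a spatial index would be the usual choice for large IDS datasets.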

Algorithm I -ADC with DBScan Clustering
Input: Preprocessed dataset $D = \{x_1, x_2, \ldots, x_n\}$, number of clusters $K$, and Gaussian parameter $\sigma$;
Output: Clustered group data $C_1, C_2, \ldots, C_K$;
Step 1: Estimate the distance value $d_{ij}$ between the data of $x_i$ and $x_j$;
Step 2: Compute the cutoff distance $d_c$;
Step 3: Then, compute the local density function with respect to the cutoff distance value;
Step 4: After that, estimate the distance for each data attribute;
Step 5: Consequently, estimate the distance measure from the values obtained above;
Step 6: Select the data points with the maximum values as the cluster centers of the $K$ clusters;
Step 7: Finally, group the remaining points into the same cluster as their nearest neighbors of higher density. The preprocessed dataset is thus split into the clusters $C_1, C_2, \ldots, C_K$;
Step 8: Return the subsets of clusters $C_1, C_2, \ldots, C_K$;
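The steps of Algorithm I can be sketched as follows, under explicit assumptions: the distance is Euclidean, the local density is a Gaussian kernel of the cutoff distance, the per-point distance is the distance to the nearest point of higher density, and centers are the points with the largest product of density and distance. These choices follow the general density-peak clustering idea and are assumptions, not the exact equations of this work. The name `adc_cluster` and the parameter `d_c` are hypothetical.

```python
import math


def adc_cluster(points, k, d_c):
    """Density-peak style clustering sketch; returns (labels, center indices)."""
    n = len(points)
    # Step 1: pairwise distances.
    d = [[math.dist(p, q) for q in points] for p in points]
    # Step 3 (assumed form): Gaussian-kernel local density with cutoff d_c.
    rho = [sum(math.exp(-(d[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # densest point first
    # Step 4 (assumed form): distance to the nearest point of higher density.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d[i])  # densest point: use its largest distance
            continue
        j = min(order[:rank], key=lambda h: d[i][h])
        delta[i], parent[i] = d[i][j], j
    # Steps 5-6: score each point and take the k highest scores as centers.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: -gamma[i])[:k]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Step 7: remaining points join the cluster of their denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(adc_cluster(pts, k=2, d_c=2.0))
```

Processing the points in decreasing density order guarantees that the denser neighbor of every point is already labeled when the point itself is assigned.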

C. Perpetual Pigeon Galvanized Optimization (PPGO)
After normalizing the dataset, the novel Perpetual Pigeon Galvanized Optimization (PPGO) technique is employed to select the most suitable features according to the fitness function. In this stage, the optimal parameters are identified based on the likelihood function with respect to the weight value of the particles. The best fitness value is computed over the iterations based on the maximum likelihood value and the weight value. Consequently, the optimal features, including energy, probability based on density, likelihood, trust, weight function, IP, traffic stamp, port, and average total number of packets, are selected and used for training and testing the classifier. The PPGO mechanism is a bio-inspired optimization technique developed based on the swarm intelligence behavior model. Moreover, the proposed PPGO technique has the ability to efficiently handle multi-objective and complex optimization problems with a reduced number of iterations. Hence, it is more suitable for improving the accuracy and detection efficiency of NIDS. The significant benefits of the PPGO technique are as follows: reduced computational complexity, the best optimal solution with a reduced number of iterations, increased convergence speed, and high efficiency. Here, the number of pigeons in the current iteration, $N_p$, and the compass factor, $R$, are considered as the inputs for this optimization, which then produces the best optimal solution as the output. At first, the random number of pigeons and the number of iterations are initialized, where $N_p$ is the number of pigeons, $t$ indicates the current iteration, and $X_i(t)$ denotes the pigeons in the current iteration. Then, the pigeons are evaluated with respect to the minimum fitness value by using a fitness function $F$, in which $w_1$, $w_2$, and $w_3$ are the weight values, FPR is the False Positive Rate, and TPR is the True Positive Rate.
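To illustrate how a fitness value can combine three weights with FPR and TPR, the following sketch computes both rates from predicted labels and evaluates a weighted sum. The specific form used here, a weighted sum of the selected-feature ratio, FPR, and (1 - TPR), is an assumption made for illustration and is not taken from the equations of this work; the function names and default weights are likewise hypothetical.

```python
def detection_rates(y_true, y_pred, positive=1):
    """Return (TPR, FPR) for binary labels, where `positive` marks an attack."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def fitness(n_selected, n_total, tpr, fpr, w1=0.1, w2=0.45, w3=0.45):
    """Assumed fitness (lower is better): few features, low FPR, high TPR."""
    return w1 * (n_selected / n_total) + w2 * fpr + w3 * (1.0 - tpr)


if __name__ == "__main__":
    tpr, fpr = detection_rates([1, 1, 0, 0], [1, 0, 1, 0])
    print(tpr, fpr, fitness(5, 10, tpr, fpr))
```

Because the value is minimized, a candidate feature subset is rewarded for being small and for yielding few false alarms and few missed attacks.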
After that, the path and velocity of each pigeon are computed in terms of the velocity, the compass factor, the global solution, the present position of the pigeon, and the present velocity. The pigeons are then re-evaluated with respect to the fitness function, and the global best solution is updated. Correspondingly, the loop is executed until the number of iterations is reached, after which the pigeons are arranged in order of their fitness values and their number is halved. Moreover, the destination of the pigeons is estimated from the center pigeon at the current iteration and the current positions of all pigeons, and the positions are updated accordingly. Finally, the global best solution is updated and returned, as given in Eq. (10). Based on this solution, the optimal number of features is selected and used to train the classifier to accurately detect the intrusions.

Algorithm II - Perpetual Pigeon Galvanized Optimization (PPGO)
Input: Number of pigeons in the present iteration, and compass factor;
Output: Best optimal solution;
Step 1: At first, randomly initialize the pigeons;
Step 2: Initialize the two iteration limits, where the first is greater than the second;
Step 3: Evaluate the pigeons according to the fitness function by using Eqs. (4) and (5);
Step 4: While the first iteration count ≥ 1 do: update the path and velocity of each pigeon by using Eqs. (6) and (7), respectively;
Step 5: Consequently, evaluate the pigeons according to their fitness values;
Step 6: Then, update the best global solution;
Step 7: End while;
Step 8: While the remaining iteration count ≥ 1 do
Step 9: Arrange the pigeons with respect to the fitness value;
Step 10: Halve the number of pigeons;
Step 11: Compute the destination of the pigeons by using Eqs. (8) and (9);
Step 12: Update the global best solution as shown in Eq. (10);
Step 13: End while;
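The two loops of Algorithm II can be sketched as follows. Because Eqs. (4) to (10) are not reproduced here, the sketch assumes the update rules of the standard pigeon-inspired optimization scheme: an exponentially damped velocity pulled towards the global best in the first loop, and a move towards a fitness-weighted center of the better half of the flock in the second. The function name `pigeon_optimize`, its parameters, and its default values are hypothetical, and the fitness is assumed to be non-negative and minimized.

```python
import math
import random


def pigeon_optimize(fitness, dim, n_pigeons=20, map_iters=30, landmark_iters=10,
                    compass_factor=0.2, bounds=(-5.0, 5.0), seed=0):
    """Minimize a non-negative fitness with an assumed PIO-style search."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Steps 1-3: random initial positions, zero velocities, initial best.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_pigeons)]
    vel = [[0.0] * dim for _ in range(n_pigeons)]
    best = min(pos, key=fitness)[:]
    history = [fitness(best)]

    # Steps 4-7: first loop, velocity and path updates towards the global best.
    for t in range(1, map_iters + 1):
        decay = math.exp(-compass_factor * t)
        for i in range(n_pigeons):
            r = rng.random()
            for d in range(dim):
                vel[i][d] = vel[i][d] * decay + r * (best[d] - pos[i][d])
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
        candidate = min(pos, key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate[:]
        history.append(fitness(best))

    # Steps 8-13: second loop, keep the better half and move to its center.
    flock = pos
    for _ in range(landmark_iters):
        flock.sort(key=fitness)
        flock = flock[:max(1, len(flock) // 2)]
        # Lower fitness gets a larger weight (assumes fitness >= 0).
        weights = [1.0 / (fitness(p) + 1e-12) for p in flock]
        total = sum(weights)
        center = [sum(w * p[d] for w, p in zip(weights, flock)) / total
                  for d in range(dim)]
        for p in flock:
            r = rng.random()
            for d in range(dim):
                p[d] += r * (center[d] - p[d])
        candidate = min(flock, key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate[:]
        history.append(fitness(best))
    return best, history
```

The returned `history` records the best fitness after each iteration, so it never increases; plotting it gives a best-fitness curve against the number of iterations. For feature selection, each position would additionally have to be mapped to a subset of attributes, which this sketch does not cover.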

D. Likelihood Naïve Bayes (LNB) Classification
In this work, the LNB-based machine learning classification model is employed to classify whether the data is normal or an attack. This technique is developed from the conventional NB classification technique. In the proposed mechanism, however, the optimal parameters are identified and incorporated into the classifier to improve the overall accuracy and efficiency of the intrusion detection and classification system. The main advantage of this technique is that it can handle large dimensional datasets with reduced computational and time complexity by splitting the data and working with blocks of information. Figure 3 shows the attack analysis and anomaly detection process flow using the LNB-based classification technique.

Algorithm III - Likelihood Naïve Bayes (LNB) Classification
Input: Selected attributes based on the optimal solution;
Output: Classified label;
Step 1: At first, the set of attributes is initialized; // The probability function is defined in terms of the probability of the hypothesis and the probability of the training data.
Step 4: Based on this probability function, the classified label is produced as either normal or intrusion;
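The probability-based labelling step can be illustrated with a conventional naïve Bayes classifier. The sketch below is not the LNB variant itself, whose equations are not reproduced here; it assumes Gaussian likelihoods for numeric attributes and picks the label with the highest log posterior. The names `train_nb` and `predict_nb` are hypothetical.

```python
import math
from collections import defaultdict


def train_nb(rows, labels):
    """Estimate a prior and per-attribute mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model


def predict_nb(model, row):
    """Return the label with the highest log prior plus log likelihood."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)
```

Working with log probabilities avoids numerical underflow when many attributes are multiplied together, which matters for high-dimensional IDS records.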

IV. PERFORMANCE ANALYSIS
This section presents the performance analysis of both the existing and proposed mechanisms with respect to various evaluation metrics. To validate the results of this work, three different datasets have been utilized: NSL-KDD, CICIDS-2017, and Bot-IoT. The measures used in this analysis are sensitivity, specificity, accuracy, precision, recall, F1 score, error, False Negative Rate (FNR), delay, and detection rate. Figures 4 (a) and (b) show the features and the attack information of the NSL-KDD dataset, respectively.

FIGURE 4 (a). Features of NSL-KDD dataset
FIGURE 4 (b). Attack Information of NSL-KDD dataset
The NSL-KDD dataset includes attack types such as DoS, U2R, Probe, and R2L, which are examined during data pre-processing. Data separation is performed depending on whether the connection value indicates normal establishment or termination, and the initial attack data is prepared by refining the attack data according to the connection received. The CICIDS dataset [54] comprises different types of attacks such as DDoS, DoS, web attack, and brute force, with many samples. It is also one of the recent and extensively used IDS datasets in many network application systems, where the training and testing samples are essential. The Bot-IoT dataset comprises more than 72,000 records related to different types of attacks. It is newer than the other IDS datasets but highly complicated to process. Table 1 and Table 2 depict the detected number of attacks and the original number in the NSL-KDD dataset for U2R and R2L, respectively. These results show that the proposed ADC-DBScan-LNB based IDS accurately detects the number of attacks in the given dataset with better training and testing models. Figure 5 shows the best fitness plot of both the existing BAT and proposed LNB techniques for a varying number of iterations. The analysis shows that, based on the likelihood function, the proposed technique finds the best fitness value in fewer iterations than the conventional approach. Figure 6 shows the transmission delay of both the existing and proposed IDS for a varying number of transmitted packets. Typically, the delay is calculated based on the number of packets that are successfully transmitted with reduced time consumption and without any loss of information. This analysis shows that the proposed ADC-DBScan-LNB technique outperforms the other technique with a reduced transmission delay.
In the existing work [55], intrusion detection is performed with the help of a radial basis function integrated with the BAT algorithm. Figure 7 depicts the confusion matrix of the proposed scheme for the NSL-KDD dataset, where the actual and predicted classes are tabulated against the number of sequences. The confusion matrix is mainly constructed to determine the detection accuracy of intrusion detection and classification, computed as the ratio of the sum of the diagonal values to the total number of samples.
The work in [56] presents an ensemble learning approach for a Wi-Fi network intrusion detection system, in which the effectiveness of different base learners is increased by combining them in an ensemble prediction model. The Wi-Fi intrusion detection pipeline comprises three parts: 1) the AWID dataset, 2) data preparation, and 3) ensemble algorithms. Two versions of the AWID dataset are used: 1) attack-class and 2) attack-specific. The attack classes are flooding, impersonation, and injection. In data preparation, noisy and missing values are eliminated and the number of features is reduced. The ensemble algorithms utilized in that method are Bagging, Random Forest, Extra-Trees, and XGBoost. Bagging has two phases, training and prediction. Random Forest improves on bagging by using decision trees, Extra-Trees shares the characteristics of random forests, and XGBoost performs gradient boosting. Typically, the performance of an IDS can be assessed using the key measures of accuracy, precision, recall, sensitivity, specificity, and false alarm rate. These measures are calculated and plotted for the proposed work.

FIGURE 9. Performance analysis of proposed work
Figure 9 shows the performance measures of accuracy, detection rate, and false rate, all expressed as percentages. The accuracy of the proposed work is 89.56% and the detection rate is 93.89%. The proposed work has a minimum false rate of 4.7%, and a lower false rate implies superior performance. Figures 11 and 12 show the false positive rate and detection rate of the existing [58] and proposed intrusion detection techniques. Here, the comparative analysis is carried out on the NSL-KDD dataset. The obtained results show that the proposed PPGO-LNB technique increases the detection rate and reduces the FPR for all types of attacks in the NSL-KDD dataset, because the proposed scheme detects intrusions based on the optimal selection of attributes from the clustered data, which helps to enhance the overall detection efficiency of the IDS. Figure 13 compares the TPR of the existing [9] and proposed IDS techniques for the four different classes in the NSL-KDD dataset. This analysis shows that the proposed PPGO-LNB scheme provides an increased TPR compared to the other models by predicting the classified labels based on efficient clustering and classification processes. Figures 14 (a) to (d) depict the ROC analysis of both the existing and proposed intrusion detection techniques for different types of attacks such as DoS, Botnet, web attacks, and brute force. The analysis shows that the proposed ADC-DBScan-LNB technique outperforms the existing NN technique for all types of attacks. The ROC of the classifier is mainly evaluated to estimate the TPR and FPR of the detection system, and an increased TPR indicates improved performance of the system.
Here, Figure 15 evaluates the AUC of both existing and proposed techniques, and the results show that the ADC-DBScan-LNB technique improves performance compared to the other methods.
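To make the ROC and AUC evaluation concrete, the following sketch builds ROC points from classifier scores and integrates them with the trapezoidal rule. It is a generic illustration, not the evaluation code used in this work; the function names `roc_points` and `auc` are hypothetical, labels are assumed to be 1 for attack and 0 for normal, and both classes are assumed to be present.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples that share a score are admitted together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An AUC of 1.0 corresponds to a detector that ranks every attack above every normal sample, while 0.5 corresponds to random ranking.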
In the evaluation measures, TP indicates the True Positives, TN the True Negatives, FP the False Positives, and FN the False Negatives. This analysis shows that the proposed ADC-DBScan-LNB technique provides improved performance compared to the existing classification technique, because the classifier is trained and tested on the optimal number of features obtained from the given datasets. Also, the classifier accurately predicts the classified label based on the likelihood function of the optimization, which helps to improve the overall system performance. Table 5 and Table 6 compare the training and testing time consumption of the existing [59] and proposed techniques for the NSL-KDD and CIC-IDS2017 datasets, respectively. This evaluation shows that the proposed PPGO-LNB technique requires less time for both training and testing than the other models, because it uses only the selected attributes for training, which reduces the time consumption while preserving accuracy. Tables 7 (a) to (c) compare the training and testing accuracy, sensitivity, specificity, and F1-score of the existing [60] and proposed intrusion detection methodologies on the BoT-IoT, CIC-IDS 2017, and NSL-KDD datasets, respectively. The obtained results show that the proposed PPGO technique outperforms the other techniques on these measures. In addition, Table 8 compares the accuracy of the existing and proposed classification methodologies on the CICIDS-2017 dataset. Various optimisation-based machine learning and deep learning techniques have been considered during this evaluation.
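For reference, the measures discussed in this section can be computed from TP, TN, FP, and FN as in the sketch below. It uses the standard textbook definitions, which are assumed to match those intended here; the function name `detection_metrics` is hypothetical, and the counts are assumed to make every denominator non-zero.

```python
def detection_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix measures for a binary detector."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity, TPR, or detection rate
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
        "error": (fp + fn) / total,
    }
```

For example, 90 detected attacks, 10 missed attacks, 95 correctly passed normal records, and 5 false alarms give an accuracy of 0.925, a recall of 0.9, and an FPR of 0.05.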
The obtained results show that the proposed PPGO-LNB technique provides improved performance compared to the other approaches, because the clustering-based optimization mechanism helps to reduce the classification error rate, which in turn increases the prediction accuracy.

V. CONCLUSION
An intrusion detection model identifies unauthorized access and abuse of data, including network traffic that is not noticed by the firewall. Security in intrusion detection systems has been enhanced by deep neural networks. The advantages of an intrusion detection system are that it blocks malicious attacks while maintaining normal performance. To identify attacks, the intrusion detection system examines information such as the source, the destination number, and the application version number, and intruders are identified from audit data and log files. The intrusion detection system determines unauthorised access in the absence of confidentiality, integrity, and authentication. Naïve Bayes follows Bayes' theorem. The proposed work addresses existing issues such as high computational cost, high false alarm rate, and difficulties in detecting DoS, SQL injection, buffer overflow, login attempt, and Apache Struts attacks. These issues are addressed by the proposed PPGO-LNB mechanism. Density-based clustering identifies nonlinear structure depending on density, and it describes the density reachability and connectivity of the clusters. Intrusion detection using supervised and unsupervised algorithms is described in this paper. The performance of the proposed work is verified using measures such as accuracy, detection rate, and false rate. These measures are compared with those of the existing classifiers, and the comparison shows that the proposed approach outperforms them. The graphs clearly show the superior performance of the proposed work. Future work will extend this concept to various large datasets.
In the future, this work can be extended by applying innovative Explainable Artificial Intelligence (EAI) models to the design of an IDS architecture. This technique can be used to detect intrusions based on mobile traffic classification. A multi-modal deep learning technique will be used to improve the overall efficacy of the IDS.