An Agile Approach to Identify Single and Hybrid Normalization for Enhancing Machine Learning-Based Network Intrusion Detection

Detecting intrusions in network traffic has remained a problematic task for years. Progress in the field of machine learning is paving the way for enhancing intrusion detection systems, and due to this progress, intrusion detection has become an integral part of network security. Intrusion detection has achieved high detection accuracy with the help of supervised machine learning methods. A key factor in enhancing the performance of supervised classifiers is how data is prepared for training the classification model. Data in real-world networks or publicly available datasets are not always normally (Gaussian) distributed. Instead, the distributions of variables are more likely to be skewed. To achieve a high detection rate, data normalization or transformation plays an important role in machine learning-based intrusion detection systems. Several methods are available to normalize the attributes of the data before training a classification model. However, selecting the most suitable normalization technique remains an open question. In this paper, a statistical method is proposed that can identify the most suitable normalization method for a dataset. The normalization method identified by the proposed approach gives the highest accuracy for an intrusion detection system. To highlight the efficiency of the proposed method, five different datasets were used with two different feature selection methods. The datasets belong to both Internet of Things and traditional network environments. The proposed method is also able to identify hybrid normalizations to achieve further improved intrusion detection results.


I. INTRODUCTION
Studies on machine learning (ML) and deep neural networks (DNN) for intrusion detection systems (IDS) have become prominent due to an increase in knowledge on neural networks [1]. IDS play a significant role in securing a network since they aim to identify and highlight elements that can disrupt network communication. With the efficiency of ML-based IDS, the applications of IDS are no longer limited to traditional networks. The Internet of Things (IoT), which represents a large portion of today's world of interconnected devices, poses a unique challenge for security requirements. Due to resource limitations and low-cost production, IoT devices are being targeted by a high number of attacks [2]. However, several researchers have proposed effective methods for IoT security [3]-[5]. A key element of training any ML-based IDS is the pre-processing of training data [6]. Various factors can influence the training model of an ML-based IDS, which is why it is vital to provide the training model with data that is normalized and contains relevant features [7]. Hence, researchers have pursued pre-processing methods, feature selection methods, and data normalization methods to achieve a high anomaly detection rate [8], [9]. Normalization or transformation plays a vital role in network security, as normalization enforces integrity, which tends to increase the general cleanliness and structure of each feature [10]. Data normalization leads to an improved representation of data and allows ML-based algorithms to get the most out of the data, resulting in enhanced predictions. Various publicly available datasets for IDS contain numerous features (variables) whose distributions are far from normal (Gaussian) and asymmetric (skewed) [11]. Such factors complicate achieving a high detection rate in an IDS.

(The associate editor coordinating the review of this manuscript and approving it for publication was Zheng Yan.)
Normalization helps in improving the interpretation of data, gaining insight into the relationships between the values of a feature, and meeting the norms for statistical inference [12]. However, selecting a normalization method or a sequence of normalizations is quite challenging [13], [14]. The absence of any standard method for evaluating the effects of normalization on dataset classification results in a selection based on trial and error [15], [16]. Such an approach can be a time-consuming process with no guarantee that the selected normalization is the most suitable for the ML model and dataset. In this paper, a statistical method to identify a suitable normalization method is suggested. The proposed method can be used to identify the most suitable normalization, transformation, or scaling method to achieve a high detection rate in ML-based IDS. The proposed method identifies not only single but also hybrid normalization methods for the dataset.
The key contributions of the paper are:
• Identifying a statistical matrix that can assist in finding the most suitable normalization for the data at hand.
• Based on the computed ranks, one can identify the most effective single or hybrid normalization for the data in hand.
• To prove the validity and generality of the proposed statistical method, five different datasets with both numerical and non-numerical feature attributes were selected. Then, two different feature selection methods were employed for feature selection and three different ML classifiers were implemented to verify the selected normalization method.
The rest of the paper is structured as follows. Section II covers the related work on methods used for identifying the normality of a dataset, along with some prominent data normalization methods. Section III describes the details of the proposed process to identify data normalization. Section IV describes the experimental setup for the proposed process. Section V covers the experiments conducted to evaluate and validate the proposed technique. Section VI presents the results of the evaluation and validation process of the proposed method. Section VII sets forth the discussion on the proposed model and a comparison between the proposed model and similar approaches. In the end, Section VIII concludes the paper.

II. RELATED WORK
With the rapid expansion of the internet and interconnected devices, network security has become increasingly challenging. Network intrusion detection systems (NIDS) have proven to be an effective method to achieve high accuracy in classifying network anomalies. Most supervised classification methods rely on previously normalized datasets to train the classification model. However, real-world network data are not normalized beforehand. Suitable normalization for publicly available IDS datasets can easily be established based on available research work. On the other hand, identifying suitable normalization methods for real-world or new datasets remains a concern. Generally, ML methods tend to perform well on a dataset with a normal distribution [17], [18]. Departure from normality in a dataset can be measured in several ways; however, the most prominent measures are skewness and kurtosis [19]. Skewness describes the asymmetry of a distribution in a dataset, and zero skewness indicates a symmetric distribution. An asymmetric distribution with a larger tail on the right has positive skewness, and a dataset with a larger left tail has negative skewness. Kurtosis, on the other hand, deals with both the tail heaviness and the peakedness of a distribution relative to the normal distribution. Therefore, kurtosis is restricted to symmetric distributions [20]. Generally, if the values of skewness and kurtosis significantly diverge from zero and three respectively, the dataset in hand may not be normally distributed. However, no official guidelines specify the values of kurtosis or skewness that indicate non-normality of a dataset [21]. Other common methods to check normality before classification are the histogram, box plot, QQ (quantile-quantile) plot, Kolmogorov-Smirnov test, Lilliefors test, and Shapiro-Wilk test [22], [23]. The mentioned methods suffer from diverse limitations.
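These two measures are easy to compute in practice; as a small illustration using `scipy.stats` (the samples below are synthetic and purely for demonstration):

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)

# A symmetric (normal) sample: skewness is close to 0.
normal_sample = rng.normal(loc=0, scale=1, size=100_000)

# A right-skewed sample (lognormal): clearly positive skewness.
skewed_sample = rng.lognormal(mean=0, sigma=1, size=100_000)

print(skew(normal_sample))   # near 0 for a symmetric sample
print(skew(skewed_sample))   # clearly positive (larger right tail)

# With fisher=False, scipy reports Pearson kurtosis, which is
# close to 3 for a normal sample, matching the rule of thumb above.
print(kurtosis(normal_sample, fisher=False))
```

Note that scipy's default `fisher=True` reports excess kurtosis (0 for a normal distribution), while `fisher=False` matches the "value of three" convention used in the text.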
A histogram can be deceptive since changing the graph scale can alter the shape of the distribution and may lead to misperception [24]. A box plot provides limited information from which to understand or conclude normality [24]. Similarly, a QQ-plot can be tricky for identifying the right normalization method, as shown in Figure 1. As seen in Figure 1, all three representations of the QQ plot are quite similar, yet the classification results are different.
Among the Kolmogorov-Smirnov, Lilliefors, and Shapiro-Wilk tests, the Shapiro-Wilk test is the most powerful [23]. The Shapiro-Wilk assessment is based on a random sample from the dataset. The null hypothesis [25] of the Shapiro-Wilk test is that the data is normally distributed. If the p-value [26] of the sample data is lower than 0.05, then the distribution is not normal. However, depending on the sample size, it is possible that the p-value identifies a normally distributed dataset as not normally distributed and vice versa [22]. As a result, there is no clear-cut method to identify a fitting normalization method for a dataset before classification. The transformation and normalization techniques implemented in this paper are L2-normalization, Yeo-Johnson, Min-Max, Robust scaler, and Standard scaler. The Yeo-Johnson transformation [27] is an extension of the Box-Cox transformation. Mathematically, Yeo-Johnson can be represented as Equation 1, where 'y' are the values, 'λ' can be any real number, and λ = 1 gives the identity transformation [11].
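A minimal sketch of the Shapiro-Wilk check described above (synthetic samples; as the text warns, the outcome is sensitive to sample size, so the p-value for the normal sample is printed rather than asserted):

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)
normal_sample = rng.normal(size=500)
skewed_sample = rng.exponential(size=500)

# Null hypothesis: the sample comes from a normal distribution.
_, p_normal = shapiro(normal_sample)
_, p_skewed = shapiro(skewed_sample)

print(p_normal)          # typically well above 0.05 for a normal sample
print(p_skewed < 0.05)   # normality is rejected for the skewed sample
```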
The Min-Max normalization [28] is among the most commonly used normalization [29]- [31]. Min-Max implements linear transformation on the data. Mathematically Min-Max can be represented as Equation 2.
The Robust scaler [32] approach to normalizing data is similar to Min-Max. The main difference is that the Robust scaler scales data based on the interquartile range. Equation 3 represents the Robust scaler, where 'x' represents the values, Q1 is the 25th percentile, and Q3 is the 75th percentile.
The Standard scaler [33] normalizes the data by removing the mean and scaling the data to unit variance. Mathematically Standard scaler can be represented as Equation 4, where 's' is the standard deviation and 'µ' is the mean.
The L2-normalization [34] normalizes the dataset so that, in each row, the sum of the squares of the values equals one. Equation 5 represents L2-normalization, where 'x' represents the values of features in the dataset.
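For reference, the standard definitions behind Equations 1–5 can be written out as follows (a transcription from the methods' standard definitions, since the equations themselves are not reproduced here):

```latex
% Yeo-Johnson transformation (Equation 1)
\psi(y,\lambda)=
\begin{cases}
\left((y+1)^{\lambda}-1\right)/\lambda, & \lambda \neq 0,\; y \geq 0\\
\ln(y+1), & \lambda = 0,\; y \geq 0\\
-\left((1-y)^{2-\lambda}-1\right)/(2-\lambda), & \lambda \neq 2,\; y < 0\\
-\ln(1-y), & \lambda = 2,\; y < 0
\end{cases}
\tag{1}

% Min-Max normalization (Equation 2)
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{2}

% Robust scaler (Equation 3)
x' = \frac{x - \operatorname{median}(x)}{Q_3 - Q_1} \tag{3}

% Standard scaler (Equation 4)
x' = \frac{x - \mu}{s} \tag{4}

% L2 normalization (Equation 5), applied row-wise
x'_i = \frac{x_i}{\sqrt{\sum_j x_j^{2}}} \tag{5}
```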
Data pre-processing plays a vital role in an IDS and in several data mining-related operations. Data normalization is an essential part of data pre-processing, particularly for intrusion detection methods that rely on statistical attributes extracted from the data at hand. In paper [35], the authors highlighted the importance of data normalization and how it can affect the performance of anomaly detection. The authors implemented four different normalization methods with three different classifiers. The paper aimed to answer two questions: first, whether attribute normalization is crucial for intrusion detection performance, and second, which attribute normalization technique is most effective. The authors concluded by providing experimental proof that attribute normalization plays a role in improving anomaly detection. Among the implemented normalization methods, statistical-based normalization achieved the highest accuracy results. The only downside of this paper was that the authors identified the most suitable normalization based on classification results, which is not an efficient process. A study by B. Setiawan et al. [36] used the information gain method to select the most suitable normalization method. In this research, the authors applied information gain to the log normalization, min-max, and z-score normalization schemes. After normalization, attributes were rounded off to 2 to 10 decimals, and information gain was computed for each decimal alteration. Based on the information gain method, the quality of the attributes was computed. As per the results, the highest risk from rounding was displayed by log normalization and z-score. Yet, the authors implemented log normalization for the intrusion detection system, justifying its use by stating that it had a safe rounding threshold of three decimal places.
Despite the justification, implementing log normalization is a concern, as the information gain results highlighted that rounding log normalization is not suitable. Yu Liping et al. [37] evaluated several normalization methods and concluded that each evaluation purpose requires a different data normalization procedure. Compared to the mentioned papers, this paper presents a more precise statistical approach to identify the most suitable normalization method for the dataset in hand.

III. PROPOSED METHOD
In this article, a statistical method is proposed to identify the most suitable normalization, transformation, or scaling method for the data at hand. The proposed statistical model is not limited to a specific dataset format, as it is validated on datasets that cover both numeric and non-numeric feature attributes. The proposed model is also implemented on IoT-based datasets to further validate the general applicability of the suggested approach. The flow of the proposed method can be seen in Figure 2.
Initially, a dataset is pre-processed by applying basic data cleaning. Details of data pre-processing are included in the experiment section of the paper. After data cleaning, feature selection is applied to the dataset. Next, the normalization, transformation, or scaling methods are applied to the dataset. To find the most suitable normalization approach, the two or three most common normalization methods can be applied separately to the dataset. This results in multiple datasets, one per selected normalization method. The following two steps are then applied to each dataset separately. First, the mean, median, and skewness of each feature of the dataset are computed. Second, the overall averages of the mean, median, and skewness of the features in the dataset are computed. This results in a matrix of the average mean, median, and skewness of the features from each dataset. The skewness in the proposed model is taken as an absolute value; the reason is discussed later in the sub-section on data normalization and encoding. After computing the average mean, median, and skewness of the features of all the datasets, the Rank and Percentile method is applied to the computed matrix. The Rank and Percentile method assigns ranks in descending order, i.e. the highest value in the column is ranked first. To identify the most suitable normalization method, the sum of the ranks is calculated. As a result, the normalization with the largest sum of ranks is the most suitable normalization for the data at hand. Based on the flow diagram in Figure 2, the proposed statistical model is shown in Algorithm 1.
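As a minimal sketch of the statistics-and-ranking steps (assuming pandas and scipy; the helper names `summarize` and `rank_normalizations` are illustrative, not from the paper):

```python
import pandas as pd
from scipy.stats import skew

def summarize(df: pd.DataFrame) -> dict:
    """Average mean, median, and absolute skewness over all features."""
    return {
        "mean": df.mean().mean(),
        "median": df.median().mean(),
        "skewness": df.apply(lambda col: abs(skew(col))).mean(),
    }

def rank_normalizations(normalized: dict) -> pd.Series:
    """normalized maps a normalization name to its normalized DataFrame.

    Returns the per-method sum of descending ranks; following the
    paper's rule, the method with the largest rank sum is selected."""
    stats = pd.DataFrame(
        {name: summarize(df) for name, df in normalized.items()}
    ).T
    # Descending ranks: the highest value in each column gets rank 1.
    ranks = stats.rank(ascending=False)
    return ranks.sum(axis=1)
```

The most suitable method is then `rank_normalizations(...).idxmax()`, mirroring the rule that the largest sum of descending ranks indicates the most suitable normalization.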

IV. EXPERIMENTATION
In this paper, we analytically evaluate the effect of different methods of attribute normalization on the performance of ML-based IDS. Three ML algorithms, Support Vector Machine (SVM), Random Forest (RF), and Deep Neural Network (DNN), were employed to validate the proposed statistical method. The reason for implementing three different classifiers is to highlight that the proposed model is not dependent on a particular ML algorithm, as each of the mentioned ML algorithms belongs to a different category of ML classification methods. The SVM is associated with spatial regression-based algorithms, RF represents a decision-based classifier, and DNN belongs to supervised deep learning ML algorithms. The attribute normalization schemes used were Yeo-Johnson, Min-Max, Robust scaler, Standard scaler, and L2 normalization. The stated normalization methods were selected because they cover the core techniques of data normalization, i.e. scaling, clipping, and log-based scaling. The datasets used for the experiment were CIC-IDS 2017 [38], ISCX-IDS 2012 [39], NSL-KDD [40], UNSW-NB 15 [41], and Bot-IoT [42]. All the mentioned datasets are well-known and specifically developed for testing network security-based algorithms. Further, the Bot-IoT dataset was designed particularly for IoT-based security algorithms. The CIC-IDS 2017, ISCX-IDS 2012, and Bot-IoT datasets were created by modeling a real network environment containing both normal and attack traffic. The NSL-KDD dataset is an improved version of the KDD'99 dataset [40]. The KDD'99 dataset suffered from a high number of redundant records, duplicate values, and biased sampling. The PerfectStorm tool created the UNSW-NB 15 dataset by generating a mixture of both normal and attack traffic behaviors. The details of the attacks simulated in each dataset are shown in Table 1.
The motive behind implementing multiple datasets, feature selection methods, and classifiers is to highlight the generality and flexibility of the proposed method. The paper's contribution is twofold. First, the proposed statistical method is implemented to identify the most suitable normalization method for each dataset. Second, multiple ML-based IDS are implemented to validate the results of the proposed method. The hardware used for the experiments was an Intel Xeon Gold 32-core (64-thread) processor with 192GB RAM and an RTX 2080 Ti GPU. The programming language used for the experiments was Python.

Algorithm 1:
Input: datasets x = (x_1, x_2, ..., x_k), k ∈ N; the i-th dataset has features (f_i1, ..., f_in), where n is the total number of features; N_m denotes the m-th normalization.
Step 1: Apply pre-processing on the dataset: x ← PreProcessing(x)
Step 2: Apply feature selection on the dataset: x ← FeatureSelection(x)
Step 3: Apply normalization, transformation, or scaling N_m on the dataset.
Step 4: Compute the mean, median, and skewness of each feature.
Step 5: Compute the average mean, median, and skewness of the dataset.

At the end of pre-processing, features with zero variance were dropped from the datasets using a zero-variance filter [44]. After pre-processing, details of the datasets can be seen in Table 1.
To avoid any biasing, the synthetic minority over-sampling technique (SMOTE) combined with edited nearest neighbors (ENN) [8] is implemented to clean the training set of each dataset. In this experiment, Pearson correlation was applied using the Python library with a correlation-coefficient threshold of ±0.95. The wrapper-based FS-DT [47] is a greedy search algorithm that tries to find the ''optimum'' subset of features by iteratively selecting features based on classifier performance. Table 2 shows the number of features selected after the feature selection process.
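A minimal sketch of the Pearson-correlation filter (the function name `drop_correlated_features` is illustrative; the paper does not give its exact implementation, and which feature of a correlated pair is kept is an assumption here):

```python
import numpy as np
import pandas as pd

def drop_correlated_features(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop one feature from every pair whose absolute Pearson
    correlation exceeds the threshold (keeps the first of the pair)."""
    corr = df.corr(method="pearson").abs()
    # Inspect only the upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)
```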

C. DATA NORMALIZATION AND ENCODING
In this experiment, five data normalization methods, i.e. Yeo-Johnson, Robust scaler, Min-Max, Standard scaler, and L2 normalization, are implemented. The NSL-KDD and UNSW NB-15 datasets contain both numeric and nominal features. Therefore, one-hot encoding and label-encoding [48] are applied to the NSL-KDD and UNSW NB-15 features respectively. Both one-hot encoding and label-encoding were applied using Python libraries. One-hot encoding works by creating new binary columns to replace a categorical feature in a dataset. For instance, in the NSL-KDD dataset, the categorical feature 'protocol_type' had three attributes: tcp, udp, and icmp. One-hot encoding encoded tcp as 001, udp as 010, and icmp as 100, expanding one feature column into three feature columns. Due to one-hot encoding, the NSL-KDD features were increased from 19 to 98. On the other hand, label-encoding on the UNSW NB-15 dataset assigned a unique numeric value to each attribute of the non-numeric feature. For illustration, the feature 'proto' in UNSW NB-15 had non-numeric attributes, i.e. tcp, udp, igmp, ospf, sctp, etc.; label-encoding simply encoded tcp as 0, udp as 1, igmp as 2, and so on. The reason for applying different encoding techniques to the NSL-KDD and UNSW NB-15 datasets is the difference in the ordinal nature of their categorical attributes. Later, each normalization is applied to all five datasets. The software ''Minitab'' was then used on the normalized datasets to extract the attribute mean, median, and skewness. The average and middle values of an attribute are represented by the mean and median respectively, while skewness, as defined earlier, is the asymmetry of a distribution in a dataset: zero skewness indicates a symmetric distribution, an asymmetric distribution with a larger tail on the right has positive skewness, and a dataset with a larger left tail has negative skewness [21].
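The two encodings can be illustrated on the 'protocol_type' example (a sketch using pandas and scikit-learn; the experiments' actual column handling may differ):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"protocol_type": ["tcp", "udp", "icmp", "tcp"]})

# One-hot encoding: the single categorical column becomes one binary
# column per category (as done for NSL-KDD).
one_hot = pd.get_dummies(df, columns=["protocol_type"])
print(list(one_hot.columns))
# ['protocol_type_icmp', 'protocol_type_tcp', 'protocol_type_udp']

# Label encoding: each category is mapped to a unique integer
# (as done for UNSW NB-15); sklearn assigns codes alphabetically.
labels = LabelEncoder().fit_transform(df["protocol_type"])
print([int(v) for v in labels])  # [1, 2, 0, 1]
```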
Equations 7, 8, and 9 were used to compute the mean, median, and skewness respectively:

\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \quad (7)

\mathrm{Median} = \text{middle value of the ordered data (the average of the two middle values when } N \text{ is even)} \quad (8)

\mathrm{Skewness} = \frac{N}{(N-1)(N-2)}\sum_{i=1}^{N}\left(\frac{x_i-\bar{x}}{s}\right)^{3} \quad (9)
where ''N'' represents the number of values in the dataset and ''x'' represents a value in the dataset. Further, in Equation 9, ''x̄'' represents the mean and ''s'' represents the standard deviation of the dataset. For the proposed statistical method, skewness is taken as an absolute value, since negative skew is generally considered problematic for statistical models [49], [50]. Table 3 represents the average mean, median, and skewness of the datasets after each normalization. In this research, hybrid normalization methods are also implemented for two reasons. First, to check whether a combination of two normalization methods can further improve IDS performance. Second, to check the flexibility of the proposed model. Based on the five normalization techniques selected for experimentation, a high number of combinations for hybrid normalization are possible. However, only a handful of combinations with high performance were selected, as shown in Table 4.

D. IDENTIFYING THE BEST NORMALIZATION
After computing the matrices shown in Tables 3 and 4, percentile ranking is applied to identify the most suitable normalization method. The percentile rank [51] returns a score's standing relative to the other scores in the same matrix or set, and can thus be used to calculate the relative standing of a value within a matrix or set. In this experiment, the Rank and Percentile data analysis tool from MS-Excel is used. The formulas for percentile and rank are given in Equations 10 and 11.

P = \frac{x}{N} \times 100 \quad (10)

r = \frac{P}{100}(n + 1) \quad (11)

where ''P'' is the percentile, ''x'' represents the number of values below the selected value, ''N'' represents the total number of values, ''r'' represents the rank, and ''n'' is the number of values. Ranks are assigned in descending order. After applying the Rank and Percentile method to Table 3, Table 5 was obtained. The normalization method with the highest sum of ranks in Table 5 represents the most suitable normalization for the dataset. Similarly, the Rank and Percentile method was applied to Table 4 to compute Table 6. The hybrid normalization method with the highest sum of ranks in Table 6 represents the most suitable hybrid normalization for the dataset.
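A small numeric check of Equations 10 and 11 (using `scipy.stats.percentileofscore` with `kind="strict"`, which counts only values strictly below the score, matching the definition of 'x'; the data is a toy example):

```python
from scipy.stats import percentileofscore

values = [2, 4, 4, 5, 7, 9]

# Equation 10: P = (number of values below x / N) * 100
p = percentileofscore(values, 5, kind="strict")
print(p)  # 50.0 -> three of the six values lie strictly below 5

# Equation 11: r = (P / 100) * (n + 1)
r = p / 100 * (len(values) + 1)
print(r)  # 3.5
```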
Based on the proposed method, the normalization methods that achieved the maximum sum of ranks in Tables 5 and 6 should achieve the highest accuracy for the respective dataset with an ML-based IDS.

V. EVALUATING PROPOSED STATISTICAL MODEL
To evaluate and verify the proposed statistical model, three ML-based IDS are implemented, based on RF, SVM, and DNN. The DNN model for the IDS in this paper is the same as in our earlier published work [6]. The DNN used in the earlier work had four dense layers with 120 nodes in each layer; other parameter specifics can be seen in Table 7.
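As a rough, framework-agnostic sketch of such a network (a stand-in using scikit-learn's `MLPClassifier` on toy data — not the exact architecture, framework, or hyperparameters of [6] and Table 7):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Toy stand-in data; the real inputs would be the normalized IDS features.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Four hidden (dense) layers with 120 nodes each, mirroring Table 7.
model = MLPClassifier(hidden_layer_sizes=(120, 120, 120, 120),
                      max_iter=300, random_state=0)
model.fit(X, y)
print(model.score(X, y))  # training accuracy
```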
Apart from classification, Cohen's kappa coefficient [52] and the receiver operating characteristic (ROC) [53] were also computed to verify that the normalization selected by the proposed method is the most suitable for the dataset. For the ML classification models, the accuracy, precision, recall, and F1-score were calculated using Equations (12)-(15).

\mathrm{Accuracy} = \frac{\mathrm{TruePositive} + \mathrm{TrueNegative}}{\mathrm{Total}} \quad (12)

\mathrm{Precision} = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalsePositive}} \quad (13)

\mathrm{Recall} = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalseNegative}} \quad (14)

\mathrm{F1\text{-}score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (15)

where a true positive is when an attack is correctly identified as an attack, and a false positive is when normal traffic is incorrectly identified as an attack. A true negative is when normal traffic is correctly identified as normal traffic, and a false negative is when an attack is incorrectly identified as normal traffic. The kappa coefficient score is a very handy measure of an ML model's capability when performing multi-class classification [54]. The kappa coefficient compares the predicted and expected accuracy of an ML algorithm, as represented in Equation 16.
\kappa = \frac{p_0 - p_e}{1 - p_e} \quad (16)

where 'p0' is the overall accuracy of the ML model and 'pe' represents the agreement between the ML model estimates and the true class values as if occurring by chance. On the other hand, the receiver operating characteristic (ROC) curve is a graphical representation of the classification model at all classification thresholds [55].
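Equation 16 can be checked numerically (a small sketch on toy labels; sklearn's `cohen_kappa_score` serves as a reference implementation):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 0, 1, 1, 2, 2, 1, 1]

cm = confusion_matrix(y_true, y_pred)
n = cm.sum()
p0 = np.trace(cm) / n                                 # observed accuracy
pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # chance agreement

kappa = (p0 - pe) / (1 - pe)                          # Equation 16
print(round(kappa, 3))
assert np.isclose(kappa, cohen_kappa_score(y_true, y_pred))
```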

VI. RESULTS
In this section, the results of the ML classifiers on the CIC-IDS 2017, ISCX-IDS 2012, NSL-KDD, UNSW NB-15, and Bot-IoT datasets are presented. The reason behind using five different datasets and multiple ML classifiers is to highlight the flexibility and generality of the proposed method. Table 8 represents the most suitable normalization method based on the ranking computation of the proposed method, as shown in Table 5.

A. RANDOM FOREST BASED IDS MODEL
As a part of verifying the proposed statistical model, RF was implemented for classification on all five datasets, as shown in Table 9. The normalization methods highlighted by the proposed method (i.e. Table 8) achieved the highest accuracy. For the single normalization method, Yeo-Johnson achieved the highest accuracy on all five datasets. For the hybrid normalization method, the combination of Yeo-Johnson and Min-Max achieved the highest accuracy for the CIC-IDS 2017 and ISCX-IDS 2012 datasets, while the combination of Yeo-Johnson + Standard achieved the highest accuracy on NSL-KDD. Based on Equation 16, the Kappa coefficient score was computed for all five datasets. Table 10 represents the Kappa score of the normalization method that achieved the highest accuracy with the RF-based IDS model. To visualize the classification performance of the RF-based IDS, Figure 3 presents the classification matrix of each dataset with both single and hybrid transformations. Although achieving high accuracy is not the core purpose of this research, the RF-based IDS was able to achieve good classification results on each dataset excluding the UNSW NB-15 dataset, as Figure 3 illustrates.

B. SUPPORT VECTOR MACHINE BASED IDS MODEL
In this research, the SVM-based IDS model is executed for 1000 epochs, as the main goal of performing classification is to verify the proposed statistical model rather than to achieve high accuracy results. Based on Table 11, Yeo-Johnson achieved the highest accuracy, as highlighted by the proposed method. However, in hybrid normalization for the CIC-IDS 2017 and Bot-IoT datasets, the highlighted normalization method (i.e. Table 8) did not achieve the best classification results. For the ISCX-IDS 2012, NSL-KDD, and UNSW NB-15 datasets, the highest accuracy was achieved by the normalization method identified by the proposed statistical method, as shown in Table 8.
Based on Equation 16, the Kappa coefficient score is computed for all five datasets. Table 12 represents the Kappa score of the normalization method which achieved the highest accuracy based on the SVM-based IDS.
To visualize the classification performance of the SVM-based IDS, Figure 5 presents the classification matrix of each dataset with both single and hybrid transformations.

C. DEEP NEURAL NETWORK BASED IDS MODEL
The DNN-based IDS model implemented in this paper is based on our earlier work [8]. Table 13 represents the classification results achieved by the DNN-based IDS model. The DNN-based IDS model validated Yeo-Johnson as the most suitable single normalization method for all five datasets, as predicted by the proposed model (i.e. Table 8).
On the other hand, in hybrid normalization for CIC-IDS 2017 dataset, L2 normalization + Yeo-Johnson achieved the highest accuracy rather than Yeo-Johnson + Min-Max. For datasets, ISCX-IDS 2012, NSL-KDD, UNSW NB-15, and Bot-IoT the highest accuracy was achieved by the normalization method identified by the proposed statistical method as highlighted in Table 8.
Based on Equation 16, the Kappa coefficient score was computed for all five datasets. Table 14 represents the Kappa score of the normalization method which achieved the highest accuracy based on the DNN-based IDS.
To visualize the classification performance of the DNN-based IDS, Figure 7 presents the classification matrix of each dataset with both single and hybrid transformations. Even though achieving high accuracy is not the main purpose of this research, the DNN-based IDS was able to perform with good accuracy on each dataset excluding the UNSW NB-15 dataset, as Figure 7 shows. The ROC curves for each dataset classified by the DNN-based IDS are presented in Figure 8. Each colored line in the ROC graph represents a class in a dataset.

VII. DISCUSSION
Conventionally, Min-Max and standardization are considered the most common normalization methods [56], even though their general applicability can be questioned. In comparison to the existing methods highlighted in the related work and in Table 15, the study conducted in this paper is more comprehensive and generally applicable for improving an ML-based IDS. To corroborate the generalization of the proposed model, five different datasets with two different feature selection methods were used. The datasets contained both numeric and nominal features. After applying the proposed statistical method, the most suitable normalization method for the datasets was highlighted in the form of ranks with the help of the Rank and Percentile method. To validate the proposed model, three different ML-based IDS were implemented. Based on the validation procedure, the normalization methods identified by the proposed statistical model achieved higher accuracy than the other normalization methods. On the other hand, not all hybrid normalizations were identified successfully: the proposed model identified eighteen of the twenty hybrid normalizations. However, it is possible to further improve hybrid normalization detection by testing a few more potential combinations of normalizations. Some of the hybrid normalizations were even able to achieve improved classification results compared to the single normalization methods. The reason behind implementing the hybrid normalization method was to check the ability of the proposed model to identify non-standard normalization methods. Therefore, researchers can also compare newly proposed normalization methods with existing standardized methods. As the proposed method deals with a very specific aspect of the ML pre-processing chain, it can also be used to improve domains other than security.
Such areas may include network traffic classification using ML [57], ML for low-power devices [2], etc., as the proposed model is computationally efficient. The computational complexity of the proposed algorithm can be presented as follows:

O(k \cdot n \cdot m \log(m)) \quad (17)

where k, n, and m are the dataset size, the number of features, and the number of attributes normalized. Noting that m < n < k, we can say that the complexity is dominated by 'k' alone, which means that the proposed algorithm is very efficient in terms of computational complexity, being a first-order polynomial, i.e. O(k). In contrast, other normality testing algorithms such as the Q-Q plot, D'Agostino's K-squared test, the Jarque-Bera test, etc., cannot have a computational complexity lower than O(k). Table 15 presents a comprehensive comparison between the existing methods to identify suitable normalization methods and the proposed algorithm.

VIII. CONCLUSION
The rising rate of complex attacks on networks has truly tested the limitations of network security. ML-based IDS can play an integral part in providing enhanced security measures. Normalization, which is a part of pre-processing, plays an important part in improving ML-based IDS. However, identifying which normalization method is suitable for the data at hand is quite challenging. In this study, a statistical model is proposed to identify the most suitable normalization method to enhance the performance of ML-based IDS. The proposed model is agile and does not require heavy computation. The proposed model uses a matrix of mean, median, and skewness with the Rank and Percentile method to identify the most suitable normalization method for the selected dataset.
To validate the proposed statistical model, three classifiers are also implemented. Based on the validation results, the proposed model was able to identify the most suitable normalization method with high accuracy. Such statistical methods open opportunities for researchers to further improve existing pre-processing methods for ML-based IDS. For future research, we are looking to further improve the algorithm's ability to identify hybrid normalizations and to identify the most suitable combination of normalization methods that can improve the performance of an ML-based IDS.