Network Intrusion Detection Systems (NIDS) monitor internet traffic to detect malicious activity. Unfortunately, the volume of data that a NIDS must analyze is too large to process in full. Several feature selection and feature extraction techniques have been proposed to reduce the size of the data, but few address exactly how far the dataset should be reduced. The purpose of this paper is to help determine the amount of data required for successful intrusion detection. A new hybrid algorithm, MID-PCA, is proposed; it combines PCA (Principal Component Analysis) with mRMR (minimum Redundancy Maximum Relevance, using the MID evaluation criterion). PCA is first applied to the original dataset. mRMR-MID is then applied to the intermediate output to further reduce redundancy and maximize relevance. MID-PCA is evaluated exhaustively on KDD Cup'99, a dataset widely used in the network security community. Its performance is compared with that of PCA and mRMR using two classifiers, J48 (C4.5) and BayesNet. Experimental results indicate that MID-PCA is more effective for NIDS feature extraction than PCA or mutual-information-based selection (mRMR). MID-PCA achieves better classification accuracy with reduced datasets of only 4 dimensions for BayesNet (99.77%) and 6 dimensions for J48 (99.94%). This is an improvement over PCA, which needs 12 principal components to reach similar classification accuracy. An extension of this paper will conduct broader experiments on other datasets and compare the results with those of several well-known feature reduction algorithms to confirm the superiority of MID-PCA.
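The two-stage pipeline described above (PCA first, then mRMR-MID on the PCA output) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify how many principal components are kept in the first stage, how the components are discretized, or how mutual information is estimated, so the equal-width binning, the parameters `n_components`, `k`, and `bins`, and all function names below are assumptions made for illustration. The MID criterion is taken in its usual form: relevance to the class minus mean redundancy with the features already selected.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project X onto its top principal components (computed via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def discretize(col, bins=10):
    """Equal-width binning (an assumption) so MI can be estimated by counting."""
    edges = np.histogram_bin_edges(col, bins=bins)
    return np.digitize(col, edges[1:-1])

def mutual_info(a, b):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    a_idx, b_idx = a_idx.ravel(), b_idx.ravel()
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1)        # contingency table
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)      # marginal of a
    pb = joint.sum(axis=0, keepdims=True)      # marginal of b
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())

def mrmr_mid(F, y, k):
    """Greedy mRMR with the MID score: relevance minus mean redundancy."""
    n = F.shape[1]
    relevance = np.array([mutual_info(F[:, j], y) for j in range(n)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n):
            if j in selected:
                continue
            redundancy = np.mean(
                [mutual_info(F[:, j], F[:, s]) for s in selected]
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

def mid_pca(X, y, n_components, k, bins=10):
    """Hypothetical MID-PCA sketch: PCA, then mRMR-MID over the components."""
    Z = pca_transform(X, n_components)
    Zd = np.column_stack([discretize(Z[:, j], bins) for j in range(Z.shape[1])])
    keep = mrmr_mid(Zd, y, k)
    return Z[:, keep], keep

# Synthetic stand-in data; the paper itself uses KDD Cup'99.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
reduced, keep = mid_pca(X, y, n_components=6, k=3)
print(reduced.shape, keep)
```

The second stage selects among principal components rather than original features, so the reduced dataset is a subset of the PCA projection. The selected columns would then be passed to a classifier such as J48 or BayesNet to measure accuracy at each dimensionality.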