The quality of the feature selection algorithm is one of the most important factors affecting the effectiveness of an intrusion detection system (IDS). Reducing the number of relevant traffic features without harming classification accuracy greatly improves the overall effectiveness of an IDS. Obtaining a good feature set automatically, without involving expert knowledge, is a complex task. In this paper, we propose an automatic feature selection procedure based on the filter method used in machine learning. In particular, we focus on Correlation Feature Selection (CFS). We first transform the CFS optimization problem into a polynomial mixed 0-1 fractional programming problem. We then introduce additional variables into the transformed problem, which yields a new mixed 0-1 linear programming problem whose numbers of constraints and variables are linear in the number of features in the full set. This mixed 0-1 linear programming problem can then be solved by means of a branch-and-bound algorithm. We compared our feature selection algorithm experimentally with the best-first-CFS and genetic-algorithm-CFS methods in terms of feature selection capability. We also tested the classification accuracy obtained after feature selection with the C4.5 and BayesNet classifiers on the KDD CUP'99 IDS benchmark data set. The experiments show that our proposed method outperforms the best-first and genetic-algorithm search strategies: it removes many more redundant features while maintaining, or even improving, classification accuracy.
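To make the CFS objective concrete, the sketch below encodes a feature subset as a 0-1 vector x and evaluates the usual CFS merit, sum_i a_i x_i / sqrt(sum_i x_i + sum_{i != j} b_ij x_i x_j). Here a_i is the correlation between feature i and the class, and b_ij is the correlation between features i and j. Because the numerator is non-negative, maximizing this ratio is equivalent to maximizing its square, which is a polynomial 0-1 fractional objective of the kind described above. This is only an illustrative sketch. It assumes absolute Pearson correlation as the correlation measure, and the function names (`correlations`, `cfs_merit`, `best_subset_exhaustive`) are hypothetical. It also replaces the paper's mixed 0-1 linear program and branch-and-bound solver with a brute-force search over all 2^n - 1 non-empty subsets, which is feasible only for a handful of features.

```python
import itertools
import math

import numpy as np


def correlations(X, y):
    """Return (a, b): |feature-class| and |feature-feature| correlations.

    Assumption: absolute Pearson correlation stands in for the correlation
    measure; X must hold numeric, non-constant feature columns.
    """
    n = X.shape[1]
    # Treat each column (features plus the class) as one variable.
    full = np.abs(np.corrcoef(np.column_stack([X, y]), rowvar=False))
    return full[:n, n], full[:n, :n]


def cfs_merit(x, a, b):
    """CFS merit of the subset encoded by the 0-1 vector x.

    merit(x) = sum_i a_i x_i / sqrt(sum_i x_i + sum_{i != j} b_ij x_i x_j)
    """
    x = np.asarray(x, dtype=float)
    k = x.sum()
    if k == 0:
        return 0.0  # empty subset: define its merit as zero
    relevance = a @ x
    # x @ b @ x also counts the diagonal terms b_ii * x_i (since x_i^2 == x_i),
    # so subtract them to keep only the i != j redundancy terms.
    redundancy = x @ b @ x - np.diag(b) @ x
    return relevance / math.sqrt(k + redundancy)


def best_subset_exhaustive(a, b):
    """Hypothetical stand-in for the MILP solver: try every non-empty subset."""
    best_x, best_merit = None, -1.0
    for bits in itertools.product((0, 1), repeat=len(a)):
        if not any(bits):
            continue
        merit = cfs_merit(bits, a, b)
        if merit > best_merit:
            best_x, best_merit = bits, merit
    return best_x, best_merit


if __name__ == "__main__":
    a = np.array([0.8, 0.6, 0.1])  # illustrative feature-class correlations
    b_redundant = np.array([[1.0, 0.9, 0.0],
                            [0.9, 1.0, 0.0],
                            [0.0, 0.0, 1.0]])
    b_independent = np.eye(3)
    print(best_subset_exhaustive(a, b_redundant))    # features 0 and 1 overlap
    print(best_subset_exhaustive(a, b_independent))  # features 0 and 1 complement
```

With the redundant correlation matrix, feature 1 adds little beyond feature 0, so the search keeps feature 0 alone. When the two features are uncorrelated, it keeps both. The merit trades relevance against redundancy, and that trade-off is what allows a CFS-based method to discard redundant features.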