Skip to Main Content
The 3 most important issues for anomaly detection based intrusion detection systems by using data mining methods are: feature selection, data value normalization, and the choice of data mining algorithms. In this paper, we study primarily the feature selection of network traffic and its impact on the detection rates. We use KDD CUP 1999 dataset as the sample for the study. We group the features of the dataset into 4 groups: Group I contains the basic network traffic features; Group II is actually not network traffic related, but the features collected from hosts; Group III and IV are temporally aggregated features. In this paper, we demonstrate the different detection rates of choosing the different combinations of these groups. We also demonstrate the effectiveness and the ineffectiveness in finding anomalies by looking at the network data alone. In addition, we also briefly investigate the effectiveness of data normalization. To validate our findings, we conducted the same experiments with 3 different clustering algorithms - K-means clustering, fuzzy C means clustering (FCM), and fuzzy entropy clustering (FE).