This paper presents a neural-network-based active learning procedure for computer network intrusion detection. Applying data mining and machine learning techniques to network intrusion detection often runs into the problem of very large training datasets. For example, the training dataset commonly used for the DARPA KDD-1999 offline intrusion detection project contained approximately five hundred thousand observations (a 10% sample of the original five million), which were used to build intrusion detection classification models. The practical problems associated with such a large dataset include very long model training times, redundant information, and increased difficulty in understanding the domain-specific data. We demonstrate that a simple active learning procedure can dramatically reduce the size of the training data without significantly sacrificing the classification accuracy of the intrusion detection model. We use the DARPA KDD-1999 intrusion detection project as a case study. Each network traffic instance is classified into one of two categories: normal or attack. A comparison of the actively trained neural network model with a C4.5 decision tree indicated that the actively learned model had better generalization accuracy. In addition, the training data classification performance of the actively learned model was comparable to that of the C4.5 decision tree.
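The abstract does not spell out the active learning procedure, so the following is only a minimal sketch of one common variant, pool-based uncertainty sampling, and not the authors' method. It assumes a single logistic unit as a stand-in for the neural network and synthetic two-dimensional data in place of the KDD-1999 records. All names (`active_learn`, `make_data`, `n_seed`, `batch`, `rounds`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    # Estimated probability that x is an attack; weights[-1] is the bias.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + weights[-1])


def train(X, y, epochs=200, lr=0.5):
    # Batch gradient descent on the logistic loss (stand-in for the neural net).
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, label in zip(X, y):
            err = predict(weights, x) - label
            for j, xi in enumerate(x):
                grad[j] += err * xi
            grad[-1] += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


def active_learn(pool_X, pool_y, n_seed=10, batch=10, rounds=5, seed=0):
    # Start from a small random labeled set, then repeatedly add the pool
    # instances the current model is least sure about.
    rng = random.Random(seed)
    unlabeled = list(range(len(pool_X)))
    rng.shuffle(unlabeled)
    labeled, unlabeled = unlabeled[:n_seed], unlabeled[n_seed:]
    for _ in range(rounds):
        weights = train([pool_X[i] for i in labeled],
                        [pool_y[i] for i in labeled])
        # Most uncertain first: predicted probability closest to 0.5.
        unlabeled.sort(key=lambda i: abs(predict(weights, pool_X[i]) - 0.5))
        labeled += unlabeled[:batch]
        unlabeled = unlabeled[batch:]
    weights = train([pool_X[i] for i in labeled],
                    [pool_y[i] for i in labeled])
    return weights, labeled


def make_data(n, seed):
    # Synthetic stand-in data: label 0 = normal, label 1 = attack.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center, 0.7), rng.gauss(center, 0.7)])
        y.append(label)
    return X, y


def accuracy(weights, X, y):
    hits = sum((predict(weights, x) >= 0.5) == bool(label)
               for x, label in zip(X, y))
    return hits / len(X)


if __name__ == "__main__":
    pool_X, pool_y = make_data(400, seed=1)
    test_X, test_y = make_data(200, seed=2)
    weights, labeled = active_learn(pool_X, pool_y)
    print("labels used:", len(labeled), "of", len(pool_X))
    print("held-out accuracy:", accuracy(weights, test_X, test_y))
```

In this sketch the model is trained on only the instances it has queried, which is the sense in which active learning can shrink the training set. How closely this matches the selection rule and network used in the paper is an open assumption.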