Data mining is the use of algorithms to extract information and patterns within the knowledge discovery in databases process. Classification maps data into predefined groups or classes. It is often referred to as supervised learning because the classes are determined before the data are examined. In many data mining applications that address classification problems, feature selection and model selection are considered key tasks: appropriate input features for the classifier must be selected from a given set of candidate features, and the structural parameters of the classifier must be adapted to these features and to a given data set. This paper addresses feature selection and model selection simultaneously for k-nearest neighbor (k-NN) classifiers. To reduce the optimization effort, it integrates techniques that significantly accelerate and improve the classifier: hybrid k-NN and comparative cross validation. The feasibility and benefits of the proposed approach are demonstrated on a data mining problem, intrusion detection in computer networks. It is shown that, compared to an earlier k-NN technique, the run time is reduced by up to 0.01 % and 0.06 %, while error rates are lowered by up to 0.002 % and 0.03 %, for normal and abnormal behaviour respectively. The algorithm is independent of specific applications, so many of its ideas and solutions can be transferred to other classifier paradigms.
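To make "feature selection and model selection simultaneously" concrete, here is a minimal sketch, not the authors' implementation. It assumes greedy forward feature selection combined with a search over the neighbor count k, scoring every (feature subset, k) pair by plain leave-one-out cross-validation error. Leave-one-out is a stand-in for the paper's comparative cross validation, and the hybrid k-NN technique is not reproduced. The function names, the toy data, and the labels `normal`/`attack` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(train_X: List[Sequence[float]], train_y: List[str],
                x: Sequence[float], k: int, features: List[int]) -> str:
    """Classify x by majority vote of its k nearest training points,
    using Euclidean distance over the selected feature indices only."""
    dists = []
    for xi, yi in zip(train_X, train_y):
        d = math.sqrt(sum((xi[f] - x[f]) ** 2 for f in features))
        dists.append((d, yi))
    dists.sort(key=lambda t: t[0])  # stable sort: equal distances keep input order
    votes = Counter(label for _, label in dists[:k])
    # Ties between labels go to the label first seen among the k neighbors.
    return votes.most_common(1)[0][0]


def loo_error(X: List[Sequence[float]], y: List[str], k: int,
              features: List[int]) -> float:
    """Leave-one-out error rate (assumed scoring criterion)."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k, features) != y[i]:
            errors += 1
    return errors / len(X)


def select_features_and_k(X: List[Sequence[float]], y: List[str],
                          candidate_ks: List[int]) -> Tuple[List[int], int, float]:
    """Greedy forward selection: in each round, try adding every unused
    feature with every candidate k, keep the single best (feature, k) pair,
    and stop when no addition strictly lowers the error."""
    n_features = len(X[0])
    selected: List[int] = []
    best_err = float("inf")
    best_k = candidate_ks[0]
    while True:
        best_candidate = None
        for f in range(n_features):
            if f in selected:
                continue
            trial = selected + [f]
            for k in candidate_ks:
                err = loo_error(X, y, k, trial)
                if err < best_err:
                    best_err, best_k, best_candidate = err, k, f
        if best_candidate is None:
            return selected, best_k, best_err
        selected.append(best_candidate)


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
    X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 4.0], [1.1, 8.0], [0.9, 2.0]]
    y = ["normal", "normal", "normal", "attack", "attack", "attack"]
    print(select_features_and_k(X, y, candidate_ks=[1, 3]))
```

On this toy data the search keeps only feature 0 with k = 1, because adding the noise feature cannot lower the leave-one-out error. Evaluating every (feature, k) pair costs one full cross-validation run each, which is exactly the optimization effort that accelerating techniques such as those named in the abstract aim to reduce.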
Date of Conference: 13-15 Dec. 2009