Choice of dimensionality reduction methods for feature and classifier fusion with nearest neighbor classifiers


Abstract:

High-dimensional data often cause problems for current learning algorithms in terms of efficiency and effectiveness. One solution is dimensionality reduction, by which the original feature set is reduced to a small number of features while the accuracy and/or efficiency of the learning algorithm may improve. We have investigated multiple dimensionality reduction methods for nearest neighbor classification in high dimensions. In previous studies, we demonstrated that fusing the outputs of different dimensionality reduction methods, either by combining classifiers built on the reduced features or by combining the reduced features and then applying a classifier, may yield higher accuracy than using any individual reduction method. However, no previous study has investigated which dimensionality reduction methods to choose for fusion when the outputs of several methods are available. We therefore empirically investigated different combinations of the outputs of four dimensionality reduction methods on 18 medicinal chemistry datasets. The investigation demonstrates that fusing nearest neighbor classifiers obtained from multiple reduction methods outperforms the use of any individual dimensionality reduction method in all cases, whereas fusing different feature subsets is quite sensitive to the choice of dimensionality reduction methods.
Date of Conference: 09-12 July 2012
Date Added to IEEE Xplore: 30 August 2012
Conference Location: Singapore

I. Introduction

The nearest neighbor algorithm [1] is a simple learning algorithm that can be applied to high-dimensional datasets without any modification. However, like many other learning algorithms, it often performs poorly in high dimensions because of its sensitivity to the input data [2]. This is known as the curse of dimensionality [3], a well-known phenomenon that misleads learning algorithms applied to high-dimensional data. Dimensionality reduction is one potential approach to addressing this problem [3], [4]. However, selecting a suitable dimensionality reduction method for a dataset may not be straightforward, since the resulting performance depends not only on the dataset but also on the learning algorithm.
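The classifier-fusion scheme studied here can be illustrated with a minimal numpy sketch. The synthetic dataset, the three reduction methods (PCA, Gaussian random projection, and a simple mean-difference feature filter), and the target dimensionality are illustrative assumptions for the sketch, not the paper's actual experimental setup, which uses four reduction methods on 18 medicinal chemistry datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic high-dimensional two-class data (illustrative assumption):
# a few informative features, the rest pure noise.
n, d = 200, 100
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, d))
X[:, :5] += y[:, None] * 1.5  # class signal lives in the first 5 features

train, test = np.arange(0, 150), np.arange(150, 200)

def pca(X_tr, y_tr, X_te, k):
    """Project onto the top-k principal components of the training set."""
    mu = X_tr.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_tr - mu, full_matrices=False)
    W = Vt[:k].T
    return (X_tr - mu) @ W, (X_te - mu) @ W

def random_projection(X_tr, y_tr, X_te, k):
    """Gaussian random projection down to k dimensions."""
    R = rng.normal(size=(X_tr.shape[1], k)) / np.sqrt(k)
    return X_tr @ R, X_te @ R

def mean_diff_select(X_tr, y_tr, X_te, k):
    """Keep the k features with the largest between-class mean difference."""
    score = np.abs(X_tr[y_tr == 0].mean(axis=0) - X_tr[y_tr == 1].mean(axis=0))
    idx = np.argsort(score)[-k:]
    return X_tr[:, idx], X_te[:, idx]

def one_nn_predict(X_tr, y_tr, X_te):
    """1-NN: each test point takes the label of its nearest training point."""
    d2 = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    return y_tr[d2.argmin(axis=1)]

# Build one 1-NN classifier per reduced representation.
k = 10
preds = []
for reduce_fn in (pca, random_projection, mean_diff_select):
    Z_tr, Z_te = reduce_fn(X[train], y[train], X[test], k)
    preds.append(one_nn_predict(Z_tr, y[train], Z_te))

# Classifier fusion: majority vote over the three per-reduction predictions.
votes = np.sum(preds, axis=0)
fused = (votes >= 2).astype(int)
acc = (fused == y[test]).mean()
print(f"fused 1-NN accuracy: {acc:.2f}")
```

The vote happens over classifier outputs, so each 1-NN operates in its own reduced space; the alternative fusion discussed in the abstract would instead concatenate the reduced feature sets and train a single classifier on the result.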

