A new mutual information (MI)-based feature-selection method is presented to address the so-called large p, small n problem in microarray gene expression data, where the number of genes (p) far exceeds the number of samples (n). First, a grid-based feature clustering algorithm is introduced to eliminate redundant features. This efficiently reduces a very large gene set and substantially improves the computational efficiency of the whole feature-selection process. Second, MI is estimated directly using quadratic MI together with Parzen window density estimators, an approach that delivers reliable results even when only a small pattern set is available. In addition, a new MI-based criterion is proposed to systematically avoid highly redundant selection results. Finally, because MI is estimated directly, appropriate feature subsets can be reasonably determined.
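As an illustration of the second step, the following is a minimal sketch of how quadratic MI between a single continuous feature and a discrete class label might be estimated with Parzen windows. The abstract does not state which form of quadratic MI or which kernel is used, so the sketch assumes the Euclidean-distance form with a Gaussian kernel of fixed width; the function names `gaussian_kernel` and `quadratic_mi` and the parameter `sigma` are hypothetical and are not taken from the paper.

```python
import numpy as np


def gaussian_kernel(d, var):
    """1-D Gaussian density with variance `var`, evaluated at differences `d`."""
    return np.exp(-d ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def quadratic_mi(x, labels, sigma=0.5):
    """Euclidean-distance quadratic MI between feature x and class labels.

    Assumption: Parzen windows use a Gaussian kernel with fixed width
    `sigma` (an illustrative choice, not a value from the paper).
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    n = x.size

    # Integrating the product of two Gaussian windows of variance sigma^2
    # gives a Gaussian of variance 2*sigma^2 at the sample difference,
    # so no numerical integration is needed.
    K = gaussian_kernel(x[:, None] - x[None, :], 2.0 * sigma ** 2)

    v_in = 0.0      # interactions between samples of the same class
    v_btw = 0.0     # class samples against all samples, weighted by prior
    prior_sq = 0.0  # sum of squared class priors
    for c in np.unique(labels):
        mask = labels == c
        p_c = mask.sum() / n
        v_in += K[np.ix_(mask, mask)].sum()
        v_btw += p_c * K[mask, :].sum()
        prior_sq += p_c ** 2
    v_all = prior_sq * K.sum()  # all pairs, weighted by squared priors

    return (v_in + v_all - 2.0 * v_btw) / n ** 2


# Toy data: the first feature separates the two classes, the second does not.
y = np.array([0, 0, 0, 1, 1, 1])
x_informative = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
x_uninformative = np.array([0.0, 5.0, 0.1, 5.1, 0.2, 5.2])

mi_informative = quadratic_mi(x_informative, y)
mi_uninformative = quadratic_mi(x_uninformative, y)
```

On this toy data the class-separating feature receives the larger score. Because the estimate is a closed-form sum over sample pairs, it can be computed from a handful of samples, although the result depends on the chosen kernel width.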