In this work we describe an algorithm for feature selection and gene clustering from high-dimensional gene expression data. The method measures similarity between features (genes) and removes the redundancy among them. It requires no search and is therefore fast. A novel feature similarity measure, called the maximum information compression index, is used. The feature selection algorithm also obtains gene clusters in a multiscale fashion. The superiority of the algorithm, in terms of speed and performance, is established on a real-life molecular cancer classification dataset.
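The abstract does not define the similarity measure. As an illustration only, the sketch below assumes the definition commonly given for this index: the smallest eigenvalue of the 2x2 covariance matrix of two features. Under that assumption the index is zero when the two features are linearly dependent and grows as the dependency weakens. The function names are hypothetical and are not taken from the paper.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _covariance(x, y):
    # Population covariance of two equal-length sequences.
    mx, my = _mean(x), _mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def compression_index(x, y):
    """Smallest eigenvalue of the 2x2 covariance matrix of x and y.

    Assumed definition of the index (not stated in the abstract).
    For a symmetric 2x2 matrix the eigenvalues follow in closed form
    from its trace and determinant.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    var_x = _covariance(x, x)
    var_y = _covariance(y, y)
    cov_xy = _covariance(x, y)
    trace = var_x + var_y
    det = var_x * var_y - cov_xy ** 2
    # Guard against tiny negative values caused by rounding.
    disc = max(trace * trace - 4.0 * det, 0.0)
    return 0.5 * (trace - math.sqrt(disc))


# Two expression profiles that are exact multiples of each other
# (fully redundant) and two that are uncorrelated.
redundant = compression_index([1, 2, 3, 4], [2, 4, 6, 8])
unrelated = compression_index([1, -1, 1, -1], [1, 1, -1, -1])
```

A selection procedure built on such a measure could discard a gene whose index with respect to an already retained gene falls below a chosen threshold. That threshold rule is likewise an assumption here, not a detail stated in the abstract.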
Published in:
Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on (Volume: 2)
Date of Conference: 23-26 Aug. 2004