In microarray classification we are faced with a very large number of features and very few training samples. This is a challenge for classical Linear Discriminant Analysis (LDA), since reliable estimates of the covariance matrix cannot be obtained. Alternative techniques based on Diagonal LDA (DLDA) combined with an independent gene selection (filtering) step have been proposed. In this paper we propose a novel sequential DLDA (SeqDLDA) technique that combines gene selection and classification. At each iteration, one gene is added and the linear discriminant (LD) is recomputed using the DLDA model (i.e., a diagonal covariance matrix). Classical DLDA adds the gene with the highest t-test score without checking the resulting model. In contrast, SeqDLDA selects the gene that most improves class separation, measured by a robustified t-test score, after the model has been recomputed. We evaluate the new method on several two-class datasets (Neuroblastoma, Prostate, Leukemia, Colon) using 10-fold cross-validation. For example, on the Neuroblastoma dataset, the average misclassification rate of DLDA (16.91%) is significantly reduced to 13.87% using SeqDLDA.
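The contrast between filter-based DLDA and the sequential procedure can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not define the robustified t-test score, so a plain Welch-style t statistic with a small stabilizing constant `eps` stands in for it here. The function names (`t_score`, `dlda_weights`, `seq_dlda_select`, `filter_dlda_select`) and the `eps` parameter are hypothetical.

```python
import numpy as np

def t_score(z, y, eps=1e-3):
    """Absolute two-sample t-like score of 1-D values z for labels y in {0, 1}.
    ASSUMPTION: eps is a stand-in stabilizer, not the paper's robustified score."""
    a, b = z[y == 0], z[y == 1]
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return abs(a.mean() - b.mean()) / (se + eps)

def dlda_weights(X, y, eps=1e-3):
    """DLDA direction: class-mean difference divided by the pooled
    per-gene variance (diagonal covariance, so no matrix inversion)."""
    a, b = X[y == 0], X[y == 1]
    pooled = ((len(a) - 1) * a.var(axis=0, ddof=1)
              + (len(b) - 1) * b.var(axis=0, ddof=1)) / (len(a) + len(b) - 2)
    return (b.mean(axis=0) - a.mean(axis=0)) / (pooled + eps)

def filter_dlda_select(X, y, n_genes):
    """Filter baseline: rank genes once by their individual scores."""
    scores = np.array([t_score(X[:, g], y) for g in range(X.shape[1])])
    return [int(g) for g in np.argsort(-scores)[:n_genes]]

def seq_dlda_select(X, y, n_genes):
    """Greedy forward selection: at each step, try every remaining gene,
    refit the DLDA discriminant on the candidate subset, and keep the gene
    whose projected samples are best separated."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_genes):
        best_gene, best_score = None, -np.inf
        for g in remaining:
            cols = selected + [g]
            w = dlda_weights(X[:, cols], y)          # recompute the model
            score = t_score(X[:, cols] @ w, y)       # score the projection
            if score > best_score:
                best_gene, best_score = g, score
        selected.append(best_gene)
        remaining.remove(best_gene)
    return selected
```

In this sketch each selection step refits the discriminant once per remaining gene. That stays cheap because the diagonal covariance model only needs per-gene means and variances. To mirror the evaluation described above, the selection would have to be run inside each cross-validation fold, on the training samples only.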