A growing number of instruments for sophisticated micro-level biological measurements are now available, so increasing quantities of gene expression data are being collected. mRNA or cDNA expression levels are measured for several thousand genes, but for practical reasons the number of data points (samples) is only in the dozens. Some of the data are temporal, recording the change in gene expression over a period of time; other data sets are snapshots at a single instant. In this work, we consider such non-temporal gene expression data and identify the few genes (out of thousands) that are sufficient to classify the presence or absence of a disease. A general framework is proposed in which, after genes poorly correlated with the target classes are eliminated, the data dimension is further reduced with a two-stage classifier combining supervised and unsupervised artificial neural networks. The supervised stage is a multilayer perceptron (MLP), whose input-to-hidden-unit weight vectors are used as input to a clustering algorithm, here a self-organizing map (SOM). From the cluster centers we identify the responsible genes.
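The framework described above can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the data are synthetic, and every function name, threshold, network size, and learning rate is a hypothetical choice. Two points of interpretation are also assumptions. Each hidden unit's incoming weight vector is treated as one SOM input, and a gene is scored by the largest absolute component it receives in any cluster center.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def filter_genes(X, y, threshold):
    """Pre-filter: keep indices of genes whose |correlation| with the label passes the threshold."""
    return [g for g in range(len(X[0]))
            if abs(pearson([row[g] for row in X], y)) >= threshold]

def train_mlp(X, y, n_hidden, epochs, lr, rng):
    """Supervised stage: one-hidden-layer MLP trained by SGD.
    Returns one input-to-hidden weight vector per hidden unit."""
    n_in = len(X[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    c = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            h = [math.tanh(sum(w * xi for w, xi in zip(Wj, x)) + bj)
                 for Wj, bj in zip(W, b)]
            z = sum(vj * hj for vj, hj in zip(v, h)) + c
            out = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))  # clamped sigmoid
            d_out = out - t  # gradient of cross-entropy loss w.r.t. z
            for j in range(n_hidden):
                d_h = d_out * v[j] * (1.0 - h[j] ** 2)  # backprop through tanh
                v[j] -= lr * d_out * h[j]
                b[j] -= lr * d_h
                W[j] = [w - lr * d_h * xi for w, xi in zip(W[j], x)]
            c -= lr * d_out
    return W

def train_som(vectors, n_nodes, epochs, rng):
    """Unsupervised stage: 1-D self-organizing map; returns the node (cluster-center) vectors."""
    nodes = [list(rng.choice(vectors)) for _ in range(n_nodes)]
    for e in range(epochs):
        frac = 1.0 - e / epochs  # decays from 1 toward 0
        lr, sigma = 0.5 * frac, max(0.3, 0.5 * n_nodes * frac)
        for vec in vectors:
            dist = [sum((a - b) ** 2 for a, b in zip(node, vec)) for node in nodes]
            bmu = dist.index(min(dist))  # best-matching unit
            for k in range(n_nodes):
                g = math.exp(-((k - bmu) ** 2) / (2.0 * sigma ** 2))
                nodes[k] = [a + lr * g * (b - a) for a, b in zip(nodes[k], vec)]
    return nodes

# Synthetic data (assumption): 20 samples, 30 genes; only genes 0 and 1 shift with the class.
rng = random.Random(0)
n_samples, n_genes = 20, 30
y = [s % 2 for s in range(n_samples)]
X = [[rng.gauss(0.0, 0.5) + (4.0 * y[s] if g < 2 else 0.0) for g in range(n_genes)]
     for s in range(n_samples)]

kept = filter_genes(X, y, threshold=0.5)
X_kept = [[row[g] for g in kept] for row in X]
W = train_mlp(X_kept, y, n_hidden=4, epochs=200, lr=0.05, rng=rng)
centers = train_som(W, n_nodes=2, epochs=50, rng=rng)

# Score each surviving gene by its largest absolute component across cluster centers.
score = [max(abs(cn[i]) for cn in centers) for i in range(len(kept))]
top = sorted(range(len(kept)), key=lambda i: score[i], reverse=True)[:2]
selected = sorted(kept[i] for i in top)
print("genes kept after correlation filter:", kept)
print("genes selected from SOM centers:", selected)
```

With real expression data, the correlation threshold, the number of hidden units, and the SOM size would all need tuning, and the selected genes should be checked by retraining a classifier on them alone.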