Gene selection is a challenging task in microarray data mining because a typical microarray dataset has only a small number of records but thousands of attributes. Such a dataset makes it highly likely that apparent predictive patterns arise by chance. Finding the most relevant genes is often the key phase in building an accurate classification model, since irrelevant and redundant attributes reduce the accuracy of classification algorithms. In this paper, we present a new method for gene selection that uses techniques from economics. We modify Lorenz curves and Gini coefficients to take into account the order of the classes and the order of each gene's discretized values, and use them to select relevant genes. We believe that our method is the first attribute selection method that considers the order of classes and the order of the attribute's discretized values. We implemented this new method and compared it with SAM, one of the most popular gene selection methods. Experimental results with many different classification algorithms on the task of classifying lung adenocarcinomas from gene expression show that (a) our method differs from SAM in that it finds very different sets of significant genes, and (b) our method selects genes that yield more accurate classification.
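
For readers unfamiliar with the economic tools mentioned above, the following is a minimal sketch of the standard (unmodified) Lorenz curve and Gini coefficient for a list of non-negative values. It does not implement the order-aware modification proposed in the paper, whose details are not given in this abstract. The function names `lorenz_curve` and `gini_coefficient` are hypothetical and chosen only for illustration.

```python
def lorenz_curve(values):
    """Return Lorenz curve points as (cumulative share of items,
    cumulative share of total value), with values sorted ascending.

    Standard textbook definition; NOT the paper's modified version.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0 or xs[0] < 0:
        raise ValueError("need non-negative values with a positive sum")
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini_coefficient(values):
    """Gini coefficient = 1 - 2 * (area under the Lorenz curve),
    with the area computed by the trapezoid rule."""
    points = lorenz_curve(values)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return 1.0 - 2.0 * area


if __name__ == "__main__":
    # Perfectly equal values give a coefficient of 0.
    print(gini_coefficient([1, 1, 1, 1]))
    # All value concentrated in one of four items gives (n - 1) / n = 0.75.
    print(gini_coefficient([0, 0, 0, 1]))
```

In this standard form the coefficient is 0 when all values are equal and approaches 1 as the total concentrates in a single item. How such a measure is adapted to ordered classes and ordered discretized gene values is the contribution of the paper itself and is not reproduced here.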