Machine Learning algorithms have been widely used for gene expression data classification, even though these data often have intrinsic limitations, such as high dimensionality and a small number of examples. Few studies have attempted to characterize the extent to which these aspects can influence the performance of the induced classification models. In this paper we compute different measures characterizing the complexity of gene expression data sets for cancer diagnosis. We then investigate how these measures relate to the classification performance achieved by support vector machines, a popular Machine Learning technique commonly employed in the analysis of gene expression data. The results indicate that some of the complexity indices used are indeed successful in explaining the difficulty of classifying cancer gene expression data.
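The abstract does not list which complexity measures are computed. To illustrate the kind of index involved, here is a minimal sketch of one classical data complexity measure, the maximum Fisher's discriminant ratio (F1) in the form proposed by Ho and Basu. Whether F1 is among the measures used in this paper is an assumption, and the function name `fisher_ratio_f1` is hypothetical. F1 scores each feature (here, each gene) by how far apart the two class means are relative to the within-class spread, then keeps the best feature. A high value suggests at least one gene separates the classes well, so the problem is expected to be easier.

```python
from statistics import mean, pvariance
import math


def fisher_ratio_f1(X, y):
    """Maximum Fisher's discriminant ratio (F1) for a two-class data set.

    Hypothetical helper, sketched for illustration only.
    X: list of samples, each a list of feature values (e.g. expression levels).
    y: list of class labels with exactly two distinct values.
    Returns max over features of (mu_a - mu_b)^2 / (var_a + var_b);
    larger values indicate an easier (more separable) problem.
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch of F1 assumes exactly two classes")
    a, b = labels
    best = 0.0
    for j in range(len(X[0])):
        # Split feature j's values by class label.
        col_a = [x[j] for x, lab in zip(X, y) if lab == a]
        col_b = [x[j] for x, lab in zip(X, y) if lab == b]
        num = (mean(col_a) - mean(col_b)) ** 2
        den = pvariance(col_a) + pvariance(col_b)
        if den == 0:
            # Constant within each class: perfectly separable if means differ.
            ratio = math.inf if num > 0 else 0.0
        else:
            ratio = num / den
        best = max(best, ratio)
    return best


# Toy data (not from the paper): gene 0 separates the classes, gene 1 barely does.
X = [[1.0, 5.0], [1.2, 4.0], [3.0, 5.5], [3.2, 4.5]]
y = [0, 0, 1, 1]
print(fisher_ratio_f1(X, y))
```

In a study like the one described, such an index would be computed once per data set and then compared, for example by rank correlation, against the accuracy an SVM reaches on that data set. Note that F1 looks at one gene at a time, so it can underestimate the separability that an SVM achieves by combining many genes.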