When building predictors of disease state based on gene expression data, gene selection is performed in order to achieve good performance and to identify a relevant subset of genes. Although several gene selection algorithms have been proposed, a fair comparison of the available results is very problematic. This mainly stems from two factors. First, the results are often biased: the test set is in one way or another involved in training the predictor, which yields optimistically biased performance estimates. Second, the published results are often based on a small number of relatively simple datasets. As a result, no generally applicable conclusions can be drawn. We therefore adopted an unbiased protocol to perform a fair comparison of state-of-the-art multivariate and univariate gene selection techniques, in combination with a range of classifiers. Our conclusions are based on seven gene expression datasets, across many cancer types. Surprisingly, we could not detect any significant improvement of multivariate feature selection techniques over univariate approaches. We speculate on the possible causes of this finding, ranging from the small sample size problem to the particular nature of the multivariate gene dependencies.
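The selection bias described above can be illustrated with a small experiment. The following Python sketch is illustrative only and is not the protocol used in this study: it assumes a two-class problem, a univariate t-statistic filter, and a nearest-centroid classifier, and all function names (`t_scores`, `select_top_k`, `cv_error`, `make_noise_data`) are hypothetical. It contrasts selecting genes once on all samples, so that the test folds leak into training, with repeating the selection inside each training fold.

```python
import random
import statistics


def t_scores(X, y):
    """Absolute Welch-style t-statistic per gene (column) for labels 0/1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        # Small constant guards against division by zero for constant genes.
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / (se + 1e-12))
    return scores


def select_top_k(X, y, k):
    """Indices of the k genes with the largest univariate score."""
    scores = t_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def fit_centroids(X, y, genes):
    """Per-class mean expression, restricted to the selected genes."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, l in zip(X, y) if l == label]
        centroids[label] = [statistics.mean([r[j] for r in rows]) for j in genes]
    return centroids


def predict(centroids, row, genes):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((row[j] - cj) ** 2 for j, cj in zip(genes, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def cv_error(X, y, k, n_folds, select_inside_fold, seed=0):
    """Cross-validated error rate.

    select_inside_fold=False selects genes once on ALL samples (biased:
    test samples influence training). select_inside_fold=True repeats the
    selection on each training fold only.
    """
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    genes_all = None if select_inside_fold else select_top_k(X, y, k)
    errors = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        genes = select_top_k(X_train, y_train, k) if select_inside_fold else genes_all
        model = fit_centroids(X_train, y_train, genes)
        errors += sum(predict(model, X[i], genes) != y[i] for i in test)
    return errors / len(X)


def make_noise_data(n_samples, n_genes, seed=0):
    """Gaussian noise with alternating labels: no gene carries class information."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    return X, y


if __name__ == "__main__":
    X, y = make_noise_data(n_samples=40, n_genes=500)
    print("selection on all samples:", cv_error(X, y, 10, 5, select_inside_fold=False))
    print("selection inside folds:  ", cv_error(X, y, 10, 5, select_inside_fold=True))
```

On pure-noise data such as this, an honest error estimate should sit near chance level, whereas the variant that selects genes on all samples will typically report a noticeably lower error even though there is nothing to learn.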