In recent years, genome-wide association studies (GWAS) have been performed by many scientists around the world to find associations between the genetic profiles of individuals and the risk of developing certain diseases. GWAS are performed using single nucleotide polymorphism (SNP) data, which represent the genotypes of two groups of individuals: a case group with the disease and a control group without it. The very high dimensionality of SNP data poses challenges in analyzing GWAS results. This issue can be tackled by performing feature ranking to remove irrelevant features and thereby reduce the dimension of the original data. This work compares several feature ranking methods, including the chi-square statistic, information gain, recursive feature elimination, and the Relief algorithm, by analyzing the performance of different learning machines combined with each feature ranking method. The highest performance is obtained by combining recursive feature elimination with a linear SVM, while the worst performance is shown by the Relief algorithm. The experiments show that the classifiers generally benefit from feature selection, but that the highest-ranked features do not necessarily yield the best classifier.
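As a rough illustration of one of the ranking methods named above, the following sketch scores each SNP with the Pearson chi-square statistic against case/control labels and sorts the SNPs by that score. It is a minimal sketch and not the implementation used in this work: the function names `chi_square_score` and `rank_snps`, the 0/1/2 genotype coding, and the toy data are all assumptions made for the example.

```python
from collections import Counter


def chi_square_score(genotypes, labels):
    """Pearson chi-square statistic for one SNP against case/control labels."""
    n = len(labels)
    observed = Counter(zip(genotypes, labels))   # genotype-by-label contingency counts
    genotype_totals = Counter(genotypes)         # row totals
    label_totals = Counter(labels)               # column totals
    score = 0.0
    for g, g_total in genotype_totals.items():
        for y, y_total in label_totals.items():
            expected = g_total * y_total / n     # count expected under independence
            score += (observed[(g, y)] - expected) ** 2 / expected
    return score


def rank_snps(snp_matrix, labels):
    """Return SNP column indices ordered from highest to lowest score, plus the scores."""
    n_snps = len(snp_matrix[0])
    scores = [
        chi_square_score([row[j] for row in snp_matrix], labels)
        for j in range(n_snps)
    ]
    order = sorted(range(n_snps), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data, made up purely for illustration (assumed coding: 0/1/2 = minor-allele count).
# Rows are individuals, columns are SNPs; labels: 0 = control, 1 = case.
toy_snps = [
    [0, 1, 2],
    [0, 2, 0],
    [0, 0, 1],
    [2, 1, 2],
    [2, 2, 0],
    [2, 0, 1],
]
toy_labels = [0, 0, 0, 1, 1, 1]

order, scores = rank_snps(toy_snps, toy_labels)
print(order, scores)
```

In this toy example only the first SNP separates cases from controls, so it receives the highest score. A reduced data set would keep the top-ranked columns and pass them to a classifier; how many columns to keep is a tuning choice that the sketch does not address.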