For data miners, bioinformatics poses a more demanding challenge than simply creating efficient algorithms. They must work with databases that are more "horizontal" than "vertical": in the case of microarrays, the data consist of a few samples described by a large (sometimes huge) number of attributes. More important, there is a priori biological knowledge that only a few genes are normally linked to each characteristic exhibited by the individual. This allows one to use attribute selection to determine which attributes are most likely to induce the observable characteristic. In this paper, many configurations of attribute selection schemes are studied on two typical bioinformatics datasets. The results show that sequential subset generation yields better results, and they reaffirm the use of the wrapper approach to achieve better classification, despite its running time being longer than that of the filter approach.
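As an illustration of the wrapper approach with sequential subset generation, the following is a minimal sketch and not the configuration used in the study. It assumes a nearest-centroid classifier, leave-one-out accuracy as the evaluation measure, and a synthetic "horizontal" dataset; all function names (`sequential_forward_selection`, `loo_accuracy`, `make_toy_data`) are hypothetical. A filter approach would instead rank attributes by a classifier-independent statistic, which is cheaper because no classifier is retrained for each candidate subset.

```python
import random


def nearest_centroid_predict(train_X, train_y, x, features):
    """Predict the label of x from class centroids restricted to `features`."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += row[f]
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label in sorted(sums):
        dist = sum(
            (x[f] - sums[label][i] / counts[label]) ** 2
            for i, f in enumerate(features)
        )
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of the classifier on the chosen attributes."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i], features) == y[i]:
            hits += 1
    return hits / len(X)


def sequential_forward_selection(X, y, max_features):
    """Wrapper selection: greedily add the attribute that most improves
    leave-one-out accuracy; stop when no candidate improves it."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        # Ties on score are broken in favour of the lowest attribute index.
        score, neg_f = max(
            (loo_accuracy(X, y, selected + [f]), -f) for f in remaining
        )
        if score <= best_score:
            break
        selected.append(-neg_f)
        remaining.remove(-neg_f)
        best_score = score
    return selected, best_score


def make_toy_data(n_samples=20, n_attributes=200, informative=(3, 7), seed=0):
    """Synthetic 'horizontal' data (assumption): few samples, many attributes,
    only the `informative` attributes depend on the class label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_attributes)]
        for f in informative:
            row[f] += 6.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    selected, score = sequential_forward_selection(X, y, max_features=3)
    print("selected attributes:", selected, "LOO accuracy:", score)
```

Each candidate subset costs a full round of classifier evaluations, which is why a wrapper of this kind runs longer than a filter on data with thousands of attributes.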