Feature extraction and feature selection have become essential in many bioinformatics applications. In this paper, features are extracted from a database of protein primary single sequences, based on amino acid composition and k-mer patterns (k-tuples), and feature selection is then carried out on the extracted features. Since rough-set QuickReduct has not yet been applied to protein sequence data sets, an enhanced QuickReduct feature selection (EQRFS) algorithm based on fuzzy-rough sets is proposed. Rough set theory deals with uncertainty and vagueness in information systems in data mining. Fuzzy-rough feature selection provides a means by which discrete or real-valued noisy data, or a mixture of both, can be effectively reduced. Experiments are carried out on protein primary single sequence data sets derived from the PDB using the SCOP classification, for structural class prediction over the classes all-alpha, all-beta, alpha+beta and alpha/beta.
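The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' EQRFS, whose enhancements are not described in this abstract. It pairs a k-mer frequency extractor with a baseline fuzzy-rough QuickReduct. In the extractor, k = 1 gives amino acid composition and k = 2 gives dipeptide frequencies. QuickReduct greedily adds the feature that most increases the fuzzy dependency degree. Several choices here are assumptions, not taken from the paper: the per-feature similarity `1 - |a(x) - a(y)| / range(a)`, the min t-norm, the Łukasiewicz implicator, and the helper names `kmer_features`, `feature_similarity`, `dependency` and `quickreduct`.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_features(seq, k_max=2):
    """Normalized k-mer frequencies for k = 1..k_max (k = 1 is amino acid composition)."""
    seq = seq.upper()
    feats = []
    for k in range(1, k_max + 1):
        kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        total = 0
        for i in range(len(seq) - k + 1):
            sub = seq[i:i + k]
            if sub in counts:  # skip k-mers containing non-standard residues such as X
                counts[sub] += 1
                total += 1
        feats.extend(counts[m] / total if total else 0.0 for m in kmers)
    return feats


def feature_similarity(data):
    """Assumed fuzzy similarity per feature: 1 - |a(x) - a(y)| / range(a)."""
    n, m = len(data), len(data[0])
    sims = []
    for a in range(m):
        col = [row[a] for row in data]
        span = max(col) - min(col)
        sims.append([[1.0 if span == 0 else 1.0 - abs(col[x] - col[y]) / span
                      for y in range(n)] for x in range(n)])
    return sims


def dependency(subset, sims, labels):
    """Fuzzy dependency degree: mean membership of objects in the fuzzy positive region.

    With crisp class labels and the Lukasiewicz implicator, the lower approximation
    of x's own class reduces to min over other-class objects y of 1 - R(x, y).
    """
    n = len(labels)
    total = 0.0
    for x in range(n):
        pos = 1.0
        for y in range(n):
            if labels[y] != labels[x]:
                # min t-norm combines the per-feature similarities
                r = min((sims[a][x][y] for a in subset), default=1.0)
                pos = min(pos, 1.0 - r)
        total += pos
    return total / n


def quickreduct(sims, labels):
    """Greedy forward selection until the subset matches the full-set dependency."""
    m = len(sims)
    full = dependency(range(m), sims, labels)
    reduct, best = [], dependency([], sims, labels)
    while best < full - 1e-12:
        candidate, cand_gamma = None, best
        for a in range(m):
            if a in reduct:
                continue
            g = dependency(reduct + [a], sims, labels)
            if g > cand_gamma:
                candidate, cand_gamma = a, g
        if candidate is None:  # no feature improves dependency; stop early
            break
        reduct.append(candidate)
        best = cand_gamma
    return reduct, best
```

On a toy table, feature 0 separates classes `a` and `b` while feature 1 is noise. Here `quickreduct(feature_similarity(data), labels)` returns `([0], 0.85)`, because adding feature 1 does not raise the dependency degree. Each dependency evaluation costs O(|subset| · n²), so full protein data sets with 420 k-mer features call for caching or sampling.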