Feature selection and classification of protein subfamilies using Rough Sets

3 Author(s)
Shuzlina Abdul Rahman (Department of Science and System Management, Faculty of Information Systems and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia); Azuraliza Abu Bakar; Zeti Azura Mohamed Hussein

Machine learning methods are known to be inefficient when faced with many features that are unnecessary for rule discovery. To cope with this issue, many methods have been proposed for selecting important features. Among them is feature selection, which selects a subset of discriminative features or attributes for model building; it helps avoid overfitting, improves model performance, and yields faster and more reliable models. This paper proposes a new method based on rough set algorithms, a rule-based data mining approach, to select the important features in bioinformatics datasets. Amino acid compositions are used as conditional features for the classification task. However, our results indicate that all amino acid composition features are equally important, so selecting among them is unnecessary. We do confirm the need for balanced classes when classifying protein function, demonstrating an increase of more than 15% in accuracy.
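
For illustration, the amino acid composition features mentioned in the abstract can be sketched as the fraction of each of the 20 standard residues in a sequence. This is a minimal sketch under assumptions: the abstract does not describe the authors' extraction procedure, and the function name `amino_acid_composition` and the choice to ignore non-standard symbols are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Assumption (not from the paper): symbols outside the 20 standard
    residues, such as 'X' or gap characters, are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # One feature per residue; residues absent from the sequence get 0.0.
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Usage: a 20-dimensional feature vector for one toy sequence.
features = amino_acid_composition("AACD")
```

Each protein then contributes one 20-value row to the decision table, with the subfamily label as the decision attribute.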

Published in:

2009 International Conference on Electrical Engineering and Informatics (Volume: 01)

Date of Conference:

5-7 Aug. 2009