Comparative study of ensemble learning approaches in the identification of disease mutations

3 Author(s)
Jiaxin Wu (Dept. of Autom., Tsinghua Univ., Beijing, China); Wangshu Zhang; Rui Jiang

With the accelerating advancement of biomedical research, it has been widely accepted that genetic variation plays a critical role in the pathogenesis of human inherited diseases. As an important type of genetic variation, nonsynonymous single nucleotide polymorphisms (nsSNPs) occur in protein-coding regions and lead to amino acid substitutions that affect the structure and function of proteins and can potentially cause human diseases. Hence, distinguishing disease-associated nsSNPs from neutral ones by machine learning approaches plays an important role in understanding the genetic bases of human diseases and in promoting the prevention, diagnosis, and treatment of these diseases. In this paper, we formulate the task of identifying disease-associated nsSNPs as a binary classification problem. Based on a set of 26 numeric features derived from protein sequence information, we compare the performance of five popular ensemble learning approaches (AdaBoost, LogitBoost, random forests, L2 boosting, and stochastic gradient regression) with that of two traditional classification methods (decision trees and support vector machines) on this classification problem. Systematic validation demonstrates that ensemble learning approaches are in general more effective in identifying disease-associated nsSNPs, and that LogitBoost achieves the highest performance among all the methods compared.
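To make the binary classification setup concrete, the following is a minimal sketch of one of the ensemble methods named above, AdaBoost, built from one-level decision trees (decision stumps) in plain Python. It is not the authors' implementation, and it does not use their data: the 26-dimensional feature vectors are synthetic Gaussian values, the labelling rule (only the first three features carry signal) is an assumption made purely for illustration, and all function names (`make_data`, `fit_stump`, `fit_adaboost`) and the fixed threshold grid are hypothetical.

```python
import math
import random

def make_data(n, n_features=26, seed=0):
    """Synthetic stand-in for nsSNP feature vectors; labels are +1 or -1."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        # Assumed rule for illustration: only the first three features matter.
        y.append(1 if row[0] + row[1] + row[2] > 0 else -1)
        X.append(row)
    return X, y

# Coarse, fixed grid of candidate split points (an assumption, to keep it short).
THRESHOLDS = [-1.0, -0.5, 0.0, 0.5, 1.0]

def stump_predict(stump, row):
    feature, threshold, polarity = stump
    return polarity if row[feature] > threshold else -polarity

def fit_stump(X, y, w):
    """Return the stump with the lowest weighted error (weights sum to 1)."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in THRESHOLDS:
            # Weighted error of the stump that predicts +1 above the threshold.
            err = sum(wi for row, yi, wi in zip(X, y, w)
                      if (1 if row[f] > t else -1) != yi)
            # Flipping the polarity turns an error of err into 1 - err.
            for polarity, e in ((1, err), (-1, 1.0 - err)):
                if e < best_err:
                    best, best_err = (f, t, polarity), e
    return best, best_err

def fit_adaboost(X, y, rounds=20):
    """Discrete AdaBoost: reweight samples so later stumps focus on mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score > 0 else -1

def accuracy(ensemble, X, y):
    return sum(predict(ensemble, r) == yi for r, yi in zip(X, y)) / len(y)

X_train, y_train = make_data(300, seed=0)
X_test, y_test = make_data(200, seed=1)
single = fit_adaboost(X_train, y_train, rounds=1)    # one stump, no ensemble
boosted = fit_adaboost(X_train, y_train, rounds=20)  # weighted vote of 20 stumps
print("single stump accuracy:", accuracy(single, X_test, y_test))
print("AdaBoost accuracy:", accuracy(boosted, X_test, y_test))
```

On this toy problem the weighted vote of many weak stumps would be expected to outperform a single stump, which mirrors the kind of ensemble-versus-single-classifier comparison described in the abstract. The numbers it prints say nothing about the results reported in the paper.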

Published in:

Biomedical Engineering and Informatics (BMEI), 2010 3rd International Conference on (Volume: 6)

Date of Conference:

16-18 Oct. 2010