Random Forests for Prediction of DNA-Binding Residues in Protein Sequences Using Evolutionary Information

1 Author(s)
Liangjiang Wang ; Dept. of Genetics & Biochem., Clemson Univ., Clemson, SC, USA

This study developed a new machine learning approach for sequence-based prediction of DNA-binding residues in proteins. The approach used both the labeled data instances collected from the available structures of protein-DNA complexes and the abundant unlabeled data found in protein sequence databases. The evolutionary information contained in the unlabeled sequence data was represented as position-specific scoring matrices (PSSMs) and several new descriptors. The sequence-derived features were used to train random forests (RFs), which can handle a large number of input variables and avoid model overfitting. The use of evolutionary information was found to significantly improve classifier performance. The RF classifier was further evaluated using a separate test dataset. The results suggest that the RF-based approach predicts DNA-binding residues more accurately than the methods reported in previous studies.
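
As a rough illustration of how a position-specific scoring matrix can be turned into per-residue input vectors for a classifier, the sketch below flattens the matrix rows inside a sliding window centred on each residue. It is not the paper's implementation: the function name `window_features`, the window size, the zero padding at the sequence ends, and the toy three-column matrix are all assumptions made for this example (a real matrix has one column per amino acid type).

```python
from typing import List


def window_features(
    pssm: List[List[float]], window: int = 5, pad: float = 0.0
) -> List[List[float]]:
    """Build one feature vector per residue from a PSSM.

    Each vector concatenates the PSSM rows in a window centred on the
    residue. Positions that fall outside the sequence are filled with
    `pad` (an assumption of this sketch, not taken from the paper).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a central residue")
    half = window // 2
    n_cols = len(pssm[0]) if pssm else 0
    pad_row = [pad] * n_cols

    features = []
    for i in range(len(pssm)):
        vec: List[float] = []
        for j in range(i - half, i + half + 1):
            # Use the real row when j is inside the sequence, else padding.
            row = pssm[j] if 0 <= j < len(pssm) else pad_row
            vec.extend(row)
        features.append(vec)
    return features


# Toy matrix: 4 residues x 3 score columns (hypothetical values).
toy_pssm = [
    [1.0, 0.0, -1.0],
    [2.0, 1.0, 0.0],
    [0.0, -2.0, 3.0],
    [1.0, 1.0, 1.0],
]
feats = window_features(toy_pssm, window=3)
```

Each row of `feats` could then be paired with a binding or non-binding label and passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`.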

Published in:

Future Generation Communication and Networking, 2008. FGCN '08. Second International Conference on (Volume: 3)

Date of Conference:

13-15 Dec. 2008