Ensemble Machine Methods for DNA Binding
- Already Purchased? View Article
- Subscription Options Learn More
We introduce three ensemble machine learning methods for analysis of biological DNA binding by transcription factors (TFs). The goal is to identify both TF target genes and their binding motifs. Subspace-valued weak learners (formed from an ensemble of different motif finding algorithms) combine candidate motifs as probability weight matrices (PWM), which are then translated into subspaces of a DNA k-mer (string) feature space. Assessing and then integrating highly informative subspaces by machine methods gives more reliable target classification and motif prediction. We compare these target identification methods with probability weight matrix (PWM) rescanning and use of support vector machines on the full k-mer space of the yeast S. cerevisiae. This method, SVMotif-PWM, can significantly improve accuracy in computational identification of TF targets. The software is publicly available at http://cagt10.bu.edu/SVMotif .
Published in:
Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on
Date of Conference: 11-13 Dec. 2008