Abstract:
Prediction of promoter regions continues to be a challenging subproblem in mapping out eukaryotic DNA. While this task is key to understanding the regulation of differential transcription, the gene-specific architecture of promoter sequences does not readily lend itself to general strategies. To date, the best approaches are based on Support Vector Machines (SVMs) that employ standard "spectrum" features and achieve promoter region classification accuracies ranging from 84% to 94%, depending on the species involved. In this paper, we propose a general and powerful methodology that uses Genetic Programming (GP) techniques to generate more complex and more gene-specific features to be used with a standard SVM for promoter region identification. We evaluate our methodology on three data sets from different species and observe consistent classification accuracies in the 94-95% range. In addition, because the GP-generated features are gene-specific, they can be used by biologists to advance their understanding of the architecture of eukaryotic promoter regions.
Published in: 2011 IEEE Congress of Evolutionary Computation (CEC)
Date of Conference: 05-08 June 2011
Date Added to IEEE Xplore: 14 July 2011
- IEEE Keywords
- Index Terms
- General Characteristics
- Eukaryotic Promoter
- Promoter Region
- Support Vector Machine
- Promoter Sequence
- Eukaryotic DNA
- Regional Architecture
- Standard Support Vector Machine
- False Positive Rate
- Homo Sapiens
- Fitness Function
- Support Vector Machine Classifier
- Feature Subset
- Support Vector Machine Model
- Positive Sequence
- Negative Sequence
- Top Features
- Feature Selection Techniques
- Genetic Operators
- Support Vector Machine Training
- Hall Of Fame
- Plant Dataset
- Negative Instances
- Negative Training
- Rest Of The Features
- Different Sets Of Features
- Surrogate Function
- Tournament Selection
- Interesting Feature
- Test Dataset