Abstract:
Identifying operons at the whole-genome scale in microbial organisms can facilitate the deciphering of transcriptional regulation, biological networks and pathways. A number of computational methods, such as naive Bayesian and neural network approaches, have been applied to operon prediction on the whole-genome sequences of a number of prokaryotic organisms, based on features known to be associated with operons, such as intergenic distance, microarray expression data, phylogenetic profiles, and clusters of orthologous groups (COG). In this paper, we introduce a decision tree approach to predict operon structures using three effective types of genomic data: intergenic distance, gene order conservation and COG. We calculated and analyzed the frequency distributions of each attribute for known operons and non-operons of Escherichia coli (E. coli) K12 and Bacillus subtilis (B. subtilis) 168, and constructed decision trees from training examples to predict operons. The overall prediction accuracy is 94.1% for E. coli K12 and 91.0% for B. subtilis 168. We also applied four other classifiers (logistic regression, naive Bayesian, neural network and support vector machines) to both organisms. The results indicate that the decision tree approach is the best classifier for operon prediction. The software package operonDT is freely available at http://www.cs.uga.edu/~che/OperonT
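As a rough illustration of the idea described in the abstract, the following minimal sketch grows a small binary decision tree over adjacent gene pairs using the three feature types named there. It is not the operonDT implementation: the toy data, the feature encoding (distance in base pairs, a conservation score between 0 and 1, a same-COG flag), the Gini splitting criterion, and all function names are assumptions made only for this example.

```python
from collections import Counter

# Hypothetical toy data, NOT from the paper. Each row describes an adjacent
# gene pair as (intergenic distance in bp, gene order conservation score,
# flag for both genes sharing a COG category).
PAIRS = [
    (12, 0.9, 1), (-4, 0.8, 1), (35, 0.7, 0), (60, 0.6, 1),
    (250, 0.1, 0), (410, 0.2, 0), (180, 0.3, 1), (320, 0.0, 0),
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = same operon, 0 = operon boundary


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature index, threshold) that most reduces weighted Gini,
    or None if no split improves on the unsplit node."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree of (feature, threshold, left, right) tuples; leaves are class labels."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f, t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


tree = build_tree(PAIRS, LABELS)
print(tree)
print(predict(tree, (20, 0.8, 1)), predict(tree, (300, 0.1, 0)))
```

On this toy data a single split on intergenic distance separates the two classes, which is only an artifact of the made-up values. Real training examples would come from curated operon annotations, and the accuracy figures in the abstract refer to the authors' own experiments, not to this sketch.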
Published in: 2007 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology
Date of Conference: 01-05 April 2007
Date Added to IEEE Xplore: 04 June 2007
Print ISBN: 1-4244-0710-9