IEEE 7th BIBE Research Tutorial Lecture: Decoding Novel Genomes: From Microbiomes to the Eukaryota

1 Author(s)
Mark Borodovsky ; Department of Biomedical Engineering and Division of Computational Science, Georgia Institute of Technology, Atlanta, Georgia, USA

One of the main goals of computational genomics is fast and accurate biological interpretation of newly sequenced genomic DNA. The complexity of the task varies among genomes, but it is never simple. Currently, a custom-built annotation pipeline is constructed for each new genome by integrating ab initio and comparative genomic methods. Still, a consistent solution of the jigsaw puzzle of genome annotation frequently requires additional experimental efforts (such as EST/cDNA sequencing). Current ab initio gene finding algorithms use statistical analysis and optimization to solve the gene identification problem, restated as a search for the optimal parse of the genomic sequence into fragments with distinct statistical characteristics. This problem setting leads to a classic task for dynamic programming: the search for an optimal path through a network with weights/scores assigned to nodes and edges. The assignment of weights/scores plays a critical role and may present a significant challenge. This task is equivalent to estimating the parameters of statistical models (hidden Markov models) that represent a mosaic of functional sequences and sites in a given genome. The task is rather easy when large sets of validated training sequences are available. However, this is not the case for hundreds of currently unfolding genome sequencing and annotation projects. In the lecture we will consider the general schemes of ab initio gene prediction. We will discuss estimation of model parameters without a training set. We will show that this unsupervised approach is possible and is becoming very important for two rapidly developing branches of genomics: (i) prokaryotic metagenomes, which are becoming a rich source of information about non-cultivated microbial species, and (ii) "compact" eukaryotic genomes, such as those of fungi, whose relatively small genome size (less than 50 Mb) allows a complete genome sequence to be obtained in a relatively short time.
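The dynamic programming formulation mentioned in the abstract can be illustrated with a minimal sketch of the Viterbi algorithm on a toy two-state hidden Markov model. Everything in it is an assumption made for illustration and does not come from the lecture: the state names, the hand-picked probabilities in `START`, `TRANS`, and `EMIT`, and the helper functions `viterbi` and `segments`. Real ab initio gene finders use far richer models (for example, codon-position states and explicit length distributions), and their parameters are estimated from data rather than set by hand.

```python
import math

STATES = ("noncoding", "coding")

# Hypothetical, hand-picked parameters (assumptions, not from the lecture).
START = {"noncoding": 0.5, "coding": 0.5}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},  # AT-rich
    "coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},     # GC-rich
}


def viterbi(seq):
    """Return the most probable state label for each base of seq."""
    if not seq:
        return []
    # score[s]: best log-probability of any path ending in state s so far.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s]: best predecessor of state s at position i + 1
    for symbol in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][symbol])
            )
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state to recover the optimal path.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def segments(path):
    """Collapse per-base labels into (state, start, end) runs, end exclusive."""
    runs, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((path[start], start, i))
            start = i
    return runs


if __name__ == "__main__":
    toy_sequence = "AT" * 5 + "GC" * 5 + "AT" * 5
    for state, begin, end in segments(viterbi(toy_sequence)):
        print(state, begin, end)
```

With these toy parameters, the GC-rich middle block of the example sequence is parsed as a single "coding" fragment flanked by two "noncoding" fragments. Working in log space avoids numerical underflow on long sequences, which matters at genome scale.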

Published in:

2007 IEEE 7th International Symposium on BioInformatics and BioEngineering

Date of Conference:

14-17 Oct. 2007