An optimal DNA segmentation based on the MDL principle

3 Author(s)
Szpankowski, W. ; Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA ; Ren, W. ; Szpankowski, Lukasz

The biological world is highly stochastic as well as inhomogeneous in its behavior. The transitions between homogeneous and inhomogeneous regions of DNA, also known as change points, carry important biological information. Our goal is to employ rigorous methods of information theory to quantify structural properties of DNA sequences. In particular, we adopt the Stein-Ziv lemma to find an asymptotically optimal discriminant function that determines whether two DNA segments are generated by the same source while ensuring an exponentially small probability of false positives. Then we apply the minimum description length (MDL) principle to select the parameters of our segmentation algorithm. Finally, we perform extensive experimental work on human chromosome 9. After grouping A and G (purines) and T and C (pyrimidines), we discover change points between coding and noncoding regions as well as the beginning of a CpG island.
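
To make the two-segment comparison concrete, here is a minimal Python sketch. It is not the discriminant function derived in the paper from the Stein-Ziv lemma: as a stand-in, it uses a length-weighted Jensen-Shannon divergence between the purine/pyrimidine frequencies of two segments. The function names (`to_binary`, `js_divergence`, `same_source`) and the default `threshold` are illustrative assumptions; the paper selects its parameters with the MDL principle rather than a fixed cutoff.

```python
import math

# Purine/pyrimidine grouping described in the abstract: A, G -> 0 and C, T -> 1.
PURINES = set("AG")
PYRIMIDINES = set("CT")


def to_binary(seq):
    """Map a DNA string to 0 (purine) / 1 (pyrimidine); other symbols (e.g. N) are skipped."""
    out = []
    for base in seq.upper():
        if base in PURINES:
            out.append(0)
        elif base in PYRIMIDINES:
            out.append(1)
    return out


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def js_divergence(seg_a, seg_b):
    """Length-weighted Jensen-Shannon divergence (bits) between two segments.

    Assumption: a zeroth-order (i.i.d.) model of the binary alphabet; this is
    a stand-in statistic, not the paper's discriminant function.
    """
    a, b = to_binary(seg_a), to_binary(seg_b)
    if not a or not b:
        raise ValueError("both segments must contain at least one of A, C, G, T")
    w_a = len(a) / (len(a) + len(b))
    w_b = 1.0 - w_a
    p_a = sum(a) / len(a)  # pyrimidine frequency in segment a
    p_b = sum(b) / len(b)  # pyrimidine frequency in segment b
    p_mix = w_a * p_a + w_b * p_b
    d = binary_entropy(p_mix) - w_a * binary_entropy(p_a) - w_b * binary_entropy(p_b)
    return max(0.0, d)  # guard against tiny negative rounding errors


def same_source(seg_a, seg_b, threshold=0.05):
    """Hypothetical decision rule: treat the segments as same-source below the threshold."""
    return js_divergence(seg_a, seg_b) <= threshold
```

Under these assumptions, a change point would be suspected wherever adjacent windows fail `same_source`. A faithful implementation would replace `js_divergence` with the paper's discriminant and choose window sizes and thresholds by minimizing description length.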

Published in:

Bioinformatics Conference, 2003. CSB 2003. Proceedings of the 2003 IEEE

Date of Conference:

11-14 Aug. 2003