Entropy-based divergence measures have shown promising results in many areas of engineering and image processing. In this study, we use the Jensen-Shannon and Jensen-Rényi divergence measures for recursive segmentation of DNA sequences in order to find borders between coding and noncoding regions. Heterogeneous DNA sequences, composed of the four nucleotides A, C, G, and T and the stop codons, can be partitioned into homogeneous domains. We introduce a new 18-symbol alphabet that captures (i) the differential base composition in codons and (ii) the differential stop codon composition along the three phases in both DNA strands. For two complete bacterial genomes, the new approach, based on the Jensen-Rényi divergence and the 18-symbol alphabet, finds borders between coding and noncoding DNA regions more accurately than the standard approach based on the Jensen-Shannon divergence.
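As a rough illustration of divergence-based recursive segmentation, the following minimal Python sketch computes a length-weighted Jensen divergence between the two sides of a candidate cut, using either Shannon or Rényi entropy, and splits recursively at the cut that maximizes it. It is not the authors' implementation. It uses the plain four-letter alphabet rather than the 18-symbol alphabet, and the Rényi order `alpha=0.5`, the fixed `threshold` stopping rule, and all function names are assumptions made for the example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: plain nucleotides, not the 18-symbol alphabet


def distribution(seq, alphabet=ALPHABET):
    """Relative symbol frequencies of a non-empty sequence."""
    counts = Counter(seq)
    return [counts[s] / len(seq) for s in alphabet]


def shannon_entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def renyi_entropy(p, alpha=0.5):
    """Renyi entropy of order alpha in bits; alpha in (0, 1) is assumed."""
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)


def jensen_divergence(left, right, entropy=shannon_entropy):
    """Entropy of the length-weighted mixture minus the weighted entropies."""
    n = len(left) + len(right)
    w1, w2 = len(left) / n, len(right) / n
    p1, p2 = distribution(left), distribution(right)
    mix = [w1 * a + w2 * b for a, b in zip(p1, p2)]
    return entropy(mix) - w1 * entropy(p1) - w2 * entropy(p2)


def best_split(seq, entropy=shannon_entropy):
    """Cut position with the largest divergence, and that divergence."""
    best_pos, best_val = None, -1.0
    for i in range(1, len(seq)):
        d = jensen_divergence(seq[:i], seq[i:], entropy)
        if d > best_val:
            best_pos, best_val = i, d
    return best_pos, best_val


def segment(seq, threshold=0.5, entropy=shannon_entropy, offset=0):
    """Recursively split while the best divergence reaches the threshold.

    The fixed threshold is a hypothetical stopping rule for this sketch.
    Returns the sorted list of border positions in the original sequence.
    """
    if len(seq) < 2:
        return []
    pos, val = best_split(seq, entropy)
    if val < threshold:
        return []
    return (segment(seq[:pos], threshold, entropy, offset)
            + [offset + pos]
            + segment(seq[pos:], threshold, entropy, offset + pos))
```

Calling `segment("A" * 10 + "C" * 10)` places a single border at the composition change; passing `entropy=renyi_entropy` switches the same procedure to the Jensen-Rényi divergence. The exhaustive search is quadratic in sequence length, so a genome-scale run would need cumulative symbol counts instead of recounting each side at every cut.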
Statistical Signal Processing, 2003 IEEE Workshop on
Date of Conference: 28 Sept.-1 Oct. 2003