Abstract:
Biological sequences can contain regions of unusual composition, e.g., proteins contain DNA-binding domains, transmembrane regions, and charged regions. The linear-time Ruzzo-Tompa algorithm finds such regions: it takes a sequence of scores as input and outputs the corresponding “maximal segments”, i.e., contiguous, disjoint subsequences having the greatest total scores. Just as gaps improved the sensitivity of BLAST searches, they might also improve the sensitivity of searches for regions of unusual composition. Accordingly, we generalize the Ruzzo-Tompa algorithm from sequences of scores to paths in weighted, directed graphs on a one-dimensional lattice. Within the generalization, unfavorable scores can be deleted from contiguous, disjoint subsequences by paying a penalty, and the Ruzzo-Tompa algorithm can then find gapped subsequences having the greatest total gapped scores. An application to finding gapped inexact repeats in biological sequences illustrates some of the concepts.
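As an illustration of the ungapped case only, the following is a minimal Python sketch of the classical Ruzzo-Tompa maximal-segment procedure. It is not the generalized, gapped method of this paper. The function name `maximal_segments` and the half-open `(start, end, total)` output format are assumptions made for this example. For brevity the sketch scans the candidate list right to left with a plain loop, so it does not reproduce the linear-time bookkeeping of the original algorithm.

```python
def maximal_segments(scores):
    """Return (start, end, total) for each maximal segment; end is exclusive.

    Hypothetical helper for illustration: ungapped Ruzzo-Tompa only,
    with a simple (not linear-time) right-to-left search.
    """
    candidates = []  # each entry: [start, end, L, R]
    total = 0        # cumulative score of everything read so far
    for i, s in enumerate(scores):
        left = total   # L: cumulative total just before position i
        total += s     # R: cumulative total through position i
        if s <= 0:
            continue   # non-positive scores never start a segment
        cur = [i, i + 1, left, total]
        while True:
            # Find the rightmost candidate j with L_j < L_cur.
            j = len(candidates) - 1
            while j >= 0 and candidates[j][2] >= cur[2]:
                j -= 1
            # No such candidate, or it already reaches at least as high:
            # keep cur as a separate candidate.
            if j < 0 or candidates[j][3] >= cur[3]:
                candidates.append(cur)
                break
            # Otherwise extend cur leftward to absorb candidates j onward,
            # then reconsider the merged segment.
            cur = [candidates[j][0], cur[1], candidates[j][2], cur[3]]
            del candidates[j:]
    return [(a, b, r - l) for a, b, l, r in candidates]


if __name__ == "__main__":
    print(maximal_segments([4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]))
```

Each returned triple marks a contiguous run `scores[start:end]` together with its total score; the runs are pairwise disjoint.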
Published in: 2012 IEEE 2nd International Conference on Computational Advances in Bio and Medical Sciences (ICCABS)
Date of Conference: 23-25 February 2012
Date Added to IEEE Xplore: 12 April 2012