Abstract
Identifying short repeated patterns (motifs) in genomic sequences is key to many problems in bioinformatics. The promoter regions of genes are an important target when searching for such motifs, which correspond to transcription factor binding sites. We present a new algorithm, Mortice, for detecting potential binding sites in a given set of genomic sequences. It performs an informed search by organizing the input patterns and their variants in a graph, a strategy that leads efficiently to the desired solutions. The background is modeled as a Markov process, and a composite score function is used. We demonstrate the performance of our algorithm on real-life data sets of yeast and human promoter sequences. Comparing it with several popular algorithms, we found that the other algorithms work well on lower organisms such as yeast, but only a couple of them work well on human data. We show that our algorithm scales linearly with the size of the input data set. We also compare its computational efficiency with that of the other algorithms and show that it runs faster across different data sets and motif sizes.