We study the problem of finding higher order motifs under the Levenshtein measure, otherwise known as the edit distance. In the problem set-up, we are given N sequences, each of average length n, over a finite alphabet Σ, together with thresholds D and q, and we are to find composite motifs that contain motifs of length P (these motifs occur with at most D differences) in q distinct sequences, where 1 ≤ q ≤ N. Two interesting but involved algorithms for finding higher order motifs under the edit distance were presented by Marsan and Sagot. Their second algorithm is much more complicated, and its complexity is asymptotically no better. Their first algorithm runs in $O(M \cdot N^2 n^{1+\alpha \cdot p \cdot \mathrm{pow}(\varepsilon)})$ time, where p ≥ 2, α > 0, pow(ε) is a concave function that is less than 1, ε = D/P, and M is the expected number of all monad motifs. We present an alternative algorithmic approach, also for the edit distance, based on the concept described. The resulting algorithm is simpler and runs in $O(N^2 n^{1+p \cdot \mathrm{pow}(\varepsilon)})$ expected time.
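To make the occurrence criterion in the problem set-up concrete, the following is a minimal Python sketch of how one might check whether a single candidate motif occurs with at most D differences in at least q of the given sequences. It is a brute-force illustration only, not the algorithm of this paper or of Marsan and Sagot: it uses the standard approximate-matching dynamic program (often attributed to Sellers), it covers a single (monad) motif rather than a composite one, and the function names `min_edit_distance_to_substring` and `occurs_in_quorum` are hypothetical.

```python
from typing import Sequence


def min_edit_distance_to_substring(motif: str, text: str) -> int:
    """Smallest edit distance between `motif` and any substring of `text`.

    Standard approximate-matching DP: row 0 is all zeros so that a match
    may start at any position of `text` at no cost.
    """
    prev = [0] * (len(text) + 1)
    for i, motif_char in enumerate(motif, start=1):
        # Column 0: the first i motif characters against an empty substring.
        curr = [i] + [0] * len(text)
        for j, text_char in enumerate(text, start=1):
            curr[j] = min(
                prev[j] + 1,                            # drop a motif character
                curr[j - 1] + 1,                        # absorb an extra text character
                prev[j - 1] + (motif_char != text_char) # match or substitute
            )
        prev = curr
    # A match may end at any position of `text`.
    return min(prev)


def occurs_in_quorum(motif: str, sequences: Sequence[str], D: int, q: int) -> bool:
    """True if `motif` occurs with at most D differences in at least q sequences."""
    hits = sum(
        1 for seq in sequences
        if min_edit_distance_to_substring(motif, seq) <= D
    )
    return hits >= q


if __name__ == "__main__":
    # Illustrative (made-up) input: N = 3 sequences, motif length P = 4.
    seqs = ["TTACGTTT", "GGACTTGG", "CCCCCCCC"]
    print(occurs_in_quorum("ACGT", seqs, D=1, q=2))
```

Each check costs O(P · n) per sequence, so this sketch only verifies a given candidate; enumerating candidates efficiently is the part the algorithms discussed above address.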