Tandem repeats are ubiquitous sequence features in both prokaryotic and eukaryotic genomes, and they are known to cause several inherited neurological diseases in humans. Identifying these patterns is highly computation-intensive. Previous parallel implementations apply straightforward domain decomposition to existing sequential algorithms and rely on parallel machines with low-latency interconnection networks and fast hardware support for processor synchronization. Our research instead exploits the cost effectiveness and flexibility of low-cost clusters to speed up biological computations by designing communication-efficient parallel algorithms for pattern identification. This paper presents a low communication-overhead parallel algorithm for pattern identification in biological sequences. Given a biological sequence of length n and a pattern of length m, we derive an algorithm with five computation/communication phases, each requiring O(n) computation time and only O(p) message units. This low communication overhead is essential to achieving reasonable speedups on clusters, where interprocessor communication latency is usually higher than on dedicated parallel machines.
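To make the cost model concrete, the following is a minimal sketch of the baseline idea of block domain decomposition with a small overlap. It is not the five-phase algorithm of this paper, whose phases are not described here. It assumes that p denotes the number of processors, and the names `split_with_overlap`, `find_local`, and `find_all_matches` are hypothetical. Workers are simulated sequentially in one process; on a real cluster each worker would send its local result to a coordinator as one message, for p messages in total.

```python
def split_with_overlap(n, m, p):
    """Split positions 0..n-1 into p blocks (assumption: p = number of workers).

    Each block is (own_start, own_end, read_end). A worker reports matches
    that START in [own_start, own_end) but may read up to read_end, which
    extends m - 1 characters into the next block so that matches straddling
    a block boundary are not lost.
    """
    base, extra = divmod(n, p)
    blocks = []
    start = 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        own_end = start + size
        read_end = min(n, own_end + m - 1)
        blocks.append((start, own_end, read_end))
        start = own_end
    return blocks


def find_local(seq, pattern, own_start, own_end, read_end):
    """Return all match positions that start inside this worker's own block."""
    hits = []
    pos = seq.find(pattern, own_start, read_end)
    while pos != -1 and pos < own_end:
        hits.append(pos)
        # Advance by one so overlapping occurrences are also reported.
        pos = seq.find(pattern, pos + 1, read_end)
    return hits


def find_all_matches(seq, pattern, p):
    """Simulate p workers one after another and merge their local results."""
    results = []
    for own_start, own_end, read_end in split_with_overlap(len(seq), len(pattern), p):
        # On a cluster, this local list would be a single message to the coordinator.
        results.extend(find_local(seq, pattern, own_start, own_end, read_end))
    return results
```

For example, `find_all_matches("ACGACGACGT", "ACG", 3)` returns `[0, 3, 6]`, including the occurrence at position 3 that crosses the boundary between the first two blocks. Each worker scans roughly n/p + m - 1 characters and sends one message, which illustrates why the number of messages can scale with p and not with n.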