Skip to Main Content
Finding the multiple longest common subsequence (MLCS) is an important problem in the areas of bioinformatics and computational genomics. Approaches that are more efficient than the standard dynamic programming method have been introduced and successfully parallelized for the special cases of 2 sequences. However, the increasing complexity and size of biological data require an efficient method applicable to an arbitrary number of sequences as well as its efficient parallelization. A recently developed dominant points method for a general MLCS problem has been shown a significant performance improvement over the dynamic programming method, when number of sequences is larger than two. At the same time, the approach has revealed strong demand for its parallelization, in order to be applied to the larger families of sequences or sequences of the greater lengths. In this paper, we introduce an efficient parallel algorithm to find a MLCS for an arbitrary number of sequences, which is based on the dominant points method. When the number of processors is not greater than the size of alphabet multiplied by the number of sequences, the parallel algorithm is estimated to have the asymptotically linear speed up. We experimentally tested the algorithm using sets of randomly generated sequences over different alphabets as well as the protein sequences from a family of homologous proteins. We found that the performance of the algorithm increases with the number of input sequences and reaches a near-linear speedup for eight sequences.