We present a novel approach for reducing the computational cost of updating homology results produced by a wide class of popular state-of-the-art algorithms in comparative computational biology. The algorithms that we consider use hidden Markov models (HMMs) and a Viterbi recursion to evaluate matches between sequences, or between a sequence and a model. Such updates occur frequently in practice, as researchers discover errors in biological sequences or analyze multiple closely related sequences, e.g., a family of proteins that underwent mutations during evolution. The proposed algorithm interprets the Viterbi recursion as an update of a minimum spanning tree in a shortest path problem. We propose the novel concept of a relative node tolerance bound and show how it can be used to guarantee that one or more subtrees of a minimum spanning tree obtained before the perturbations were encountered remain optimal. We also describe how to compute and use the relative node tolerance bounds in real time to skip most unperturbed parts of a sequence while computing the new optimal solution. To further reduce the computational overhead of evaluating the tolerance bounds, we present and exploit a statistical analysis of the matching procedure that estimates how many columns of the dynamic program corresponding to the matching problem are affected by a change in a preceding column. The resulting "reusable" Viterbi decoding algorithm can update a matching result in one-fifth to less than one-third of the time required to compute a new match with a normal matching procedure, i.e., by running a Viterbi algorithm with the updated sequences against a base hidden Markov model.
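The idea of reusing an earlier Viterbi result can be illustrated with a minimal Python sketch. It is not the relative node tolerance bound described above. It substitutes a simpler stopping test: with costs taken as negative log-probabilities, the Viterbi recursion is a shortest path computation over the columns of a trellis, and once a recomputed column differs from the stored column by the same constant in every state, all later columns are known to differ by that constant and need not be recomputed. The two-state model, its cost values, and all function names are hypothetical and chosen only for illustration.

```python
def relax_column(prev, symbol, states, trans_cost, emit_cost):
    """One Viterbi step in cost space: cheapest way to reach each state."""
    col, ptr = {}, {}
    for s in states:
        best = min(states, key=lambda p: prev[p] + trans_cost[p][s])
        col[s] = prev[best] + trans_cost[best][s] + emit_cost[s][symbol]
        ptr[s] = best
    return col, ptr


def first_column(symbol, states, start_cost, emit_cost):
    col = {s: start_cost[s] + emit_cost[s][symbol] for s in states}
    return col, {s: None for s in states}


def viterbi_columns(obs, states, start_cost, trans_cost, emit_cost):
    """Full run: cost[t][s] is the cheapest path cost ending in s at column t."""
    col, ptr = first_column(obs[0], states, start_cost, emit_cost)
    cost, back = [col], [ptr]
    for t in range(1, len(obs)):
        col, ptr = relax_column(cost[t - 1], obs[t], states, trans_cost, emit_cost)
        cost.append(col)
        back.append(ptr)
    return cost, back


def update_after_substitution(obs, pos, new_symbol, cost, back,
                              states, start_cost, trans_cost, emit_cost,
                              tol=1e-9):
    """Recompute columns from `pos` onward, stopping early when possible.

    Simplified stopping rule (an assumption of this sketch): if the new
    column equals the old column plus one constant for every state, the
    remaining columns are the old ones shifted by that constant.
    """
    new_obs = obs[:pos] + [new_symbol] + obs[pos + 1:]
    new_cost = [dict(c) for c in cost]
    new_back = [dict(b) for b in back]
    recomputed = 0
    for t in range(pos, len(new_obs)):
        if t == 0:
            col, ptr = first_column(new_obs[0], states, start_cost, emit_cost)
        else:
            col, ptr = relax_column(new_cost[t - 1], new_obs[t],
                                    states, trans_cost, emit_cost)
        new_cost[t], new_back[t] = col, ptr
        recomputed += 1
        offsets = [col[s] - cost[t][s] for s in states]
        if max(offsets) - min(offsets) <= tol:
            shift = offsets[0]
            for u in range(t + 1, len(new_obs)):
                new_cost[u] = {s: cost[u][s] + shift for s in states}
            break
    return new_obs, new_cost, new_back, recomputed


# Hypothetical two-state model; costs stand in for negative log-probabilities.
states = ["A", "B"]
start_cost = {"A": 0.0, "B": 0.0}
trans_cost = {"A": {"A": 0.5, "B": 1.0}, "B": {"A": 1.0, "B": 0.5}}
emit_cost = {"A": {"x": 0.0, "y": 5.0}, "B": {"x": 5.0, "y": 0.0}}

obs = list("xxyyxxxy")
old_cost, old_back = viterbi_columns(obs, states, start_cost, trans_cost, emit_cost)

# Substitute the symbol at position 1 and update instead of starting over.
new_obs, inc_cost, inc_back, recomputed = update_after_substitution(
    obs, 1, "y", old_cost, old_back, states, start_cost, trans_cost, emit_cost)

full_cost, full_back = viterbi_columns(new_obs, states, start_cost, trans_cost, emit_cost)
print("columns recomputed:", recomputed, "of", len(new_obs))
```

In this toy case the update touches only a few columns after the substitution before the stored result can be reused. How early reuse becomes possible in general depends on the model and the sequences, which is the question the tolerance bounds and the statistical analysis above address.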