Skip to Main Content
Edit distance was originally developed by Levenstein several decades ago to measure the distance between two strings. It was found that this distance can be computed by an elegant dynamic programming procedure. The edit distance has played important roles in a wide array of applications due to its representational efficacy and computational efficiency. To effect a more reasonable distance measure, the normalized edit distance was proposed. Many algorithms and studies have been dedicated along this line with impressive performances in recent years. There is, however, a fundamental problem with the original definition of edit distance that has remained elusive: its context-free nature. In determining the possible actions, i.e., insertion, deletion, and substitution, no systematic consideration was given to the local behaviors of the string/pattern in question that indeed encompass great amount of useful information regarding its content. In this paper, inspired by the success of the Markov Random Field theory, a new edit distance called Markov edit distance (MED) within the dynamic programming framework is proposed to take full advantage of the local statistical dependencies in the pattern in order to arrive at enhanced matching performance. Within this framework, two specialized distance measures are developed: The reshuffling MED to handle cases where a subpattern in the target pattern is the reshuffles of that in the source pattern, and the coherence MED which is able to incur local content based substitution, insertion, and deletion. The applications based on these two MEDs in string matching are then explored, whereof encouraging empirical results have been observed.