It has been shown that DNA sequences can be modeled with autoregressive (AR) processes and that the Euclidean distance between model parameters is useful for detecting sequence similarity. However, the robustness of this measure to inexact, approximate matches has not been explored. We go one step further and examine not only exact gene searching but also how the AR distance measures are perturbed by errors and mutations. To achieve higher accuracy in similarity searching, we compare the performance of the Euclidean distance measure with that of the Itakura distance measure using different nucleotide mappings. The numerical mappings and distance measures have comparable performance, but in general the Euclidean distance with the binary SW mapping distinguishes perfect matches best. Finally, we show that AR measures can detect mutation-prone approximate matches when the AR model order is increased.
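The pipeline summarized above (numerical mapping, AR fit, distance between models) can be illustrated with a minimal sketch. Everything specific in it is an assumption rather than the authors' method: the function names are hypothetical, the binary SW mapping is taken to assign +1 to the strong bases (C, G) and -1 to the weak bases (A, T), the AR coefficients are estimated from the Yule-Walker equations via the Levinson-Durbin recursion, the model order of 4 is arbitrary, and the Itakura distance uses the common log-ratio form from linear prediction.

```python
import math

# Assumed binary SW mapping: strong bases (C, G) -> +1, weak bases (A, T) -> -1.
SW_MAP = {"A": -1.0, "T": -1.0, "C": 1.0, "G": 1.0}


def sw_encode(seq):
    """Map a DNA string to a numerical sequence (hypothetical helper)."""
    return [SW_MAP[base] for base in seq.upper()]


def autocorr(x, max_lag):
    """Biased autocorrelation estimate of the mean-removed signal, lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    return [sum(c[i] * c[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def ar_fit(x, order):
    """Solve the Yule-Walker equations with the Levinson-Durbin recursion.

    Returns a[0..order-1] such that x[n] is predicted by sum_k a[k] * x[n-1-k].
    Assumes the encoded sequence is not constant (otherwise r[0] is zero).
    """
    r = autocorr(x, order)
    a = []
    err = r[0]
    for k in range(1, order + 1):
        acc = r[k] - sum(a[j] * r[k - 1 - j] for j in range(k - 1))
        refl = acc / err  # reflection coefficient for this order
        a = [a[j] - refl * a[k - 2 - j] for j in range(k - 1)] + [refl]
        err *= 1.0 - refl * refl
    return a


def euclidean_ar_distance(seq_a, seq_b, order=4):
    """Euclidean distance between the AR coefficient vectors of two sequences."""
    a = ar_fit(sw_encode(seq_a), order)
    b = ar_fit(sw_encode(seq_b), order)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def itakura_distance(seq_a, seq_b, order=4):
    """Itakura distance: log of the prediction-error ratio on seq_a's statistics."""
    xa = sw_encode(seq_a)
    r = autocorr(xa, order)
    # Prediction-error filters [1, -a1, ..., -ap] for each sequence.
    pa = [1.0] + [-c for c in ar_fit(xa, order)]
    pb = [1.0] + [-c for c in ar_fit(sw_encode(seq_b), order)]

    def quad(p):
        # Quadratic form p^T R p with R the Toeplitz autocorrelation matrix of seq_a.
        return sum(p[i] * r[abs(i - j)] * p[j]
                   for i in range(order + 1) for j in range(order + 1))

    return math.log(quad(pb) / quad(pa))
```

In this sketch a gene search would slide a window over the target sequence, compute one of the two distances between the query and each window, and report the windows with the smallest distance. Note that the Itakura distance is asymmetric: it evaluates both models against the autocorrelation of the first sequence only.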