In this paper, we present a normalized statistical metric space for hidden Markov models (HMMs). HMMs are widely used to model real-world systems. As in graph matching, some previous approaches compare HMMs by evaluating the correspondence, or goodness of match, between every pair of states; they concentrate on the structure of the models instead of the statistics of the process being observed. To remedy this, we present a new metric space that compares the statistics of HMMs within a given level of statistical significance. Compared with the Kullback-Leibler divergence, another widely used measure of model similarity, our approach is a true metric, always returns an appropriate distance value, and provides a confidence measure on the metric value. Experimental results are given for a sample application that quantifies the similarity of HMMs of network traffic in the Tor anonymization system. This application is interesting because it considers models extracted from a system that is intentionally trying to obfuscate its internal workings. In the conclusion, we discuss applications in less challenging domains, such as data mining.
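The contrast with the Kullback-Leibler divergence can be seen in a small numerical sketch. The code below is not the metric presented in this paper. It only evaluates the standard discrete Kullback-Leibler divergence on two made-up symbol distributions (the names `p`, `q`, and `kl_divergence` are illustrative assumptions) to show that the divergence is asymmetric and can be infinite, so it does not satisfy the axioms of a metric.

```python
import math


def kl_divergence(p, q):
    """Return D_KL(p || q) in nats for two discrete distributions
    defined over the same ordered set of symbols."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # by convention, 0 * log(0 / q) contributes 0
        if qi == 0.0:
            return math.inf  # p assigns mass to a symbol that q rules out
        total += pi * math.log(pi / qi)
    return total


# Hypothetical output-symbol distributions, chosen only for illustration.
p = [0.5, 0.5, 0.0]
q = [0.9, 0.05, 0.05]

d_pq = kl_divergence(p, q)  # finite
d_qp = kl_divergence(q, p)  # infinite, because p gives the third symbol zero mass

print(f"D(p||q) = {d_pq:.4f}")
print(f"D(q||p) = {d_qp}")
```

Swapping the arguments changes the result from a finite value to infinity, which illustrates why a symmetric distance that is always well defined is desirable when comparing models.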