With the wide application of hidden Markov models (HMMs) in speech recognition, a statistical acoustic confusability metric is of increasing importance to many components of a speech recognition system. Although distance metrics between HMMs have been studied in the past, they do not account for speaking-rate and durational variations. To capture the underlying speech signal's temporal properties when computing such a metric between HMMs, we propose a dynamically aligned Kullback-Leibler (KL) divergence measure and discuss a cost-efficient implementation of the metric. The proposed approach outperforms existing metrics in predicting phonemic confusions.
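The core idea of a dynamically aligned divergence can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the paper's implementation: it models each HMM state as a single diagonal-covariance Gaussian, uses the closed-form Gaussian KL divergence as the local cost, and aligns the two state sequences with a standard dynamic-programming (DTW-style) recursion so that duration differences are absorbed by the alignment path. The function names `gaussian_kl` and `dtw_aligned_kl` are hypothetical.

```python
import numpy as np

def gaussian_kl(m1, v1, m2, v2):
    """Closed-form KL divergence KL(N(m1, v1) || N(m2, v2)) between
    two diagonal-covariance Gaussians (m*, v* are 1-D arrays)."""
    return 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def dtw_aligned_kl(states_a, states_b):
    """Align two HMM state sequences by dynamic programming, using the
    per-state-pair Gaussian KL divergence as the local cost, and return
    the accumulated cost along the best alignment path.

    Each element of states_a / states_b is a (mean, variance) pair of
    1-D numpy arrays. This absorbs duration mismatch into the alignment
    instead of comparing states index-by-index."""
    na, nb = len(states_a), len(states_b)
    cost = np.full((na + 1, nb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, na + 1):
        m1, v1 = states_a[i - 1]
        for j in range(1, nb + 1):
            m2, v2 = states_b[j - 1]
            d = gaussian_kl(m1, v1, m2, v2)
            # Standard DTW recursion: insertion, deletion, or match step.
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[na, nb]
```

Because KL divergence is non-negative, aligning a model with itself yields a cost of zero, and more confusable model pairs yield smaller accumulated costs than clearly distinct ones; sequences of different lengths are handled by the warping path rather than by truncation.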