The design of a robust language-learning system, intended to help students practice a foreign language with a machine tutor, must provide for localization of common pronunciation errors. This paper presents a new technique for unsupervised detection of phone-level mispronunciations, created with language-learning applications in mind. Our method uses multiple hidden-articulator Markov models to asynchronously classify acoustic events in various articulatory domains. It requires no human input besides a pronunciation dictionary for all words in the end system's vocabulary, and has been shown to perform as well as a human tutor would, given the same task. For the majority of systematic mispronunciations investigated in this study, precision in detecting the presence of an error exceeded the 70% inter-annotator agreement reported for our test corpus.
Published in:
Automatic Speech Recognition and Understanding, 2005 IEEE Workshop on
Date of Conference: 27 Nov. 2005