Recognizing the 408 highly confusable Mandarin syllables is difficult because the vocabulary consists of 38 confusing sets, each of which can contain as many as 19 syllables. The task becomes even harder when only very limited training data are available. A special direct-concatenation approach is developed for training hidden Markov models (HMMs) to recognize these syllables with very limited training data. Each syllable is divided into INITIAL and FINAL parts; 408 right-context-dependent INITIAL HMMs and 38 left-context-independent FINAL HMMs are trained separately, with the transition region carefully taken into account. These INITIAL and FINAL HMMs are then directly concatenated to form the syllable models used for recognition. Experimental results show that this approach makes highly efficient use of the very limited training data and provides significant improvements in recognition performance. Although the results are obtained for Mandarin syllables, the approach is believed to be equally helpful for the recognition of other confusing vocabularies.
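The concatenation step can be illustrated with a minimal sketch. It assumes simple left-to-right HMMs without skip transitions, represented only by their transition probabilities; the helper names `left_to_right` and `concatenate` and all numeric values are hypothetical, and the emission densities, the context dependence, and the treatment of the transition region are deliberately left out because the abstract does not specify them.

```python
def left_to_right(self_loops):
    """Build (transitions, exit_probs) for a left-to-right HMM without skips.

    self_loops[i] is the assumed self-loop probability of state i; the
    remaining mass moves to state i + 1, or leaves the model at the last state.
    """
    n = len(self_loops)
    trans = [[0.0] * n for _ in range(n)]
    exits = [0.0] * n
    for i, p in enumerate(self_loops):
        trans[i][i] = p
        if i + 1 < n:
            trans[i][i + 1] = 1.0 - p
        else:
            exits[i] = 1.0 - p
    return trans, exits


def concatenate(initial, final):
    """Join an INITIAL model and a FINAL model into one syllable model.

    The exit probability of each INITIAL state is redirected to the first
    state of the FINAL model; the FINAL model keeps its own exit.
    """
    (a_i, exit_i), (a_f, exit_f) = initial, final
    n_i, n_f = len(a_i), len(a_f)
    n = n_i + n_f
    trans = [[0.0] * n for _ in range(n)]
    for r in range(n_i):
        for c in range(n_i):
            trans[r][c] = a_i[r][c]
        trans[r][n_i] = exit_i[r]  # entry into the FINAL part
    for r in range(n_f):
        for c in range(n_f):
            trans[n_i + r][n_i + c] = a_f[r][c]
    exits = [0.0] * n_i + list(exit_f)
    return trans, exits


# Hypothetical 2-state INITIAL and 3-state FINAL models.
initial = left_to_right([0.6, 0.5])
final = left_to_right([0.7, 0.8, 0.9])
trans, exits = concatenate(initial, final)
```

In this sketch every row of the combined transition matrix, together with its exit probability, still sums to one, so the concatenated syllable model remains a valid left-to-right HMM.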
Published in: Speech and Audio Processing, IEEE Transactions on (Volume: 1, Issue: 1)
Date of Publication: Jan 1993