Recent research suggests that coarticulation in speech is more appropriately modeled at the syllable level. However, because a number of additional factors affect the way syllables are articulated, it may be necessary to create multiple paths through syllable models. In our previous research on longer-length multi-path models for connected digit recognition, trajectory clustering proved to be an attractive approach to deriving multi-path models. In this paper, we extend that research to large vocabulary continuous speech recognition by deriving trajectory clusters for 94 very frequent syllables in a 20-hour data set of Dutch read speech. The resulting clusters are compared with a knowledge-based classification. The comparison suggests that multi-path models for syllables are difficult to build on the basis of phonetic and linguistic knowledge. When multi-path models based on trajectory clustering are used, speech recognition performance improves significantly. We therefore conclude that data-driven trajectory clustering is a very effective approach to developing multi-path models.
Published in: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006) Proceedings, Volume 1
Date of Conference: 14-19 May 2006