Automatic Speech Recognition (ASR) systems account for the wide variability in the acoustic signal through large amounts of training data. From a linguistic point of view, this acoustic variability is a consequence of pronunciation variation: no two speakers utter the same words in exactly the same way, nor can a single speaker repeat the same words with acoustic identity. Hence ASR systems usually rely on multiple-pronunciation lexicons to match an acoustic sequence with a lexical unit. In this study, we adopt a data-driven approach to generating pronunciation variants at the syllable level. A Group-Delay (GD) segmentation algorithm is used to acquire acoustic cues about syllable boundaries, which are validated by a vowel-onset point (VOP) detection algorithm. The GD syllable segments are then manually transcribed to produce new pronunciation variants. Results on the TIMIT database show that some pronunciations are exclusive to a particular dialect.
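The segmentation described above relies on minimum-phase group-delay processing of the short-term energy contour, with VOP detection as a cross-check. As a rough, self-contained illustration of the general idea of deriving syllable-boundary candidates from the energy contour (a simplified stand-in, not the GD algorithm itself), the sketch below smooths a frame-level energy contour and marks boundary candidates at the centres of low-energy stretches. All function names, window sizes, and thresholds here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def syllable_boundaries(signal, sr, win_ms=20, hop_ms=10, smooth=5):
    """Candidate syllable-boundary times (seconds), placed at the centres
    of low-energy stretches of a smoothed short-term energy contour."""
    win = int(sr * win_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    # frame-level short-term energy
    energy = np.array([np.sum(signal[i:i + win] ** 2)
                       for i in range(0, len(signal) - win, hop)])
    # moving-average smoothing of the energy contour
    energy = np.convolve(energy, np.ones(smooth) / smooth, mode="same")
    low = energy < 0.1 * energy.max()          # low-energy frames
    boundaries, start = [], None
    for i, flag in enumerate(low):
        if flag and start is None:
            start = i                          # a low-energy run begins
        elif not flag and start is not None:
            boundaries.append((start + i - 1) / 2 * hop / sr)
            start = None                       # run ends: keep its centre
    if start is not None:
        boundaries.append((start + len(low) - 1) / 2 * hop / sr)
    return boundaries

# Toy example: two 0.2 s voiced "syllables" separated by 0.1 s of silence.
sr = 16000
t = np.arange(int(sr * 0.2)) / sr
burst = np.sin(2 * np.pi * 200 * t)
speech = np.concatenate([burst, np.zeros(int(sr * 0.1)), burst])

print(syllable_boundaries(speech, sr))  # one candidate inside the silent gap
```

On real speech the raw energy contour is far noisier than in this toy example, which is precisely why the paper processes it with a group-delay function and validates the resulting boundaries against detected vowel-onset points.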