Robust speech recognition training via duration and spectral-based stress token generation

2 Author(s)
Hansen, J.H.L. (Dept. of Electr. Eng., Duke Univ., Durham, NC, USA); Bou-Ghazale, S.E.

It is known that speech recognition performance degrades if systems are not trained and tested under similar speaking conditions. This is particularly true if a speaker is exposed to demanding workload stress or noise. For recognition systems to be successful in applications susceptible to stress, speech recognizers should address the adverse conditions experienced by the user. The authors consider the problem of improving recognizer training for various stressed speaking conditions (e.g., slow, loud, and Lombard effect speaking styles). The main objective is to devise a training procedure that produces a hidden Markov model recognizer that better characterizes a given stressed speaking style, without the need to collect such stressed data directly. The novel approach is to construct a word production model using a previously suggested source generator framework [Hansen 1994], employing knowledge of the statistical nature of duration and spectral variation of speech under stress. This model is used in turn to produce simulated stressed speech training tokens from neutral speech tokens. The token generation training method is shown to improve isolated word recognition by 24% for Lombard speech when compared to a neutral-trained isolated word recognizer. Further results are reported for isolated and keyword recognition scenarios.
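
The token generation idea in the abstract (neutral token in, simulated stressed token out) can be illustrated with a toy sketch. This is not the authors' source generator framework: the frame representation, the function names `stretch_duration`, `perturb_spectrum`, and `generate_token`, and the Gaussian duration and offset statistics are all assumptions made for illustration. The abstract does not specify any of them.

```python
import random

def stretch_duration(frames, factor):
    """Linearly resample a list of feature frames to round(len * factor) frames."""
    n_in = len(frames)
    n_out = max(1, round(n_in * factor))
    if n_in == 1 or n_out == 1:
        return [list(frames[0]) for _ in range(n_out)]
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)   # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo                          # interpolation weight
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out

def perturb_spectrum(frames, mean_shift, std_shift, rng):
    """Add one Gaussian-sampled offset per feature dimension to every frame."""
    offsets = [rng.gauss(m, s) for m, s in zip(mean_shift, std_shift)]
    return [[x + d for x, d in zip(frame, offsets)] for frame in frames]

def generate_token(neutral, dur_mean, dur_std, mean_shift, std_shift, rng):
    """Hypothetical generator: sample a duration factor, then shift the spectrum."""
    factor = max(0.1, rng.gauss(dur_mean, dur_std))  # guard against non-positive factors
    stretched = stretch_duration(neutral, factor)
    return perturb_spectrum(stretched, mean_shift, std_shift, rng)

# Usage: a 4-frame, 2-dimensional neutral token and made-up style statistics.
rng = random.Random(0)
neutral = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
simulated = generate_token(neutral, 1.5, 0.1, [0.2, -0.1], [0.05, 0.05], rng)
```

In a training setup along these lines, many such simulated tokens per neutral token would be fed to the recognizer trainer in place of recorded stressed speech. The statistics used here are placeholders, not values from the paper.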

Published in:

Speech and Audio Processing, IEEE Transactions on (Volume: 3, Issue: 5)