An acoustic-phonetic-based speaker adaptation technique for improving speaker-independent continuous speech recognition

Author: Y. Zhao, Speech Technology Laboratory, Panasonic Technologies Inc., Santa Barbara, CA, USA

A new speaker adaptation technique is proposed for improving speaker-independent continuous speech recognition, based on a decomposition of spectral variation sources. In this technique, the spectral variations are separated into two categories, one acoustic and the other phone-specific, and each variation source is modeled by a linear transformation system. The technique consists of two sequential steps: first, acoustic normalization is performed, and second, the phone model parameters are adapted. Speaker adaptation experiments on the TIMIT database using short calibration speech (5 s per speaker) have shown significant performance improvement over the baseline speaker-independent continuous speech recognition system, which uses Gaussian-mixture-density-based hidden Markov models of phone units. For a vocabulary size of 853 and a test set perplexity of 104, the recognition word accuracy improved from 86.9% for the baseline system to 90.5% after adaptation, corresponding to an error reduction of 27.5%. On a more difficult test set containing an additional variation source due to recording channel mismatch, a larger performance improvement was obtained: for the same vocabulary and a test set perplexity of 101, the recognition word accuracy improved from 65.4% for the baseline to 86.0% after adaptation, corresponding to an error reduction of 59.5%.
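To make the two-step idea concrete, the following is a minimal, hypothetical sketch: a global affine transform is estimated from the speaker's short calibration speech to normalize acoustic variation, and the normalized frames are then used to adapt the Gaussian mean parameters of the phone models. The function names, data shapes, least-squares estimator, and simple mean interpolation are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of acoustic normalization followed by phone model adaptation.
# All names and estimators here are assumptions for illustration only.
import numpy as np

def estimate_linear_transform(X, Y):
    """Least-squares estimate of (A, b) such that Y ~ X @ A.T + b."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias column
    W, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)      # shape (D+1, D)
    return W[:-1].T, W[-1]

def acoustic_normalization(calib_feats, ref_feats):
    """Step 1: map the new speaker's calibration features toward a
    speaker-independent reference space with one global affine transform."""
    A, b = estimate_linear_transform(calib_feats, ref_feats)
    return lambda feats: feats @ A.T + b

def adapt_phone_means(phone_means, normalized_calib, frame_to_phone):
    """Step 2: shift each phone model's Gaussian mean toward the normalized
    calibration frames aligned to that phone; unseen phones keep their means."""
    adapted = dict(phone_means)
    for phone, mean in phone_means.items():
        frames = normalized_calib[frame_to_phone == phone]
        if len(frames):
            # simple interpolation between prior mean and calibration evidence
            adapted[phone] = 0.5 * mean + 0.5 * frames.mean(axis=0)
    return adapted

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = 13                                    # e.g. cepstral feature dimension (assumed)
    calib = rng.normal(size=(500, D))         # roughly 5 s of calibration frames (assumed)
    ref = calib + 0.1 * (calib @ rng.normal(size=(D, D)))  # stand-in SI reference alignment
    normalize = acoustic_normalization(calib, ref)

    phone_means = {p: rng.normal(size=D) for p in ("aa", "iy", "s")}
    frame_to_phone = rng.choice(list(phone_means), size=500)
    adapted = adapt_phone_means(phone_means, normalize(calib), frame_to_phone)
    print({p: np.round(m[:3], 2) for p, m in adapted.items()})
```

In this sketch the acoustic transform is shared across all phones while the mean update is phone-specific, mirroring the abstract's separation of acoustic and phone-specific variation sources.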

Published in:

IEEE Transactions on Speech and Audio Processing (Volume: 2, Issue: 3)