By Topic

On the effects of speech rate in large vocabulary speech recognition systems

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Siegler, M.A. ; Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA ; Stern, R.M.

It is well known that a higher-than-normal speech rate will cause the rate of recognition errors in large vocabulary automatic speech recognition (ASR) systems to increase. In this paper we attempt to identify and correct for errors due to fast speech. We first suggest that phone rate is a more meaningful measure of speech rate than the more common word rate. We find that when data sets are clustered according to the phone rate metric, recognition errors increase when the phone rate is more than 1 standard deviation greater than the mean. We propose three methods to improve the recognition accuracy of fast speech, each addressing different aspects of performance degradation. The first method is an implementation of Baum-Welch codebook adaptation. The second method is based on the adaptation of HMM state-transition probabilities. In the third method, the pronunciation dictionaries are modified using rule-based techniques and compound words are added. We compare improvements in recognition accuracy for each method using data sets clustered according to the phone rate metric. Adaptation of the HMM state-transition probabilities to fast speech improves recognition of fast speech by a relative amount of 4 to 6 percent

Published in:

Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 International Conference on  (Volume:1 )

Date of Conference:

9-12 May 1995