Data Balancing for Efficient Training of Hybrid ANN/HMM Automatic Speech Recognition Systems

4 Author(s)
Ana Isabel Garcia-Moral; Rubén Solera-Urena; Carmen Pelaez-Moreno; Fernando Diaz-de-Maria — Signal Processing and Communications Department, University Carlos III Madrid, Leganés, Spain

Hybrid speech recognizers, in which the estimation of the emission pdfs of the hidden Markov model (HMM) states, usually carried out with Gaussian mixture models (GMMs), is replaced by artificial neural networks (ANNs), have several advantages over classical systems. These performance gains, however, come at the cost of a large increase in computational requirements, because the ANN must be trained. Starting from the observation that speech data are remarkably skewed across classes, this paper proposes pruning the training set so that each class contributes a balanced number of samples. With this method, training time is reduced by a factor of 18 while achieving performance similar to, or even better than, that obtained with the whole database, especially in noisy environments. The application of these reduced sets is not straightforward, however: modifying the distribution of the training data creates a mismatch between training and testing conditions. As demonstrated in this paper, this mismatch must be compensated by properly scaling the a posteriori probabilities produced by the ANN and by resizing the context window.
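The two core operations described in the abstract, per-class subsampling of the training frames and prior-based scaling of the network's posteriors, can be sketched as follows. This is a minimal illustration of the general technique, not the authors' implementation; the function names and the uniform-subsampling strategy are assumptions.

```python
import numpy as np

def balance_classes(features, labels, samples_per_class, seed=0):
    """Subsample so every class contributes at most `samples_per_class`
    frames (hypothetical helper illustrating the balancing step)."""
    rng = np.random.default_rng(seed)
    kept = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        take = min(samples_per_class, idx.size)
        kept.append(rng.choice(idx, size=take, replace=False))
    kept = np.concatenate(kept)
    return features[kept], labels[kept]

def scale_posteriors(posteriors, train_priors, eps=1e-10):
    """Standard hybrid ANN/HMM decoding step: divide the ANN's
    posteriors P(class|x) by the class priors of the (now balanced)
    training set to obtain scaled likelihoods P(x|class) up to a
    constant, compensating the altered training distribution."""
    return posteriors / (train_priors + eps)
```

For example, a corpus with 100, 10, and 50 frames in three classes becomes a 10/10/10 subset after `balance_classes(..., samples_per_class=10)`; at decoding time, dividing by the resulting uniform priors leaves the ranking of the posteriors unchanged, whereas dividing by the original skewed priors would not.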

Published in:

IEEE Transactions on Audio, Speech, and Language Processing (Volume: 19, Issue: 3)