Cart (Loading....) | Create Account
Close category search window
 

Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Sainath, T.N. ; IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA ; Kingsbury, B. ; Soltau, H. ; Ramabhadran, B.

While Deep Neural Networks (DNNs) have achieved tremendous success for large vocabulary continuous speech recognition (LVCSR) tasks, training these networks is slow. Even to date, the most common approach to train DNNs is via stochastic gradient descent, serially on one machine. Serial training, coupled with the large number of training parameters (i.e., 10-50 million) and speech data set sizes (i.e., 20-100 million training points) makes DNN training very slow for LVCSR tasks. In this work, we explore a variety of different optimization techniques to improve DNN training speed. This includes parallelization of the gradient computation during cross-entropy and sequence training, as well as reducing the number of parameters in the network using a low-rank matrix factorization. Applying the proposed optimization techniques, we show that DNN training can be sped up by a factor of 3 on a 50-hour English Broadcast News (BN) task with no loss in accuracy. Furthermore, using the proposed techniques, we are able to train DNNs on a 300-hr Switchboard (SWB) task and a 400-hr English BN task, showing improvements between 9-30% relative over a state-of-the art GMM/HMM system while the number of parameters of the DNN is smaller than the GMM/HMM system.

Published in:

Audio, Speech, and Language Processing, IEEE Transactions on  (Volume:21 ,  Issue: 11 )

Date of Publication:

Nov. 2013

Need Help?


IEEE Advancing Technology for Humanity About IEEE Xplore | Contact | Help | Terms of Use | Nondiscrimination Policy | Site Map | Privacy & Opting Out of Cookies

A not-for-profit organization, IEEE is the world's largest professional association for the advancement of technology.
© Copyright 2014 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.