Noise compensation for speech recognition with arbitrary additive noise

1 Author(s)
Ji Ming ; Sch. of Comput. Sci., Queen's Univ. Belfast, UK

This paper investigates speech recognition involving additive background noise, assuming no knowledge about the noise characteristics. A new method, namely universal compensation (UC), is proposed as a solution to the problem. The UC method is an extension of the missing-feature method, i.e., recognition based only on reliable data but robust to any corruption type, including full corruption in which the noise affects all time-frequency components of the speech representation. The UC technique achieves robustness to unknown, full noise corruption through a novel combination of the multicondition training method and the missing-feature method. Multicondition training is employed to convert full-band spectral corruption into partial-band spectral corruption, which is achieved by training the model using data involving simulated wide-band noise at different signal-to-noise ratios. The missing-feature principle is employed to reduce the effect of the remaining partial-band corruption on recognition by basing the recognition only on the matched or compensated spectral components from the multicondition training. The combination of these two strategies makes the new method potentially capable of dealing with arbitrary additive noise (with arbitrary temporal-spectral characteristics) based only on clean speech training data and simulated noise data, without requiring knowledge of the actual noise. Two databases, Aurora 2 and an E-set word database, have been used to evaluate the UC method. Experiments on Aurora 2 indicate that the new model has the potential to achieve a recognition performance close to the performance obtained by a multicondition baseline model trained using data involving the test environments. Further experiments for noise conditions unseen in Aurora 2 show significant performance improvement for the new model over the multicondition model. The experimental results on the E-set database demonstrate the ability of the UC model to deal with acoustically confusing recognition tasks.
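
The multicondition training step described in the abstract, mixing clean speech with simulated wide-band noise at several signal-to-noise ratios, could be sketched roughly as follows. This is an illustration only, not the paper's implementation: the use of white Gaussian noise as the "wide-band" noise, the SNR levels, the toy sine-wave "speech" signal, and all function names are assumptions made for the example.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [s + gain * n for s, n in zip(speech, noise)]


def make_multicondition_set(speech, snrs_db, seed=0):
    """Return {snr_db: noisy copy of `speech`} using simulated wide-band noise.

    Assumption: white Gaussian noise stands in for the simulated wide-band noise.
    """
    rng = random.Random(seed)
    noisy = {}
    for snr_db in snrs_db:
        noise = [rng.gauss(0.0, 1.0) for _ in speech]
        noisy[snr_db] = add_noise_at_snr(speech, noise, snr_db)
    return noisy


def measured_snr_db(clean, noisy):
    """SNR in dB implied by the difference between a noisy and a clean signal."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


# Toy stand-in for a clean utterance: one second of a 440 Hz tone at 8 kHz.
speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]

# Illustrative SNR levels (assumed, not taken from the paper).
train_set = make_multicondition_set(speech, snrs_db=[20, 15, 10, 5, 0])
```

Each entry of `train_set` would then go through feature extraction and be pooled for model training. The missing-feature stage, which selects the matched spectral components at recognition time, is not covered by this sketch.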

Published in:

IEEE Transactions on Audio, Speech, and Language Processing (Volume: 14, Issue: 3)