By Topic

Bayesian Separation With Sparsity Promotion in Perceptual Wavelet Domain for Speech Enhancement and Hybrid Speech Recognition

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Yu Shao ; Software Exploration R&D Center, Bureau of Geophysical Prospecting Inc., China National Petroleum Corporation, Zhuozhou, China ; Chip-Hong Chang

Speech recognition accuracy can be improved by the removal of noise. However, errors in the estimated signal components can also obscure the recognition. This paper presents a framework of wavelet-based techniques to harness the automatic speech recognition performance in the presence of background noise. The proposed robust speech recognition system is realized by implementing speech enhancement preprocessing, feature extraction, and a hybrid speech recognizer in the time-frequency space. A perceptual wavelet filterbank using a fixed base to imitate the human perceptual modus of speech is developed to capture the most discriminative information in the time-frequency plane. To minimize the mismatch between the training and testing conditions of the classifier, a Bayesian scheme is applied in a wavelet domain to separate the speech and noise components in the proposed iterative speech enhancement algorithm. The nonphonetic information is discarded while the more critical speech features are extracted and represented by the wavelet coefficients. The denoised wavelet features are fed to the hybrid classifier founded on a hidden Markov model (HMM). The intrinsic limitation of the HMM is overcome by augmenting it with a wavelet support vector machine. This hybrid and hierarchical design paradigm improves the recognition performance by combining the advantages of different methods into an integral system. The continuous digit speech recognition experiments conducted with the proposed framework show promising results. It significantly improves the recognition performance at a low signal-to-noise ratio (SNR) without causing a poorer performance at a high SNR.

Published in:

IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans  (Volume:41 ,  Issue: 2 )