Abstract:
We investigate the effectiveness of generative adversarial networks (GANs) for speech enhancement, in the context of improving noise robustness of automatic speech recognition (ASR) systems. Prior work [1] demonstrates that GANs can effectively suppress additive noise in raw waveform speech signals, improving perceptual quality metrics; however, its benefit for ASR was not established. In this work, we conduct a detailed study to measure the effectiveness of GANs in enhancing speech contaminated by both additive and reverberant noise. Motivated by recent advances in image processing [2], we propose operating GANs on log-Mel filterbank spectra instead of waveforms, which requires less computation and is more robust to reverberant noise. While GAN enhancement improves the performance of a clean-trained ASR system on noisy speech, it falls short of the performance achieved by conventional multi-style training (MTR). By appending the GAN-enhanced features to the noisy inputs and retraining, we achieve a 7% relative WER improvement over the MTR system.
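The abstract relies on two feature-level ideas. The first is running the enhancer on log-Mel filterbank spectra. The second is appending the enhanced features to the noisy inputs before retraining. Both can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical choices made here for illustration:

- the HTK mel formula
- the 16 kHz / 512-point FFT / 40-filter configuration
- the log floor
- the `placeholder_enhancer` function, which stands in for a trained GAN generator

```python
import math
from typing import Callable, List

Frame = List[float]  # one frame of per-bin or per-filter values


def hz_to_mel(f: float) -> float:
    # HTK-style mel scale (an assumption; the paper's exact variant is not stated here)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters over the n_fft // 2 + 1 bins of a power spectrum."""
    num_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    step = (high - low) / (num_filters + 1)
    # Filter edges: equally spaced on the mel scale, mapped to fractional FFT bin positions.
    edges = [mel_to_hz(low + i * step) * n_fft / sample_rate for i in range(num_filters + 2)]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(num_bins):
            if left < k <= center:
                row.append((k - left) / (center - left))  # rising slope
            elif center < k < right:
                row.append((right - k) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel(power_frame: Frame, bank: List[List[float]], floor: float = 1e-10) -> Frame:
    # Weighted sum of power per filter, floored before the log to avoid log(0).
    return [math.log(max(sum(w * p for w, p in zip(row, power_frame)), floor)) for row in bank]


def append_enhanced(noisy: List[Frame], enhance: Callable[[List[Frame]], List[Frame]]) -> List[Frame]:
    """Concatenate each noisy frame with its enhanced counterpart along the feature axis."""
    enhanced = enhance(noisy)
    if len(enhanced) != len(noisy):
        raise ValueError("enhancer must return one frame per input frame")
    return [n + e for n, e in zip(noisy, enhanced)]


def placeholder_enhancer(frames: List[Frame]) -> List[Frame]:
    # Hypothetical stand-in for a trained GAN generator: shifts every log-Mel value by -1.
    return [[v - 1.0 for v in f] for f in frames]


# Usage with assumed parameters: 16 kHz audio, 512-point FFT, 40 mel filters.
bank = mel_filterbank(num_filters=40, n_fft=512, sample_rate=16000)
flat_power = [1.0] * (512 // 2 + 1)  # toy power spectrum, not real speech
noisy_frames = [log_mel(flat_power, bank)]
features = append_enhanced(noisy_frames, placeholder_enhancer)
print(len(features[0]))  # 80: 40 noisy + 40 enhanced values per frame
```

Appending doubles the per-frame input dimension, so the retrained acoustic model sees both the original noisy features and the enhancer's estimate. This sketch covers only the feature plumbing. It includes no GAN training and no ASR model.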
Published in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 15-20 April 2018
Date Added to IEEE Xplore: 13 September 2018
Electronic ISSN: 2379-190X
Index Terms:
- Speech Recognition
- Generative Adversarial Networks
- Speech Enhancement
- Additive Noise
- Reverberation
- Quality Metrics
- Advanced Image Processing
- Automatic Speech Recognition System
- Convolutional Layers
- Time Domain
- Feature Maps
- Power Spectral Density
- Batch Normalization
- Noise Sources
- Hybrid Model
- Presence Of Noise
- Image Synthesis
- Latent Vector
- Wall Street Journal
- Monaural
- Clear Speech
- Speech Data