
Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition


Abstract:

We investigate techniques based on deep neural networks (DNNs) for attacking the single-channel multi-talker speech recognition problem. Our proposed approach contains five key ingredients: a multi-style training strategy on artificially mixed speech data, a separate DNN to estimate senone posterior probabilities of the louder and softer speakers at each frame, a weighted finite-state transducer (WFST)-based two-talker decoder to jointly estimate and correlate the speaker and speech, a speaker switching penalty estimated from the energy pattern change in the mixed speech, and a confidence-based system combination strategy. Experiments on the 2006 speech separation and recognition challenge task demonstrate that our proposed DNN-based system has remarkable noise robustness to the interference of a competing speaker. The best setup of our proposed systems achieves an average word error rate (WER) of 18.8% across different SNRs and outperforms the state-of-the-art IBM superhuman system by 2.8% absolute with fewer assumptions.
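To make the first ingredient concrete, the sketch below illustrates one way artificially mixed two-talker training data could be generated for multi-style training: an interfering utterance is scaled and added to a target utterance at a chosen target-to-interferer SNR. This is a minimal illustration, not the paper's actual data-preparation pipeline; the function name `mix_at_snr` and the synthetic signals are assumptions for the example, and real usage would mix corpus waveforms at the same sampling rate.

```python
import numpy as np

def mix_at_snr(target: np.ndarray, interferer: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix a target and an interfering utterance at a given target-to-interferer SNR (dB)."""
    # Trim or zero-pad the interferer so both signals have equal length.
    if len(interferer) >= len(target):
        interferer = interferer[:len(target)]
    else:
        interferer = np.pad(interferer, (0, len(target) - len(interferer)))

    # Scale the interferer so the energy ratio matches the requested SNR.
    target_energy = np.sum(target ** 2) + 1e-12
    interferer_energy = np.sum(interferer ** 2) + 1e-12
    scale = np.sqrt(target_energy / (interferer_energy * 10.0 ** (snr_db / 10.0)))
    return target + scale * interferer

# Example: generate mixtures across a range of SNRs for multi-style training.
# Random signals stand in for real utterances here.
rng = np.random.default_rng(0)
speaker_a = rng.standard_normal(16000)
speaker_b = rng.standard_normal(12000)
training_mixtures = [mix_at_snr(speaker_a, speaker_b, snr) for snr in (6, 3, 0, -3, -6)]
```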
Page(s): 1670 - 1679
Date of Publication: 11 June 2015

