Cleanformer: A Multichannel Array Configuration-Invariant Neural Enhancement Frontend for ASR in Smart Speakers


Abstract:

This work introduces Cleanformer, a streaming multichannel neural enhancement frontend for automatic speech recognition (ASR). The model has a Conformer-based architecture that takes as inputs a single channel each of raw and enhanced signals, and uses self-attention to derive a time-frequency mask. The enhanced input is generated by a multichannel adaptive noise cancellation algorithm known as Speech Cleaner. The time-frequency mask is applied to the noisy input to produce enhanced features for ASR. Detailed evaluations with speech- and non-speech-based noise show a significant reduction in word error rate (WER), about 80% at -6 dB SNR, over a state-of-the-art ASR model alone. Cleanformer also significantly outperforms enhancement using a beamformer with ideal steering. The enhancement model can be used with different microphone arrays without retraining.
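
The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature domain or how the mask is applied, so the code below assumes an element-wise product on linear-magnitude features with mask values in [0, 1]. The function name `apply_mask` and the toy values are hypothetical, and the mask is written by hand here rather than estimated by a Conformer.

```python
def apply_mask(noisy_features, mask):
    """Multiply a [time][frequency] feature grid element-wise by a mask.

    Assumption (not stated in the abstract): features are linear magnitudes
    and the mask holds per-bin gains in [0, 1].
    """
    if len(noisy_features) != len(mask):
        raise ValueError("features and mask must have the same number of frames")
    enhanced = []
    for frame, mask_frame in zip(noisy_features, mask):
        if len(frame) != len(mask_frame):
            raise ValueError("features and mask must have the same number of bins")
        # A gain of 1.0 keeps the bin, 0.0 suppresses it entirely.
        enhanced.append([gain * value for value, gain in zip(frame, mask_frame)])
    return enhanced


# Toy example: 2 frames x 3 frequency bins, with a hand-written mask.
noisy = [[4.0, 2.0, 1.0], [3.0, 0.5, 2.0]]
mask = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.5]]
enhanced = apply_mask(noisy, mask)
```

In this sketch the enhanced features keep the shape of the noisy input, so they could be passed to a downstream ASR model in place of the original features.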
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023
Conference Location: Rhodes Island, Greece
