Abstract:
Speech quality is often degraded by acoustic echoes, background noise, and reverberation. In this paper, we propose a system consisting of deep learning and signal proces...Show MoreMetadata
Abstract:
Speech quality is often degraded by acoustic echoes, background noise, and reverberation. In this paper, we propose a system consisting of deep learning and signal processing to simultaneously suppress echoes, noise, and reverberation. For the deep learning, we design a novel speech dense-prediction backbone. For the signal processing, a linear acoustic echo canceller is used as conditional information for deep learning. To improve the performance of the speech dense-prediction backbone, strategies such as a microphone and reference phase encoder, multi-scale time-frequency processing, and streaming axial attention are designed. The proposed system ranked first in both AEC and DNS Challenge (non-personal track) of ICASSP 2022. In addition, this backbone has also been extended to the multi-channel speech enhancement task, and placed second in ICASSP 2022 L3DAS22 Challenge1.
Published in: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 23-27 May 2022
Date Added to IEEE Xplore: 27 April 2022
ISBN Information: