Abstract:
In teleconferencing scenarios, speech is often degraded by background noise, which reduces its intelligibility and quality. It is therefore essential to enhance speech in noisy environments. In this paper, we study a far-field, real-time speech enhancement method based on an improved recurrent neural network (RNN) with gated recurrent units (GRUs). The ideal amplitude mask values of the reverberant target speech are used as the training targets of the RNN. We also apply feature normalization and the proposed sub-band normalization technique to reduce differences among features, which further helps the RNN learn long-term patterns. In addition, to further suppress the residual inter-harmonic pseudo-stationary noise caused by sub-band division, we integrate the RNN with the optimally modified log-spectral amplitude (OMLSA) algorithm. Experimental results show that the proposed method improves speech quality and reduces distortion while keeping computational complexity low enough for real-time operation.
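The abstract does not state the exact mask definition, so the following is only a minimal sketch of one common formulation of an ideal amplitude mask: the ratio of the target magnitude spectrum to the noisy magnitude spectrum, clipped to a bounded range. The function names, the per-bin (rather than per-sub-band) resolution, the `eps` guard, and the `max_gain` clipping bound are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def ideal_amplitude_mask(target_mag, noisy_mag, eps=1e-8, max_gain=1.0):
    """Illustrative ideal amplitude mask (assumed formulation, not the paper's).

    target_mag: magnitude spectrum of the (reverberant) target speech
    noisy_mag:  magnitude spectrum of the noisy mixture
    eps:        small constant that avoids division by zero (assumption)
    max_gain:   upper clipping bound for the mask (assumption)
    """
    target_mag = np.asarray(target_mag, dtype=float)
    noisy_mag = np.asarray(noisy_mag, dtype=float)
    # Ratio of target to noisy magnitude for each time-frequency unit.
    mask = target_mag / (noisy_mag + eps)
    # Clip so the training target stays in a bounded range.
    return np.clip(mask, 0.0, max_gain)


def apply_mask(noisy_mag, mask):
    """Scale the noisy magnitude by the mask to estimate the target magnitude."""
    return np.asarray(noisy_mag, dtype=float) * mask
```

In a setup like the one described, a network would be trained to predict such mask values from noisy features, and the predicted mask would then be applied to the noisy spectrum at inference time.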
Published in: 2021 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC)
Date of Conference: 17-19 August 2021
Date Added to IEEE Xplore: 25 October 2021