An Integrated CNN-GRU Framework for Complex Ratio Mask Estimation in Speech Enhancement | IEEE Conference Publication | IEEE Xplore

Abstract:

In this paper, we propose a novel neural network-based speech enhancement approach in which a convolutional neural network (CNN) and a gated recurrent unit (GRU) are integrated to estimate a modified complex ratio mask (MCRM). The new CNN structure, composed of frequency-dilated convolution layers, is employed to extract speech features while benefiting from the global contextual information of the input speech. The CNN incorporates skip connections and residual learning to facilitate training and accelerate convergence. The GRU network is exploited to map the CNN-extracted features to the MCRM, which is used to enhance both the magnitude and the phase of the input speech. We compare the enhancement performance of the proposed method, which uses CNN-extracted features, with that of a GRU network using conventional acoustic features, showing the advantage of the proposed CNN-GRU model. We also demonstrate that, in the presence of highly non-stationary noise, the GRU outperforms other recurrent neural network variants within the proposed model for mask estimation in terms of separated speech quality, memory footprint, and number of model parameters.
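
To illustrate why a complex-valued mask can correct both magnitude and phase, the following is a minimal sketch of the standard (unmodified) complex ratio mask applied to a short-time Fourier transform (STFT). The abstract does not specify how the MCRM differs from it, so the modification is not reproduced here. The function names, the toy random spectra, and the `eps` stabilizer are illustrative assumptions, not part of the paper.

```python
import numpy as np


def ideal_complex_ratio_mask(clean_stft, noisy_stft, eps=1e-12):
    """Standard complex ratio mask M = S / Y per time-frequency bin.

    Written as S * conj(Y) / |Y|^2 so a small eps (an assumption of this
    sketch) can guard against bins where the noisy spectrum is near zero.
    """
    return clean_stft * np.conj(noisy_stft) / (np.abs(noisy_stft) ** 2 + eps)


def apply_complex_mask(mask, noisy_stft):
    """Complex multiplication rescales the magnitude and rotates the phase."""
    return mask * noisy_stft


# Toy complex spectra standing in for real STFTs (frequency bins x frames).
rng = np.random.default_rng(0)
shape = (4, 5)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise

mask = ideal_complex_ratio_mask(clean, noisy)
enhanced = apply_complex_mask(mask, noisy)
```

In a learned system such as the one described above, a network would predict the mask from the noisy input alone; the ideal mask computed here from the clean signal would only serve as a training target.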
Date of Conference: 07-10 December 2020
Date Added to IEEE Xplore: 31 December 2020
Conference Location: Auckland, New Zealand
