
End To End Learning For Convolutive Multi-Channel Wiener Filtering



Abstract:

In this paper, we propose a dereverberation and speech source separation method based on a deep neural network (DNN). Unlike a cascade connection of dereverberation and speech source separation, the proposed method performs both jointly with a unified convolutive multi-channel Wiener filtering (CMWF). The proposed method adopts a time-varying CMWF to achieve better dereverberation and separation performance than a time-invariant CMWF. The time-varying CMWF requires time-frequency masks and time-frequency activities, which are inferred by a unified DNN. The DNN is trained to optimize the output signal of the time-varying CMWF with a loss function based on a negative log-posterior probability density function. We also show that the time-varying CMWF can be computed efficiently based on the Sherman-Morrison-Woodbury equation. Experimental results show that the proposed time-varying CMWF separates speech sources in reverberant environments better than the cascade-connection-based method and the time-invariant CMWF.
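For orientation (the derivation itself is in the paper and not reproduced here), the efficiency claim rests on the standard Sherman-Morrison-Woodbury identity. Its rank-one special case, written with illustrative symbols (A an invertible covariance matrix, u an observation vector), reads

  (A + u u^{\mathsf{H}})^{-1} = A^{-1} - \frac{A^{-1} u \, u^{\mathsf{H}} A^{-1}}{1 + u^{\mathsf{H}} A^{-1} u},

which replaces a full matrix inversion per time frame with a rank-one correction of an already-known inverse.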
Date of Conference: 06-11 June 2021
Date Added to IEEE Xplore: 13 May 2021

Conference Location: Toronto, ON, Canada

1. INTRODUCTION

Speech source separation and dereverberation [1]–[3] are fundamental techniques in automatic speech recognition (ASR) and teleconferencing systems. Dereverberation techniques based on statistical modeling have been actively studied, e.g., Weighted Prediction Error (WPE) [4]. Joint optimization of speech source separation and dereverberation has also been actively studied within statistical modeling frameworks [5]–[8]. These techniques rely on speech source models based on super-Gaussian distributions, e.g., the Laplacian distribution [9], [10], or on the time-varying Gaussian distribution [11]. However, the expressive capability of these speech source models is insufficient for complicated speech source spectra. Recently, deep neural networks (DNNs) have been utilized to express complicated speech source spectra [12]–[20]; the expressive capability of DNN-based speech source models exceeds that of the statistical models.
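As a rough illustration of the kind of filtering discussed above, the following is a minimal sketch, not the paper's implementation: a time-varying multichannel Wiener filter driven by DNN-style time-frequency masks, with the mixture-covariance inverse refreshed frame by frame via the Sherman-Morrison rank-one identity. All names (x_stft, masks, eps) are hypothetical, and forgetting factors and the convolutive (dereverberation) part of the CMWF are deliberately omitted.

```python
# Minimal sketch, under assumed interfaces: a time-varying multichannel
# Wiener filter (MWF) whose mixture-covariance inverse is updated per
# frame with the Sherman-Morrison rank-one identity instead of a fresh
# matrix inversion. Not the paper's CMWF; names here are hypothetical.
import numpy as np

def time_varying_mwf(x_stft: np.ndarray, masks: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """x_stft: (T, F, M) multichannel STFT of the mixture.
    masks:  (T, F) target time-frequency mask in [0, 1].
    Returns a (T, F) estimate of the target at reference channel 0."""
    T, F, M = x_stft.shape
    y = np.zeros((T, F), dtype=np.complex128)
    for f in range(F):
        R_s = eps * np.eye(M, dtype=np.complex128)               # target spatial covariance
        R_x_inv = (1.0 / eps) * np.eye(M, dtype=np.complex128)   # inverse mixture covariance
        for t in range(T):
            x = x_stft[t, f]                                     # (M,) observation
            # Masked rank-one update of the target covariance.
            R_s += masks[t, f] * np.outer(x, x.conj())
            # Sherman-Morrison: (R_x + x x^H)^{-1} from R_x^{-1} in O(M^2).
            Rx = R_x_inv @ x
            R_x_inv -= np.outer(Rx, Rx.conj()) / (1.0 + np.real(x.conj() @ Rx))
            # MWF for reference channel 0: w = R_x^{-1} R_s e_0, y = w^H x.
            w = (R_x_inv @ R_s)[:, 0]
            y[t, f] = w.conj() @ x
    return y
```

Here the rank-one update keeps the per-frame cost at O(M^2) per frequency bin rather than the O(M^3) of re-inverting the covariance, which is the practical benefit the abstract attributes to the Sherman-Morrison-Woodbury equation.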

