Robust Frame-level Speaker Localization in Reverberant and Noisy Environments by Exploiting Phase Difference Losses | IEEE Conference Publication | IEEE Xplore

Abstract:

This paper investigates robust speaker localization at the frame level on the basis of complex spectral mapping, which is capable of learning both the magnitude and phase of the target signal. Unlike prevailing deep learning methods for speaker localization, we perform MIMO (multi-input multi-output) based multi-channel speech enhancement first and then localize the enhanced speaker using weighted generalized cross correlation. In addition, we propose new multi-channel loss functions that incorporate phase differences in order to preserve inter-channel phase relations, which is key to accurate sound localization. Systematic evaluations using simulated and recorded room impulse responses demonstrate that the proposed model yields excellent frame-level speaker localization results in reverberant and noisy environments and outperforms related methods by a large margin, even surpassing their utterance-level results.
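As a rough illustration of the localization stage, the sketch below estimates the time delay between two channels with GCC-PHAT, one common weighting of generalized cross correlation. It is not the authors' implementation: the abstract does not say which weighting is used, and the function name `gcc_phat_delay` and its parameters are hypothetical. NumPy is assumed to be available.

```python
import numpy as np

def gcc_phat_delay(x1, x2, fs, max_delay=None):
    """Estimate the delay (in seconds) of x2 relative to x1 using GCC-PHAT.

    Hypothetical helper for illustration only; a positive result means
    x2 lags behind x1.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Cross-power spectrum; the PHAT weighting keeps only its phase.
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if requested.
    max_shift = n // 2
    if max_delay is not None:
        max_shift = max(1, min(int(round(max_delay * fs)), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs
```

In a frame-level setting, a function like this could be applied to each short frame of the enhanced multi-channel output, and the resulting delay mapped to a direction of arrival using the known microphone geometry. Both of those steps are assumptions about a typical pipeline rather than details taken from the paper.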
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025
Conference Location: Hyderabad, India
