Abstract:
This paper investigates robust speaker localization at the frame level on the basis of complex spectral mapping, which is capable of learning both the magnitude and phase of the target signal. Unlike prevailing deep learning methods for speaker localization, we perform MIMO (multi-input multi-output) based multi-channel speech enhancement first and then localize the enhanced speaker using weighted generalized cross correlation. In addition, we propose new multi-channel loss functions that incorporate phase differences in order to preserve inter-channel phase relations, which is key to accurate sound localization. Systematic evaluations using simulated and recorded room impulse responses demonstrate that the proposed model yields excellent frame-level speaker localization results in reverberant and noisy environments and outperforms related methods by a large margin, even surpassing their utterance-level results.
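The localization step named above, generalized cross correlation between channels, can be illustrated with a minimal sketch. The abstract does not say how the paper weights the correlation, so this example substitutes the standard PHAT (phase transform) weighting as an assumption. The function name `gcc_phat_delay` and the parameter `max_lag` are hypothetical and are not taken from the paper. The sketch estimates the inter-channel delay for one frame of two (ideally already enhanced) channel signals, which is why preserving inter-channel phase during enhancement matters: the delay estimate depends only on the phase of the cross-spectrum.

```python
import numpy as np

def gcc_phat_delay(x1, x2, max_lag):
    """Estimate the delay of x2 relative to x1, in samples, for one frame.

    Assumption: PHAT weighting stands in for the paper's unspecified
    weighting scheme. A positive result means x2 lags behind x1.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Cross-power spectrum; its phase encodes the inter-channel delay.
    cross = X2 * np.conj(X1)

    # PHAT weighting: discard magnitude, keep only phase.
    cross /= np.abs(cross) + 1e-12

    # Back to the lag domain.
    cc = np.fft.irfft(cross, n=n)

    # Reorder so the lags run from -max_lag to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag
```

Given the microphone spacing and the sampling rate, the estimated delay can then be converted to a direction of arrival. That conversion depends on the array geometry and is omitted here.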
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025