Abstract:
Speaker identification in overlapped speech is a major challenge for speaker diarization in meeting scenarios. To address it, several overlap-aware resegmentation methods based on deep learning have been integrated into speaker diarization systems. In this paper, we propose two multi-channel diarization systems that learn spatial features to improve both the detection of overlapped speech and the identification of the speakers involved. The first system applies a multi-look strategy to train networks without being given the speakers' direction of arrival (DOA), and the second estimates the DOA of target speakers from existing diarization results. Both systems estimate the voice activity of speakers in different directions in order to handle overlapped speech. Experimental results on the AMI corpus show that the two systems achieve relative improvements of up to 9.4% and 18.1%, respectively, in terms of diarization error rate (DER) over an overlap-aware single-channel system with a BeamformIt front-end.
Published in: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 23-27 May 2022
Date Added to IEEE Xplore: 27 April 2022