Strategies to Improve the Robustness of Agglomerative Hierarchical Clustering Under Data Source Variation for Speaker Diarization

3 Author(s)
Han, K.J. (Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA); Kim, S.; Narayanan, S.S.

Many current state-of-the-art speaker diarization systems use agglomerative hierarchical clustering (AHC) as their speaker clustering strategy because of its simple processing structure and acceptable level of performance. However, AHC is known to lack robustness under data source variation. In this paper, we address this problem. We focus specifically on the issues associated with the widely used clustering stopping method based on the Bayesian information criterion (BIC) and with the merging-cluster selection scheme based on the generalized likelihood ratio (GLR). First, we propose a novel alternative stopping method for AHC based on the information change rate (ICR). Experiments on several meeting corpora demonstrate that the proposed method is more robust to data source variation than the BIC-based one. The average improvement in diarization error rate (DER) obtained by this method is 8.76% (absolute) or 35.77% (relative). We also introduce a selective AHC (SAHC), which first runs AHC with the ICR-based stopping method only on speech segments longer than 3 s and then classifies the shorter speech segments into one of the clusters given by the initial AHC. This modified version of AHC is motivated by our previous analysis, which showed that the proportion of short speech turns (or segments) in a data source is a significant factor contributing to the robustness problem in the GLR-based merging-cluster selection scheme. The additional performance improvement obtained by SAHC is 3.45% (absolute) or 14.08% (relative) in terms of averaged DER.
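The two-stage structure of SAHC described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define ICR or the classifier used for short segments, so the sketch makes several assumptions: each cluster is modeled by a single full-covariance Gaussian, the merge and stop score is the log GLR divided by the total frame count (used here only as an ICR-like stand-in), and short segments are assigned by maximum log-likelihood. The names `normalized_log_glr`, `ahc`, `selective_ahc`, `min_frames`, and `threshold` are all hypothetical.

```python
import numpy as np

def fit_gaussian(x):
    """ML mean and lightly regularized full covariance of the rows of x."""
    d = x.shape[1]
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True)) + 1e-6 * np.eye(d)
    return x.mean(axis=0), cov

def total_loglik(x, mean, cov):
    """Sum of Gaussian log-densities over the rows (frames) of x."""
    d = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * np.sum(maha + logdet + d * np.log(2.0 * np.pi))

def normalized_log_glr(x, y):
    """ASSUMED ICR-like score: log GLR of (separate vs. merged) / frame count."""
    xy = np.vstack([x, y])
    log_glr = (total_loglik(x, *fit_gaussian(x))
               + total_loglik(y, *fit_gaussian(y))
               - total_loglik(xy, *fit_gaussian(xy)))
    return log_glr / len(xy)

def ahc(segments, threshold):
    """Merge the closest pair repeatedly; stop when its score exceeds threshold."""
    clusters = [[i] for i in range(len(segments))]
    data = [np.asarray(s, dtype=float) for s in segments]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = normalized_log_glr(data[a], data[b])
                if best is None or score < best[0]:
                    best = (score, a, b)
        score, a, b = best
        if score > threshold:  # stopping rule (hypothetical threshold)
            break
        # b > a, so popping index b leaves index a valid
        clusters[a] = clusters[a] + clusters.pop(b)
        data[a] = np.vstack([data[a], data.pop(b)])
    return clusters, data

def selective_ahc(segments, min_frames, threshold):
    """Cluster long segments first, then classify short ones into the result.

    Assumes at least one segment has min_frames or more frames.
    """
    long_ids = [i for i, s in enumerate(segments) if len(s) >= min_frames]
    short_ids = [i for i, s in enumerate(segments) if len(s) < min_frames]
    clusters, data = ahc([segments[i] for i in long_ids], threshold)
    labels = [None] * len(segments)
    for k, members in enumerate(clusters):
        for m in members:
            labels[long_ids[m]] = k
    models = [fit_gaussian(x) for x in data]
    for i in short_ids:
        x = np.asarray(segments[i], dtype=float)
        labels[i] = int(np.argmax([total_loglik(x, m, c) for m, c in models]))
    return labels
```

In this sketch each segment is an array of feature frames with shape (frames, dimensions). With a 10 ms frame shift (an assumption, not stated in the abstract), the 3 s boundary would correspond to `min_frames=300`. The value of `threshold` would have to be tuned on development data.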

Published in:

Audio, Speech, and Language Processing, IEEE Transactions on (Volume: 16, Issue: 8)