Speaker Clustering and Cluster Purification Methods for RT07 and RT09 Evaluation Meeting Data

4 Author(s)
Tin Lay Nwe (Human Language Technology Department, Institute for Infocomm Research, A*STAR, Singapore); Hanwu Sun; Bin Ma; Haizhou Li

This paper presents a design strategy for the speaker diarization system in the IIR submissions to the 2007 and 2009 NIST Rich Transcription Meeting Recognition Evaluations (RT07 and RT09) for the multiple distant microphone (MDM) condition. The system features two algorithms, each supporting an important step in the diarization process. The first step is Initial Segmentation and Clustering (ISC), and the second is cluster merging and purification. In the ISC step, we propose a histogram quantization and clustering technique based on time delay of arrival (TDOA) features, which are obtained by analyzing the correlation among the signals across multiple distant microphones. In the cluster merging and purification step, we merge the speaker clusters using the Bayesian information criterion (BIC) to consolidate them until there is one cluster per speaker. Within the purification process, we propose a novel Consensus Based Cluster Purification (CBCP) method that removes impure speaker segments from the speaker clusters before speaker modeling. The two steps work in tandem to form an integral process. The system achieves state-of-the-art speaker diarization performance on the RT07 and RT09 MDM conditions, with diarization error rates (DERs) of 7.47% and 8.77%, respectively, scored over both overlapping and non-overlapping speech.
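
The abstract does not say which BIC formulation the system uses, so the following is only a minimal sketch of the general idea behind BIC-based cluster merging. It assumes each cluster is a list of one-dimensional feature values modeled by a single Gaussian, and that two clusters are merged when the delta-BIC score is negative. The function names `delta_bic` and `should_merge`, the `penalty_weight` parameter, and the one-dimensional simplification are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math
from statistics import pvariance


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC between 'two separate Gaussians' and 'one shared Gaussian'.

    Assumption: x and y are non-constant lists of 1-D feature values.
    Positive values favour keeping the clusters apart; negative values
    favour merging them.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    var_x = pvariance(x)
    var_y = pvariance(y)
    var_all = pvariance(list(x) + list(y))
    # Log-likelihood gain from modelling the data with two Gaussians
    # instead of one (maximum-likelihood variances).
    gain = 0.5 * (n * math.log(var_all)
                  - n1 * math.log(var_x)
                  - n2 * math.log(var_y))
    # The two-Gaussian hypothesis has one extra mean and one extra
    # variance, i.e. two extra free parameters in the 1-D case.
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    return gain - penalty


def should_merge(x, y, penalty_weight=1.0):
    """Merge when a single Gaussian explains both clusters well enough."""
    return delta_bic(x, y, penalty_weight) < 0
```

In an agglomerative loop, a test like this would typically be applied to the closest pair of clusters at each iteration, stopping once no pair scores below zero. A real system would work on multidimensional acoustic features with full covariance matrices, where the log-variance terms become log-determinants.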

Published in:

IEEE Transactions on Audio, Speech, and Language Processing (Volume: 20, Issue: 2)