
GMM adaptation based online speaker segmentation for spoken document retrieval


Authors:
Kyungmi Park (Dept. of Comput. Sci., Korea Adv. Inst. of Sci. & Technol. (KAIST), Daejeon, South Korea); Jeong-sik Park; Yung-Hwan Oh

This paper proposes an online speaker segmentation approach based on Gaussian Mixture Model (GMM) adaptation for spoken document retrieval. In the conventional approach using the Bayesian Information Criterion (BIC), a single Gaussian model is constructed for each of the two speech streams obtained by dividing an analysis window, and the dissimilarity between the two models is estimated according to the BIC principle. This approach has been widely applied to speaker segmentation. However, its performance may deteriorate when speakers change frequently, because a single Gaussian model can hardly represent a speaker's explicit characteristics from short speech data. To overcome this limitation, we propose using adapted GMMs instead of single Gaussian models. The proposed method constructs a local universal background model (UBM) for the speech in an analysis window and adapts the local UBM to each of the two divided speech streams in the same window. Using the two adapted GMMs, the likelihood of each speech stream is estimated, and a speaker change is detected according to our criterion based on local maxima of BIC. In speaker segmentation experiments on HUB4, a well-known broadcast news corpus, the proposed method exhibited superior performance compared to the conventional approaches.
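The conventional baseline described in the abstract can be illustrated with a minimal sketch. It computes the widely used single-Gaussian ΔBIC score for one candidate split of an analysis window and then scans candidate splits. It does not implement the proposed GMM-adaptation method, whose details are not given here. The function names `delta_bic` and `best_split`, the `penalty_weight` and `margin` parameters, and the small ridge added to each covariance are assumptions made for illustration.

```python
import numpy as np


def delta_bic(window: np.ndarray, split: int, penalty_weight: float = 1.0) -> float:
    """Single-Gaussian delta-BIC for splitting `window` (frames x dims) at `split`.

    Positive values favor modeling the two halves separately,
    which is read as evidence of a speaker change.
    """
    n, d = window.shape
    left, right = window[:split], window[split:]

    def logdet_cov(x: np.ndarray) -> float:
        # Maximum-likelihood full covariance; the ridge is an assumption
        # added here to keep the matrix well conditioned on short segments.
        cov = np.cov(x, rowvar=False, bias=True) + 1e-6 * np.eye(d)
        _, logdet = np.linalg.slogdet(cov)
        return float(logdet)

    # Extra free parameters of the two-model hypothesis: mean + full covariance.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return (0.5 * n * logdet_cov(window)
            - 0.5 * len(left) * logdet_cov(left)
            - 0.5 * len(right) * logdet_cov(right)
            - penalty)


def best_split(window: np.ndarray, margin: int = 20) -> tuple[int, float]:
    """Return the candidate split with the largest delta-BIC and its score."""
    candidates = range(margin, len(window) - margin)
    scores = [delta_bic(window, s) for s in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]
```

A change point would typically be declared only when the best score is positive. With short segments on either side of the split, the covariance estimates become unreliable, which is the weakness of the single-Gaussian model that the abstract points out.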

Published in:

IEEE Transactions on Consumer Electronics (Volume: 56, Issue: 2)

Date of Publication:

May 2010
