Speaker Model Clustering for Efficient Speaker Identification in Large Population Applications

Abstract:

In large population speaker identification (SI) systems, likelihood computations between an unknown speaker's feature vectors and the registered speaker models can be very time-consuming and become a bottleneck. For applications requiring fast SI, this is a recognized problem, and improvements in efficiency would be beneficial. In this paper, we propose a method whereby GMM-based speaker models are clustered using a simple k-means algorithm. Then, during the test stage, only a small proportion of speaker models in selected clusters are used in the likelihood computations, resulting in a significant speed-up with little to no loss in accuracy. In general, as the number of selected clusters is reduced, the identification accuracy decreases; however, this loss can be controlled through a proper tradeoff between speed-up and accuracy. The proposed method may also be combined with other test-stage speed-up techniques, resulting in even greater speed-up gains without additional sacrifices in accuracy.
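The clustering and pruned test-stage search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on assumptions the abstract does not state: each speaker's GMM is reduced to a fixed-length summary vector (for example, its concatenated mean vectors, which only makes sense if components correspond across speakers), clusters are selected by scoring one representative speaker per cluster, and `score(s)` is a placeholder for the summed test log-likelihood of speaker model `s`. All function names are hypothetical. The parameter `top_p` (the number of selected clusters) exposes the speed/accuracy tradeoff: fewer clusters means fewer likelihood computations but a higher chance of pruning away the true speaker.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two speaker summary vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: Sequence[Vector], k: int,
           iters: int = 50) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means with deterministic farthest-point seeding."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of speakers")
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Seed the next centroid at the vector farthest from all current ones.
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def build_clusters(speaker_vectors: Sequence[Vector],
                   k: int) -> Tuple[List[List[int]], List[int]]:
    """Training stage: cluster speakers and pick the member nearest each
    centroid as that cluster's representative (hypothetical selection rule)."""
    labels, centroids = kmeans(speaker_vectors, k)
    clusters: List[List[int]] = []
    reps: List[int] = []
    for j in range(k):
        members = [s for s, lab in enumerate(labels) if lab == j]
        if not members:
            continue  # drop clusters that ended up empty
        clusters.append(members)
        reps.append(min(members,
                        key=lambda s: sq_dist(speaker_vectors[s], centroids[j])))
    return clusters, reps


def clustered_identify(score: Callable[[int], float], clusters: List[List[int]],
                       reps: List[int], top_p: int) -> Tuple[int, int]:
    """Test stage: rank clusters by their representative's score, keep the
    top_p clusters, and take the argmax only over their member speakers.
    Returns (identified speaker index, number of speaker models scored)."""
    cache: Dict[int, float] = {}

    def cached(s: int) -> float:
        if s not in cache:
            cache[s] = score(s)  # one full likelihood computation per model
        return cache[s]

    ranked = sorted(range(len(clusters)), key=lambda j: cached(reps[j]),
                    reverse=True)
    candidates = [s for j in ranked[:top_p] for s in clusters[j]]
    best = max(candidates, key=cached)
    return best, len(cache)
```

For example, with six speakers whose summary vectors form two well-separated groups, `[[0, 0], [0.1, 0], [0, 0.2], [5, 5], [5.1, 5], [5, 5.2]]`, and made-up scores `[-10, -9, -11, -3, -1, -2]`, `build_clusters(vectors, 2)` yields clusters `[[0, 1, 2], [3, 4, 5]]` with representatives `[0, 3]`, and `clustered_identify(scores.__getitem__, clusters, reps, 1)` returns `(4, 4)`: the same speaker a full search would pick, after scoring four of the six models. With `top_p=2` all six models are scored.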

Published in: IEEE Transactions on Audio, Speech, and Language Processing (Volume: 17, Issue: 4, May 2009)

Page(s): 848-853

Date of Publication: 03 April 2009

DOI: 10.1109/TASL.2008.2010882

I. Introduction

The objective of speaker identification (SI) is to determine which voice sample from a set of known voice samples best matches the characteristics of an unknown input voice sample [1]. SI is a two-stage procedure consisting of training and testing. In the training stage, speaker-dependent feature vectors are extracted from a training speech signal, and a speaker model is built from each speaker's feature vectors. Typically, SI systems use Mel-frequency cepstral coefficients (MFCCs) as the feature vectors and a Gaussian mixture model (GMM) of the feature vectors as the speaker model. The GMM is parameterized by the set $\lambda=\{w_i,{\boldsymbol\mu}_i,{\boldsymbol\Sigma}_i\}$, where $w_i$ are the weights, ${\boldsymbol\mu}_i$ are the mean vectors, and ${\boldsymbol\Sigma}_i$ are the covariance matrices of the Gaussian component densities of the GMM. In the SI testing stage, feature vectors $\{{\bf x}_{1}^{\rm test},\ldots,{\bf x}_{M^{\prime}}^{\rm test}\}$ are extracted from a test signal (speaker unknown) and scored against all $S$ speaker models $\lambda_1,\ldots,\lambda_S$ using a log-likelihood calculation; the most likely speaker identity is then decided according to

$$\hat{s}=\arg\max_{1\leq s\leq S}\sum_{m=1}^{M^{\prime}}\log p\left({\bf x}_{m}^{\rm test}\vert\lambda_{s}\right)\tag{1}$$

In assessing an SI system, we measure identification accuracy as the number of correct identification tests divided by the total number of tests. For many years now, GMM-based systems have been shown to be very successful in accurately identifying speakers from a large population [1], [2].
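The decision rule in (1) and the identification-accuracy measure can be sketched in plain Python as follows. This is an illustrative sketch, not the paper's code: it assumes diagonal covariance matrices (the text does not state the covariance form), represents each speaker model as a tuple of weights, means, and variances, and uses 0-based speaker indices instead of $1\leq s\leq S$. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A diagonal-covariance GMM: (weights w_i, mean vectors mu_i, variances diag(Sigma_i)).
# Diagonal covariances are an assumption made for this sketch.
GMM = Tuple[List[float], List[List[float]], List[List[float]]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """log N(x; mean, diag(var)) for one D-dimensional feature vector."""
    d = len(x)
    log_det = sum(math.log(v) for v in var)
    maha = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + maha)


def log_likelihood(x: Sequence[float], gmm: GMM) -> float:
    """log p(x | lambda) = log sum_i w_i N(x; mu_i, Sigma_i), via log-sum-exp."""
    weights, means, variances = gmm
    terms = [math.log(w) + log_gaussian_diag(x, mu, var)
             for w, mu, var in zip(weights, means, variances)]
    peak = max(terms)  # subtract the max term for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify(test_vectors: Sequence[Sequence[float]], models: Sequence[GMM]) -> int:
    """Eq. (1): index s maximizing sum_m log p(x_m^test | lambda_s), 0-based."""
    scores = [sum(log_likelihood(x, gmm) for x in test_vectors) for gmm in models]
    return max(range(len(models)), key=lambda s: scores[s])


def identification_accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Number of correct identification tests divided by the total number of tests."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)
```

For instance, with a one-component model centred at 0 and a two-component model centred at 5 and 6 (all 1-D, unit variance), `identify([[5.2], [5.8]], models)` returns `1`. Note that every call to `identify` evaluates all $S$ models on all $M^{\prime}$ test vectors, which is the cost that grows with population size.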
