Speaker Model Clustering for Efficient Speaker Identification in Large Population Applications | IEEE Journals & Magazine | IEEE Xplore


Abstract:

In large population speaker identification (SI) systems, likelihood computations between an unknown speaker's feature vectors and the registered speaker models can be very time-consuming and impose a bottleneck. For applications requiring fast SI, this is a recognized problem and improvements in efficiency would be beneficial. In this paper, we propose a method whereby GMM-based speaker models are clustered using a simple k-means algorithm. Then, during the test stage, only a small proportion of speaker models in selected clusters are used in the likelihood computations resulting in a significant speed-up with little to no loss in accuracy. In general, as the number of selected clusters is reduced, the identification accuracy decreases; however, this loss can be controlled through proper tradeoff. The proposed method may also be combined with other test stage speed-up techniques resulting in even greater speed-up gains without additional sacrifices in accuracy.
Page(s): 848 - 853
Date of Publication: 03 April 2009

I. Introduction

The objective of speaker identification (SI) is to determine which voice sample from a set of known voice samples best matches the characteristics of an unknown input voice sample [1]. SI is a two-stage procedure consisting of training and testing. In the training stage, speaker-dependent feature vectors are extracted from a training speech signal and a speaker model is built from each speaker's feature vectors. Normally, SI systems use the Mel-frequency cepstral coefficients (MFCCs) as the feature vector and a Gaussian mixture model (GMM) of the feature vectors as the speaker model. The GMM is parameterized by the set $\lambda=\{w_{i},\boldsymbol{\mu}_{i},\boldsymbol{\Sigma}_{i}\}$, where $w_{i}$ are the weights, $\boldsymbol{\mu}_{i}$ are the mean vectors, and $\boldsymbol{\Sigma}_{i}$ are the covariance matrices of the Gaussian component densities of the GMM. In the SI testing stage, feature vectors $\mathbf{x}_{m}^{\mathrm{test}}$, $m=1,\ldots,M^{\prime}$, are extracted from a test signal (speaker unknown) and scored against all $S$ speaker models $\lambda_{s}$ using a log-likelihood calculation, and the most likely speaker identity is decided according to

$$\hat{s}=\arg\max_{1\leq s\leq S}\sum_{m=1}^{M^{\prime}}\log p\left(\mathbf{x}_{m}^{\mathrm{test}}\mid\lambda_{s}\right). \tag{1}$$
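To make the decision rule in (1) concrete, the following is a minimal NumPy sketch of exhaustive scoring against every registered model. It assumes diagonal covariance matrices (a common simplification, not stated in the text above), and the names `gmm_log_likelihood` and `identify` are illustrative rather than taken from the paper. Each model is a hypothetical tuple `(weights, means, variances)`.

```python
import numpy as np

def gmm_log_likelihood(X, weights, means, variances):
    """Sum over test frames of log p(x_m | lambda) for one GMM.

    Assumes diagonal covariances (an assumption of this sketch).
    X: (M', D) test vectors; weights: (K,); means, variances: (K, D).
    """
    diff = X[:, None, :] - means[None, :, :]                       # (M', K, D)
    # Log of the Gaussian normalizing constant for each component.
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (M', K)
    a = np.log(weights) + log_comp
    # Log-sum-exp over components, shifted by the row maximum for stability.
    a_max = a.max(axis=1, keepdims=True)
    frame_ll = a_max[:, 0] + np.log(np.exp(a - a_max).sum(axis=1))
    return float(frame_ll.sum())

def identify(X, models):
    """Score X against every model and return (best index, all scores).

    The returned index is 0-based, whereas s in (1) runs from 1 to S.
    """
    scores = [gmm_log_likelihood(X, w, mu, var) for (w, mu, var) in models]
    return int(np.argmax(scores)), scores

# Toy usage with two hypothetical single-component speaker models.
toy_models = [
    (np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2))),
    (np.array([1.0]), np.full((1, 2), 5.0), np.ones((1, 2))),
]
toy_test = np.full((4, 2), 4.8)
best_speaker, all_scores = identify(toy_test, toy_models)
```

The cost of `identify` grows linearly with the number of registered models, which is the bottleneck the abstract refers to for large populations.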

In assessing an SI system, we measure identification accuracy as the number of correct identification tests divided by the total number of tests. For many years now, GMM-based systems have been shown to be very successful in accurately identifying speakers from a large population [1], [2].
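The clustering idea summarized in the abstract can be illustrated with a rough sketch: group the speaker models with k-means, then score the test data only against models in the clusters nearest to it. The details here are assumptions made for illustration and are not taken from the paper. Each speaker model is reduced to a single hypothetical summary vector (for example, the weighted mean of its component means), and clusters are selected by Euclidean distance from the mean test vector. The paper's actual model representation and cluster selection criterion are not given in this excerpt.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct points (fancy indexing copies).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # leave empty clusters unchanged
                centroids[j] = points[labels == j].mean(axis=0)
    # Final assignment against the updated centroids.
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, d.argmin(axis=1)

def candidate_speakers(X_test, centroids, labels, n_clusters):
    """Indices of speakers in the n_clusters clusters nearest the test data.

    Selection by distance to the mean test vector is an assumption
    of this sketch, not the paper's stated criterion.
    """
    d = np.linalg.norm(centroids - X_test.mean(axis=0), axis=1)
    chosen = np.argsort(d)[:n_clusters]
    return np.flatnonzero(np.isin(labels, chosen))

# Toy usage: six hypothetical speaker summary vectors in two groups.
summaries = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
                      [10.0, 10.0], [10.1, 10.0], [10.0, 10.2]])
centroids, labels = kmeans(summaries, k=2)
shortlist = candidate_speakers(np.full((5, 2), 9.9), centroids, labels, n_clusters=1)
```

Only the speakers in `shortlist` would then be scored with the full log-likelihood computation, so raising `n_clusters` trades speed for a lower chance of excluding the true speaker.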
