Acoustic Factor Analysis for Robust Speaker Verification

2 Author(s)
Hasan, T. ; Center for Robust Speech Syst. (CRSS), Univ. of Texas at Dallas, Richardson, TX, USA ; Hansen, J.H.L.

Factor analysis based channel mismatch compensation methods for speaker recognition assume that speaker/utterance dependent Gaussian Mixture Model (GMM) mean super-vectors can be constrained to reside in a lower dimensional subspace. This approach does not consider the fact that conventional acoustic feature vectors also reside in a lower dimensional manifold of the feature space when feature covariance matrices contain close-to-zero eigenvalues. In this study, based on observations of the covariance structure of acoustic features, we propose a factor analysis modeling scheme in the acoustic feature space instead of the super-vector space and derive a mixture dependent feature transformation. We demonstrate how this single linear transformation performs feature dimensionality reduction, de-correlation, normalization and enhancement simultaneously. The proposed transformation is shown to be closely related to signal subspace based speech enhancement schemes. In contrast to traditional front-end mixture dependent feature transformations, where feature alignment is performed using the highest scoring mixture, the proposed transformation is integrated within the speaker recognition system using a probabilistic feature alignment technique, which removes the need to regenerate the features or retrain the Universal Background Model (UBM). Incorporating the proposed method with a state-of-the-art i-vector and Gaussian Probabilistic Linear Discriminant Analysis (PLDA) framework, we perform evaluations on the National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) 2010 core telephone and microphone tasks. The experimental results demonstrate the superiority of the proposed scheme compared to both full-covariance and diagonal covariance UBM based systems. Simple equal-weight fusion of the baseline and proposed systems also yields significant performance gains.
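
The abstract does not give the transformation itself, so the following is only an illustrative sketch of how one linear map can reduce dimensionality, de-correlate, normalize and attenuate noise together. It assumes a probabilistic PCA style factor model for a single mixture component, with the noise variance taken as the mean of the discarded eigenvalues. The function name `ppca_feature_transform`, the synthetic covariance and the choice of `q` are hypothetical and are not taken from the paper.

```python
import numpy as np

def ppca_feature_transform(cov, q):
    """Return a (q, D) matrix A for one mixture component (illustrative only).

    Assumes a probabilistic-PCA factor model: cov ~ W W^T + noise_var * I.
    A maps a mean-removed feature x to the posterior mean of its latent factor.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    noise_var = eigvals[q:].mean()               # assumed isotropic noise floor
    lam_q = eigvals[:q]                          # retained (signal) eigenvalues
    U_q = eigvecs[:, :q]                         # retained eigenvectors

    # Projection onto U_q de-correlates and reduces dimension; the per-axis
    # gain sqrt(lam - noise_var) / lam normalizes and shrinks noisy directions.
    gain = np.sqrt(np.maximum(lam_q - noise_var, 0.0)) / lam_q
    return gain[:, None] * U_q.T

# Synthetic example: 6-dimensional features with 3 dominant directions.
rng = np.random.default_rng(0)
D, q = 6, 3
B = rng.standard_normal((D, q))
cov = B @ B.T + 0.01 * np.eye(D)                 # low-rank signal plus small noise
mu = np.zeros(D)

A = ppca_feature_transform(cov, q)
x = rng.multivariate_normal(mu, cov, size=5)     # five synthetic feature vectors
z = (x - mu) @ A.T                               # transformed features, shape (5, q)
```

In a mixture model, one such matrix would be computed per component from that component's mean and covariance. How the per-component outputs are combined through probabilistic alignment is described in the paper and is not reproduced here.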

Published in:

Audio, Speech, and Language Processing, IEEE Transactions on (Volume: 21, Issue: 4)
Biometrics Compendium, IEEE