By Topic

Source-Normalized LDA for Robust Speaker Recognition Using i-Vectors From Multiple Speech Sources

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
McLaren, M. ; Centre for Language & Speech Technol., Radboud Univ. Nijmegen, Nijmegen, Netherlands ; van Leeuwen, D.

The recent development of the i-vector framework for speaker recognition has set a new performance standard in the research field. An i-vector is a compact representation of a speakers utterance extracted from a total variability subspace. Prior to classification using a cosine kernel, i-vectors are projected into an linear discriminant analysis (LDA) space in order to reduce inter-session variability and enhance speaker discrimination. The accurate estimation of this LDA space from a training dataset is crucial to detection performance. A typical training dataset, however, does not consist of utterances acquired through all sources of interest for each speaker. This has the effect of introducing systematic variation related to the speech source in the between-speaker covariance matrix and results in an incomplete representation of the within-speaker scatter matrix used for LDA. The recently proposed source-normalized (SN) LDA algorithm improves the robustness of i-vector-based speaker recognition under both mis-matched evaluation conditions and conditions for which inadequate speech resources are available for suitable system development. When evaluated on the recent NIST 2008 and 2010 Speaker Recognition Evaluations (SRE), SN-LDA demonstrated relative improvements of up to 38% in equal error rate (EER) and 44% in minimum DCF over LDA under mis-matched and sparsely resourced evaluation conditions while also providing improvements in the common telephone-only conditions. Extending on these initial developments, this study provides a thorough analysis of how SN-LDA transforms the i-vector space to reduce source variation and its robustness to varying evaluation and LDA training conditions. The concept of source-normalization is further extended to within-class covariance normalization (WCCN) and data-driven source detection.

Published in:

Audio, Speech, and Language Processing, IEEE Transactions on  (Volume:20 ,  Issue: 3 )
Biometrics Compendium, IEEE