A Syllable Lattice Approach to Speaker Verification

3 Author(s)
Minho Jin (Dept. of Electr. Eng. & Comput. Sci., Korea Adv. Inst. of Sci. & Technol., Daejeon); F. K. Soong; C. D. Yoo

This paper proposes a syllable-lattice-based speaker verification algorithm for Mandarin Chinese input. For each speech utterance, a syllable lattice is generated with a speaker-independent large-vocabulary continuous speech recognition system in free syllable decoding. The verification decision is based on the likelihood ratio between a target-speaker model and a speaker-independent background model, computed on the decoded syllable lattice. The likelihood function is calculated efficiently with a forward algorithm that considers all paths in the lattice. The proposed algorithm was evaluated on a Mandarin Chinese database in which 1832 true-speaker and 26,250 impostor trials were recorded by 19 target speakers and 180 impostors; the average trial duration is 2 s, excluding silence. The target-speaker model was adapted from the speaker-independent background model using two minutes of enrollment data, including silence. The proposed algorithm achieved an equal-error rate of 0.857%, which is better than the 1.21% of a hidden Markov model-based speaker verification algorithm that does not use syllable lattices. The equal-error rate was further reduced to 0.617% by incorporating a Gaussian mixture model-universal background model algorithm with 2048 Gaussian kernels, whose own equal-error rate is 0.990%.

Published in:

IEEE Transactions on Audio, Speech, and Language Processing (Volume: 15, Issue: 8)
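
To illustrate the abstract's description of computing the likelihood over all lattice paths with a forward algorithm, the following is a minimal Python sketch, not the paper's implementation. It assumes the syllable lattice is given as a list of arcs whose acoustic scores have already been rescored under both the target-speaker model and the background model; the names lattice_log_likelihood, verification_score, log_p_target, and log_p_ubm are illustrative and do not come from the paper.

import math
from collections import defaultdict

def logsumexp(values):
    """Numerically stable log-sum-exp over log-domain values."""
    values = list(values)
    if not values:
        return float("-inf")
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def lattice_log_likelihood(arcs, start_node, end_node, score_key):
    """
    Forward pass over a syllable lattice (a DAG).

    arcs: list of dicts with keys 'src', 'dst', and per-model log-likelihoods
          for the syllable on that arc (e.g. 'log_p_target', 'log_p_ubm').
    Returns the log of the total likelihood summed over all lattice paths.
    """
    # Group arcs by source node and collect all nodes.
    out_arcs = defaultdict(list)
    nodes = {start_node, end_node}
    for a in arcs:
        out_arcs[a["src"]].append(a)
        nodes.add(a["src"])
        nodes.add(a["dst"])

    # Topologically order the nodes (Kahn's algorithm).
    in_deg = {n: 0 for n in nodes}
    for a in arcs:
        in_deg[a["dst"]] += 1
    order, frontier = [], [n for n in nodes if in_deg[n] == 0]
    while frontier:
        n = frontier.pop()
        order.append(n)
        for a in out_arcs[n]:
            in_deg[a["dst"]] -= 1
            if in_deg[a["dst"]] == 0:
                frontier.append(a["dst"])

    # Forward recursion: alpha[n] = log-sum over all partial paths reaching n.
    alpha = {n: float("-inf") for n in nodes}
    alpha[start_node] = 0.0
    for n in order:
        if alpha[n] == float("-inf"):
            continue
        for a in out_arcs[n]:
            alpha[a["dst"]] = logsumexp([alpha[a["dst"]], alpha[n] + a[score_key]])
    return alpha[end_node]

def verification_score(arcs, start_node, end_node):
    """Log-likelihood ratio between the target-speaker and background models."""
    log_p_target = lattice_log_likelihood(arcs, start_node, end_node, "log_p_target")
    log_p_ubm = lattice_log_likelihood(arcs, start_node, end_node, "log_p_ubm")
    return log_p_target - log_p_ubm

In this sketch, the verification decision would be made by comparing verification_score against a threshold chosen on development data; how the per-arc scores are obtained and normalized follows the paper, not this illustration.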