The authors describe a speaker verification system for telephone channels based on randomly prompted digit strings and using concatenated context-dependent phonemic hidden Markov models (HMMs). The main goal of this work was to achieve acceptable speaker verification performance while keeping the number of parameters (and, consequently, the amount of training material) and the CPU requirements relatively small. To optimize the performance of this system, several features were used: context-dependent phoneme models; silence and garbage (click) models to remove extraneous parts from the actual utterance; better decision logic, based on associated speakers; better feature vectors using RASTA processing; and rejection of garbage utterances without significantly affecting the overall verification performance. It is shown how these features together led to an average equal error rate of 6.3% on realistic and difficult tasks.
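For readers unfamiliar with the equal error rate (EER) quoted above, the following is a minimal sketch of how such a figure can be computed from verification scores. It is not the authors' evaluation procedure: the function name `equal_error_rate`, the convention that higher scores mean "accept", and the example scores are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) for two lists of verification scores.

    Assumption: a claim is accepted when score >= threshold.
    The EER is taken at the candidate threshold where the false-acceptance
    rate (FAR) and false-rejection rate (FRR) are closest, and is reported
    as their average at that point.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")

    best_gap = None
    best_eer = 0.0
    best_threshold = 0.0
    # Every observed score is a candidate decision threshold.
    for threshold in sorted(set(genuine) | set(impostor)):
        # True speakers wrongly rejected.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        # Impostors wrongly accepted.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
            best_threshold = threshold
    return best_eer, best_threshold


if __name__ == "__main__":
    # Made-up scores, purely illustrative.
    true_speaker_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.6, 0.3, 0.2, 0.1]
    eer, threshold = equal_error_rate(true_speaker_scores, impostor_scores)
    print(f"EER = {eer:.1%} at threshold {threshold}")
```

With a finite set of trials, FAR and FRR rarely coincide exactly, so this sketch reports their average at the closest crossing point; other conventions, such as interpolating between neighbouring thresholds, are also in use.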
Published in:
Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on (Volume: 2)
Date of Conference: 27-30 April 1993