Abstract:
Speaker verification is usually performed by comparing the likelihood score of the target speaker model with that of a universal background model (UBM), and then applying a suitable threshold. For the UBM to be effective, it must be estimated from a large number of speakers. However, it is not always possible to gather enough data to estimate a robust UBM, and verification performance may degrade if impostors, or whatever other sources generate the input signals, are not suitably modelled by the UBM. In this work, a new normalization technique is proposed, based on a shallow source model (SSM) estimated from the input utterance. A linear combination of the likelihood scores of the SSM and the UBM is used to normalize the speaker score. Speaker verification experiments were carried out on a clean-speech dataset including 204 speakers. In addition, a sizeable amount of noisy speech and non-speech signals was used to test robustness to a large training-test mismatch. Three normalization techniques were tested: UBM, smoothed UBM, and the proposed combination of UBM and SSM. The latter approach yielded the best performance, and the difference was especially significant under the large training-test mismatch condition.
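The abstract does not give the exact scoring formulas, so the following is only a minimal sketch of the two normalization schemes under stated assumptions. It assumes frame-averaged log-likelihoods, diagonal-covariance GMMs, an SSM taken to be a single diagonal Gaussian fitted by maximum likelihood to the test utterance, and a hypothetical weight `alpha` that combines the UBM and SSM log-likelihoods linearly. All function names and `alpha` are illustrative, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]
# A GMM as (weights, means, variances); diagonal covariances are assumed.
GMM = Tuple[List[float], List[List[float]], List[List[float]]]


def diag_gauss_logpdf(x: Frame, mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def avg_loglik(frames: Sequence[Frame], gmm: GMM) -> float:
    """Average per-frame log-likelihood of the frames under a diagonal GMM."""
    weights, means, variances = gmm
    total = 0.0
    for x in frames:
        comps = [math.log(w) + diag_gauss_logpdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        top = max(comps)  # log-sum-exp over components for numerical stability
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total / len(frames)


def fit_shallow_source_model(frames: Sequence[Frame], var_floor: float = 1e-3) -> GMM:
    """HYPOTHETICAL SSM: one diagonal Gaussian fitted (ML) to the test utterance."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(x[d] for x in frames) / n for d in range(dim)]
    var = [max(sum((x[d] - mean[d]) ** 2 for x in frames) / n, var_floor)
           for d in range(dim)]
    return [1.0], [mean], [var]


def ubm_score(frames: Sequence[Frame], target: GMM, ubm: GMM) -> float:
    """Conventional UBM normalization: target log-likelihood minus UBM log-likelihood."""
    return avg_loglik(frames, target) - avg_loglik(frames, ubm)


def ssm_ubm_score(frames: Sequence[Frame], target: GMM, ubm: GMM,
                  alpha: float = 0.5) -> float:
    """ASSUMED form: subtract alpha * LL_UBM + (1 - alpha) * LL_SSM from LL_target."""
    ssm = fit_shallow_source_model(frames)
    background = (alpha * avg_loglik(frames, ubm)
                  + (1.0 - alpha) * avg_loglik(frames, ssm))
    return avg_loglik(frames, target) - background
```

A claim would then be accepted when the normalized score exceeds a threshold tuned on development data. With `alpha = 1.0` the combined score reduces to plain UBM normalization. Smaller values shift weight toward a model of whatever actually produced the input, which is one plausible reading of why the combination helps when impostor signals are poorly covered by the UBM.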
Published in: 2008 16th European Signal Processing Conference
Date of Conference: 25-29 August 2008
Date Added to IEEE Xplore: 06 April 2015
Print ISSN: 2219-5491
Conference Location: Lausanne, Switzerland