Abstract:
This paper presents the UT-SCOPE database, together with an automatic and perceptual evaluation of Lombard speech in in-set speaker recognition. The speech used for the analysis forms part of the UT-SCOPE database and consists of sentences from the well-known TIMIT corpus, spoken in the presence of highway, large-crowd, and pink noise. First, the deterioration of the equal error rate (EER) of an in-set speaker identification system trained on neutral speech and tested on Lombard speech is illustrated. A clear demarcation between the effect of noise and the Lombard effect is also given by testing with noisy Lombard speech. The effect of test-token duration on system performance under the Lombard condition is addressed. We also report results from in-set speaker recognition tasks performed by human subjects and compare them with the system performance. Overall, the observations suggest that a deeper understanding of the cognitive factors involved in perceptual speaker ID offers meaningful insights for the further development of automated systems.
Automatic speaker recognition [1] plays an important role in forensics and security, as well as in speech communication, for example in recognizing a speaker for an automatic speech recognition or dialogue system.