Support vector machine-based speaker verification (SV) has become a standard approach in recent years. These systems typically use dynamic kernels to handle the dynamic nature of speech utterances. This paper shows that many of these kernels fall into one of two general classes: derivative kernels and parametric kernels. The attributes of these classes are contrasted, and the conditions under which the two forms of kernel are identical are described. By avoiding these conditions, gains may be obtained by combining derivative and parametric kernels. One combination strategy is to combine at the kernel level. This paper describes a maximum-margin-based scheme for learning kernel weights for the SV task. Various dynamic kernels and combinations were evaluated on the NIST 2002 SRE task, including derivative and parametric kernels based upon different model structures. The best overall performance was an equal error rate (EER) of 7.78%, achieved when combining five kernels.
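As a rough illustration of what combining at the kernel level can look like, the sketch below forms a non-negatively weighted sum of two Gram matrices. It is not the paper's method: the weights here are fixed by hand rather than learned with a maximum-margin criterion, plain inner products stand in for the derivative and parametric kernels, and the names `linear_gram` and `combine_kernels` and the toy feature vectors are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def linear_gram(features: List[List[float]]) -> Matrix:
    """Gram matrix of inner products between fixed-length utterance vectors.

    Assumption: each utterance has already been mapped to a fixed-length
    vector by some dynamic kernel's feature space; a plain inner product
    stands in for the kernel itself.
    """
    return [[sum(a * b for a, b in zip(x, y)) for y in features] for x in features]


def combine_kernels(grams: Matrix and List[Matrix], weights: List[float]) -> Matrix:
    """Weighted sum of Gram matrices, K = sum_i w_i * K_i.

    Non-negative weights keep the combination a valid (positive
    semi-definite) kernel when every K_i is one.
    """
    if len(grams) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, grams)) for j in range(n)]
        for i in range(n)
    ]


# Toy data: two utterances seen through two hypothetical feature spaces.
features_a = [[1.0, 0.0], [0.0, 1.0]]
features_b = [[1.0, 1.0], [1.0, -1.0]]

gram_a = linear_gram(features_a)
gram_b = linear_gram(features_b)

# Hand-picked weights; a learned scheme would estimate these from data.
combined = combine_kernels([gram_a, gram_b], [0.5, 0.5])
```

The combined matrix can be passed to any SVM trainer that accepts a precomputed kernel. Constraining the weights to be non-negative is the simplest way to guarantee that the result is still a valid kernel.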
Audio, Speech, and Language Processing, IEEE Transactions on (Volume: 17, Issue: 4)
Date of Publication: May 2009