The objective of this paper is to demonstrate the effectiveness of sparse representation techniques for speaker recognition. In this approach, each feature vector from an unknown utterance is expressed as a weighted linear sum of a dictionary of feature vectors belonging to many speakers. The weights associated with the feature vectors in the dictionary are computed using the orthogonal matching pursuit algorithm, which is a greedy approximation to ℓ0 optimization. The resulting weights are highly sparse: only a few of them are nonzero, and the feature vectors belonging to the correct speaker carry the significant weights. The proposed method gives an equal error rate (EER) of 10.84% on the NIST-2003 database, whereas the existing GMM-UBM system gives an EER of 9.67%. Combining evidence from both systems yields an EER of 8.15%, indicating that the two systems carry complementary information.
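The sparse coding step described above can be illustrated with a minimal sketch of orthogonal matching pursuit. This is not the authors' implementation: the function names `omp` and `speaker_scores`, the `n_nonzero` sparsity limit, and the rule of scoring each speaker by the sum of absolute weights on that speaker's dictionary columns are all assumptions made for illustration. The dictionary `D` is assumed to hold one unit-norm feature vector per column, with `labels` giving the speaker index of each column.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedily approximate y as D @ w with at most n_nonzero nonzero weights."""
    w = np.zeros(D.shape[1])
    residual = y.astype(float)
    support = []
    for _ in range(n_nonzero):
        # Select the dictionary column most correlated with the residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Re-fit all selected columns jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        w[:] = 0.0
        w[support] = coef
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    return w

def speaker_scores(D, labels, frames, n_nonzero=5):
    """Assumed scoring rule: sum absolute weights per speaker over all frames."""
    n_speakers = int(labels.max()) + 1
    scores = np.zeros(n_speakers)
    for y in frames:
        w = omp(D, y, n_nonzero)
        scores += np.bincount(labels, weights=np.abs(w), minlength=n_speakers)
    return scores

# Toy usage: a 4-column dictionary shared by two hypothetical speakers.
D = np.eye(4)
labels = np.array([0, 0, 1, 1])
frames = np.array([[0.0, 3.0, 0.0, -1.0], [0.0, 2.0, 0.0, 0.0]])
scores = speaker_scores(D, labels, frames, n_nonzero=1)
best_speaker = int(np.argmax(scores))
```

In this toy case the dominant weight of each frame falls on a column owned by speaker 0, so that speaker accumulates the largest score. A real system would build `D` from enrollment feature vectors and would need a decision threshold to report an EER.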
Date of Conference: 25-30 March 2012