
Exploring speaker-specific characteristics with deep learning

Authors: Salman, A. (Sch. of Comput. Sci., Univ. of Manchester, Manchester, UK); Ke Chen

Speech signals convey several types of information, ranging from linguistic to speaker-specific, and each type should be used for a different task. However, it is hard to extract one particular type of information, because nearly all acoustic representations of speech present all kinds of information as a whole. Using the same representation for different tasks makes it difficult to achieve good performance in either speech or speaker recognition. In this paper, we present a deep neural architecture to explore speaker-specific characteristics from the popular Mel-frequency cepstral coefficients (MFCCs). For learning, we propose an objective function consisting of a contrastive cost, defined in terms of speaker similarity and dissimilarity, and a data reconstruction cost used as regularization to normalize non-speaker-related information. The deep architecture is learned by greedy layer-wise local unsupervised training for initialization, followed by global supervised discriminative training to extract a speaker-specific representation. Using two narrow-band benchmark corpora, we demonstrate that our deep architecture generates a robust overcomplete speech representation for characterizing various speakers, and that using this new representation yields favorable performance in speaker verification.
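
The abstract describes the objective only in words: a contrastive cost over speaker similarity and dissimilarity, plus a reconstruction cost acting as a regularizer. The sketch below illustrates that general shape and is not the authors' formulation. It swaps in a standard margin-based contrastive loss and a mean squared reconstruction error, and the names `contrastive_cost`, `reconstruction_cost`, `combined_objective`, `alpha`, and `margin` are all hypothetical.

```python
import math


def squared_distance(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def contrastive_cost(code_a, code_b, same_speaker, margin=1.0):
    """Margin-based contrastive loss (an assumed stand-in, not the paper's cost).

    Same-speaker pairs are penalized by their squared distance; pairs from
    different speakers are penalized only when closer than `margin`.
    """
    d = math.sqrt(squared_distance(code_a, code_b))
    if same_speaker:
        return d ** 2
    return max(0.0, margin - d) ** 2


def reconstruction_cost(frame, reconstruction):
    """Mean squared error between an input frame and its reconstruction."""
    return squared_distance(frame, reconstruction) / len(frame)


def combined_objective(code_a, code_b, same_speaker,
                       frame_a, recon_a, frame_b, recon_b,
                       alpha=0.5, margin=1.0):
    """Weighted sum of reconstruction and contrastive terms.

    `alpha` is a hypothetical trade-off weight; the abstract does not say
    how the two costs are balanced.
    """
    recon = reconstruction_cost(frame_a, recon_a) + reconstruction_cost(frame_b, recon_b)
    contrast = contrastive_cost(code_a, code_b, same_speaker, margin)
    return alpha * recon + (1.0 - alpha) * contrast
```

In this sketch the codes would be the speaker-specific layer outputs for two speech frames and the reconstructions would come from the decoder half of the network; both are passed in as plain lists so the cost can be inspected without a network.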

Published in:
Neural Networks (IJCNN), The 2011 International Joint Conference on

Date of Conference: July 31 2011-Aug. 5 2011
