Information theoretic factorization of speaker and language in hidden Markov models, with application to speaker recognition

Author
Tishby, Naftali; AT&T Bell Labs., Murray Hill, NJ, USA

An information-theoretic approach to speech modeling with prior statistical knowledge is proposed. Using the concept of minimum discrimination information (MDI), a model of speech can be factored into a prior distribution and an exponential correction term that depends on the specific training data. The discrimination information measures the statistical deviations of the training data from a prior model, in a way that is known to be optimal in a well-defined sense. The minimization of the discrimination information, subject to the given training data as constraints, yields a set of Lagrange multipliers. These multipliers serve to characterize the part of the training data which is not described by the prior model. The problem of separating the speaker-dependent part from a "universal" speaker-independent prior in hidden Markov models is studied in this framework, and a practical method for achieving this separation is derived. As an example, universal hidden Markov priors for isolated English digits are trained for male and female speakers using a database of 100 speakers and 20,000 spoken digits. The speaker-specific part is modeled by the individual Lagrange multipliers obtained by minimizing the discrimination information between the training data and the corresponding prior language model.
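
The factorization described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's hidden Markov model procedure; it assumes a small discrete prior, a single hypothetical feature function, and one moment constraint standing in for the training data. Under those assumptions, the distribution closest to the prior in discrimination information (Kullback-Leibler divergence) is the prior multiplied by an exponential term whose coefficient is the Lagrange multiplier. All names (`mdi_tilt`, `discrimination_information`, `prior`, `feature`, `target`) are illustrative and do not come from the paper.

```python
import math


def mdi_tilt(prior, feature, target, lo=-50.0, hi=50.0, iters=200):
    """Toy MDI solution for one constraint (illustrative, not the paper's method).

    Finds lam such that p[i] proportional to prior[i] * exp(lam * feature[i])
    satisfies sum(p[i] * feature[i]) == target.
    Assumes every prior[i] > 0 and min(feature) < target < max(feature).
    """

    def tilted(lam):
        # Work in the log domain and subtract the max for numerical stability.
        logs = [math.log(q) + lam * f for q, f in zip(prior, feature)]
        m = max(logs)
        weights = [math.exp(v - m) for v in logs]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(p * f for p, f in zip(tilted(lam), feature))

    # The constrained mean increases monotonically with lam, so bisection works.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, tilted(lam)


def discrimination_information(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    prior = [0.5, 0.3, 0.2]      # hypothetical "universal" prior
    feature = [0.0, 1.0, 2.0]    # hypothetical feature function
    lam, p = mdi_tilt(prior, feature, target=1.2)
    print("multiplier:", lam)
    print("corrected model:", p)
    print("D(p || prior):", discrimination_information(p, prior))
```

When the target equals the prior's own mean, the multiplier is zero and the corrected model coincides with the prior, which matches the abstract's reading of the multipliers as capturing only what the prior does not already describe.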

Published in:

Acoustics, Speech, and Signal Processing, 1988. ICASSP-88., 1988 International Conference on

Date of Conference:

11-14 Apr 1988