Statistical Analysis of Minimum Classification Error Learning for Gaussian and Hidden Markov Model Classifiers

3 Author(s): M. Afify (IBM T. J. Watson Research Center, Yorktown Heights, NY); Xinwei Li; Hui Jiang

Minimum classification error learning realized via generalized probabilistic descent, usually referred to as MCE/GPD, is a popular and powerful framework for building classifiers. This paper first presents a theoretical analysis of MCE/GPD, focusing on a simple classification problem: estimating the means of two Gaussian classes. For this simple problem, we derive difference equations for the class means and the decision threshold during learning, and develop closed-form expressions for the evolution of both the smoothed and the true error. In addition, we show that the decision threshold converges to its optimal value, and provide an estimate of the number of iterations needed to approach convergence. After convergence, the class means continue to drift apart, increasing their separation without bound while contributing nothing further to the decrease of the classification error. This behavior, referred to as mean drift, is then related to the increase of the variance of the classifier. The theoretical results agree closely with simulations carried out for a two-class Gaussian classification problem. Beyond these theoretical results, we verify experimentally, in speech recognition experiments, that MCE/GPD learning of Gaussian-mixture hidden Markov models qualitatively follows the pattern suggested by the analysis. We also discuss links between MCE/GPD learning and both batch gradient descent and extended Baum-Welch re-estimation, two approaches widely used in large-scale implementations of discriminative training. Hence, the proposed analysis can serve, at least as a rough guideline, for better understanding the properties of discriminative training algorithms for speech recognition.
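To make the setup concrete, below is a minimal sketch of the sequential MCE/GPD update for a two-class, one-dimensional Gaussian problem of the kind the abstract describes, assuming known equal unit variances so that only the class means are learned. The discriminants g_k(x) = -(x - mu_k)^2, the sigmoid-smoothed loss, and the gradient steps follow the standard MCE/GPD recipe; the specific constants (true means, slope gamma, step size eps) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Illustrative two-class 1-D Gaussian setup (NOT the paper's exact values).
rng = np.random.default_rng(0)
mu_true = np.array([-1.0, 1.0])          # true class means, unit variance
n = 5000
y = rng.integers(0, 2, size=n)           # class label per iteration
x = rng.normal(mu_true[y], 1.0)          # one training sample per iteration

mu = np.array([-0.2, 0.2])               # deliberately poor initial estimates
gamma = 2.0                              # slope of the sigmoid loss (assumed)
eps = 0.05                               # GPD step size (assumed)

for xt, yt in zip(x, y):
    k, j = yt, 1 - yt
    g = -(xt - mu) ** 2                  # discriminants for equal variances
    d = g[j] - g[k]                      # misclassification measure (>0 = error)
    l = 1.0 / (1.0 + np.exp(-gamma * d)) # sigmoid-smoothed 0-1 loss
    dl_dd = gamma * l * (1.0 - l)        # derivative of the loss w.r.t. d
    # Sequential gradient steps: pull the correct-class mean toward the
    # sample, push the competing mean away from it.
    mu[k] += eps * dl_dd * 2.0 * (xt - mu[k])
    mu[j] -= eps * dl_dd * 2.0 * (xt - mu[j])

threshold = mu.mean()                    # decision boundary for equal variances
print("learned means:", mu, "threshold:", threshold)
```

Because the sigmoid's derivative never vanishes exactly, every sample keeps nudging the two means in opposite directions even after the threshold (their midpoint) has settled near its optimal value; running the loop longer therefore keeps increasing the separation of mu[0] and mu[1] without reducing the error, which is consistent with the mean-drift behavior the abstract describes.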

Published in: IEEE Transactions on Audio, Speech, and Language Processing (Volume 15, Issue 8)