Skip to Main Content
Gaussian mixture models (GMMs) are commonly used as the output density function for large-vocabulary continuous speech recognition (LVCSR) systems. A standard problem when using multivariate GMMs to classify data is how to accurately represent the correlations in the feature vector. Full covariance matrices yield a good model, but dramatically increase the number of model parameters. Hence, diagonal covariance matrices are commonly used. Structured precision matrix approximations provide an alternative, flexible, and compact representation. Schemes in this category include the extended maximum likelihood linear transform and subspace for precision and mean models. This paper examines how these precision matrix models can be discriminatively trained and used on state-of-the-art speech recognition tasks. In particular, the use of the minimum phone error criterion is investigated. Implementation issues associated with building LVCSR systems are also addressed. These models are evaluated and compared using large vocabulary continuous telephone speech and broadcast news English tasks.