A multispan language modeling framework for large vocabulary speech recognition

Author: J. R. Bellegarda, Spoken Language Group, Apple Computer, Inc., Cupertino, CA, USA

A new framework is proposed to construct multispan language models for large vocabulary speech recognition, by exploiting both local and global constraints present in the language. While statistical n-gram modeling can readily take local constraints into account, global constraints have been more difficult to handle within a data-driven formalism. In this work, global constraints are captured via a paradigm first formulated in the context of information retrieval, called latent semantic analysis (LSA). This paradigm seeks to automatically uncover the salient semantic relationships between words and documents in a given corpus. Such discovery relies on a parsimonious vector representation of each word and each document in a suitable, common vector space. Since familiar clustering techniques can be applied in this space, it becomes possible to derive several families of large-span language models, with various smoothing properties. Because of their semantic nature, the new language models are well suited to complement conventional, more syntactically oriented n-grams, and the combination of the two paradigms naturally yields the benefit of a multispan context. An integrative formulation is proposed for this purpose, in which the latent semantic information is used to adjust the standard n-gram probability. The performance of the resulting multispan language models, as measured by perplexity, compares favorably with the corresponding n-gram performance.
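
The idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the toy corpus, the add-one bigram model, the mapping from cosine similarity to a positive score, and the renormalized product used to combine the two models are all assumptions made for illustration. The only parts taken from the abstract are the general steps: represent words and documents in a common low-dimensional space (here via a truncated SVD of a word-document count matrix, the usual LSA construction), then use that global information to adjust a local n-gram probability.

```python
import numpy as np

# Hypothetical toy corpus: each document is a list of words.
docs = [
    "stocks fell as the bank raised rates".split(),
    "the bank cut rates and stocks rose".split(),
    "the river bank flooded after heavy rain".split(),
    "rain fell and the river rose".split(),
]
vocab = sorted({w for d in docs for w in d})
index = {w: i for i, w in enumerate(vocab)}

# Word-document count matrix W (rows: words, columns: documents).
W = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d:
        W[index[w], j] += 1.0

# Truncated SVD of order k: W is approximated by U_k diag(s_k) Vt_k.
k = 2
U, s, Vt = np.linalg.svd(W, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]
# One k-dimensional vector per word, scaled by sqrt of the singular values.
word_vecs = U_k * np.sqrt(s_k)

def bigram_probs(prev):
    """Local model (assumption): add-one smoothed bigram P(w | prev)."""
    counts = np.ones(len(vocab))
    for d in docs:
        for a, b in zip(d, d[1:]):
            if a == prev:
                counts[index[b]] += 1.0
    return counts / counts.sum()

def lsa_scores(history):
    """Global model (assumption): cosine between each word vector and the
    history folded into the LSA space as a pseudo-document, mapped to (0, 1]."""
    h = np.zeros(len(vocab))
    for w in history:
        if w in index:
            h[index[w]] += 1.0
    h_vec = (h @ U_k) / np.sqrt(s_k)
    norms = np.linalg.norm(word_vecs, axis=1) * np.linalg.norm(h_vec)
    cos = (word_vecs @ h_vec) / np.maximum(norms, 1e-12)
    return np.clip((1.0 + cos) / 2.0, 1e-6, None)

def multispan_probs(history):
    """Adjust the bigram probability by the LSA score and renormalize.
    This product form is an illustrative assumption, not the paper's formula."""
    p = bigram_probs(history[-1]) * lsa_scores(history)
    return p / p.sum()

if __name__ == "__main__":
    p = multispan_probs(["the", "river", "bank"])
    for w in sorted(vocab, key=lambda w: -p[index[w]])[:5]:
        print(f"{w:10s} {p[index[w]]:.3f}")
```

In this sketch the bigram term sees only the previous word, while the LSA term sees the whole history as a bag of words, which is the sense in which the combined model has a multispan context.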

Published in:

IEEE Transactions on Speech and Audio Processing (Volume: 6, Issue: 5)