By Topic

Large-Scale Distributed Language Modeling

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Ahmad Emami ; IBM T J Watson Research Center, 1101 Kitchawan Rd., Yorktown Heights, NY 10598. ; Kishore Papineni ; Jeffrey Sorensen

A novel distributed language model that has no constraints on the n-gram order and no practical constraints on vocabulary size is presented. This model is scalable and allows for an arbitrarily large corpus to be queried for statistical estimates. Our distributed model is capable of producing n-gram counts on demand. By using a novel heuristic estimate for the interpolation weights of a linearly interpolated model, it is possible to dynamically compute the language model probabilities. The distributed architecture follows the client-server paradigm and allows for each client to request an arbitrary weighted mixture of the corpus. This allows easy adaptation of the language model to particular test conditions. Experiments using the distributed LM for re-ranking N-best lists of a speech recognition system resulted in considerable improvements in word error rate (WER), while integration with a machine translation decoder resulted in significant improvements in translation quality as measured by the BLEU score.

Published in:

2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07  (Volume:4 )

Date of Conference:

15-20 April 2007