By Topic

Using out-of-domain data to improve in-domain language models

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
R. Iyer ; Coll. of Eng., Boston Univ., MA, USA ; M. Ostendorf ; H. Gish

Standard statistical language modeling techniques suffer from sparse data problems when applied to real tasks in speech recognition, where large amounts of domain-dependent text are not available. We investigate new approaches to improve sparse application-specific language models by combining domain dependent and out-of-domain data, including a back-off scheme that effectively leads to context-dependent multiple interpolation weights, and a likelihood-based similarity weighting scheme to discriminatively use data to train a task-specific language model. Experiments with both approaches on a spontaneous speech recognition task (switchboard), lead to reduced word error rate over a domain-specific n-gram language model, giving a larger gain than that obtained with previous brute-force data combination approaches.

Published in:

IEEE Signal Processing Letters  (Volume:4 ,  Issue: 8 )