Improving Statistical Machine Translation Using Bayesian Word Alignment and Gibbs Sampling

Authors: Coşkun Mermer (TÜBİTAK BİLGEM, Kocaeli, Turkey); Murat Saraclar; Ruhi Sarikaya

Abstract: We present a Bayesian approach to word alignment inference in IBM Models 1 and 2. In the original approach, word translation probabilities (i.e., model parameters) are estimated using the expectation-maximization (EM) algorithm. In the proposed approach, they are random variables with a prior and are integrated out during inference. We use Gibbs sampling to infer the word alignment posteriors. The inferred word alignments are compared against EM and variational Bayes (VB) inference in terms of their end-to-end translation performance on several language pairs and corpus types of up to 15 million sentence pairs. We show that Bayesian inference outperforms both EM and VB in the majority of test cases. Further analysis reveals that the proposed method effectively addresses the high-fertility rare word problem in EM and the unaligned rare word problem in VB, achieves higher agreement and vocabulary coverage rates than both, and leads to smaller phrase tables.
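To make the collapsed Gibbs sampling idea concrete, below is a minimal sketch for IBM Model 1 only (no Model 2 distortion term). It is not the authors' implementation; the Dirichlet hyperparameter `alpha`, the NULL-token convention, and all function names are illustrative assumptions. The translation probabilities t(f|e) are given a symmetric Dirichlet(alpha) prior and integrated out, so the sampler operates directly on the alignment link counts.

```python
import random
from collections import defaultdict

def gibbs_align(bitext, alpha=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampler for IBM Model 1 word alignment (sketch).

    bitext: list of (source_tokens, target_tokens) pairs. A NULL token
    is implicitly prepended to each source sentence (index 0).
    Returns one alignment sample per sentence pair: for each target
    word, the index of its aligned source position (0 = NULL).
    """
    rng = random.Random(seed)
    vf = len({f for _, tgt in bitext for f in tgt})  # target vocabulary size

    # Initialize alignments uniformly at random and collect link counts.
    pair_count = defaultdict(int)   # n(e, f): links between e and f
    src_count = defaultdict(int)    # n(e): total links from e
    aligns = []
    for src, tgt in bitext:
        esent = ["NULL"] + list(src)
        a = [rng.randrange(len(esent)) for _ in tgt]
        for j, f in enumerate(tgt):
            pair_count[(esent[a[j]], f)] += 1
            src_count[esent[a[j]]] += 1
        aligns.append(a)

    for _ in range(iters):
        for (src, tgt), a in zip(bitext, aligns):
            esent = ["NULL"] + list(src)
            for j, f in enumerate(tgt):
                # Remove the current link from the counts.
                e_old = esent[a[j]]
                pair_count[(e_old, f)] -= 1
                src_count[e_old] -= 1
                # Collapsed conditional with the parameters integrated out:
                #   P(a_j = i | rest) ∝ (n(e_i, f) + alpha) / (n(e_i) + alpha * vf)
                weights = [(pair_count[(e, f)] + alpha) /
                           (src_count[e] + alpha * vf) for e in esent]
                a[j] = rng.choices(range(len(esent)), weights=weights)[0]
                # Add the new link back into the counts.
                pair_count[(esent[a[j]], f)] += 1
                src_count[esent[a[j]]] += 1
    return aligns
```

Because the parameters are integrated out rather than point-estimated, a rare source word cannot lock onto spuriously high translation probabilities the way it can under EM, which is the intuition behind the rare-word behavior discussed in the abstract.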

Published in:

IEEE Transactions on Audio, Speech, and Language Processing (Volume: 21, Issue: 5)