Word alignment for a parallel corpus is the mapping between words or phrases in the source language and their counterparts in the target language. The alignment result is an important input for many natural language processing applications. In this paper, we propose an approach to improving English-Vietnamese word alignment that uses the alignment frequencies recorded in the translation model of a statistical machine translation (SMT) system. We also identify five common error types in English-Vietnamese word alignment and propose heuristic patterns for detecting these alignment errors. Experimental results show an improvement over the alignments produced by GIZA++.
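
To make the idea of alignment frequency concrete, the following is a minimal illustrative sketch, not the method of this paper. It assumes that alignments are available as `i-j` index pairs (source position, target position), one line per sentence pair, and that a link is worth inspecting when its relative frequency for a given source word falls below a cutoff. The function names `link_frequencies` and `suspicious_links`, the toy sentences, and the `threshold` value are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def link_frequencies(src_sents, tgt_sents, alignments):
    """Count how often each (source word, target word) pair is linked.

    Each alignment line is assumed to hold whitespace-separated "i-j" pairs,
    where i indexes the source tokens and j the target tokens (0-based).
    """
    pair_counts = Counter()
    src_counts = Counter()
    for src, tgt, links in zip(src_sents, tgt_sents, alignments):
        s_tok, t_tok = src.split(), tgt.split()
        for link in links.split():
            i, j = map(int, link.split("-"))
            pair = (s_tok[i].lower(), t_tok[j].lower())
            pair_counts[pair] += 1
            src_counts[pair[0]] += 1
    return pair_counts, src_counts


def suspicious_links(pair_counts, src_counts, threshold=0.2):
    """Return links whose relative frequency given the source word is low.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return sorted(
        pair
        for pair, count in pair_counts.items()
        if count / src_counts[pair[0]] < threshold
    )


# Toy English-Vietnamese data; the last sentence pair contains a deliberately
# wrong link ("read" aligned to "sách" instead of "đọc").
src_sents = ["i read books", "i read", "you read books",
             "we read", "they read books", "i read books"]
tgt_sents = ["tôi đọc sách", "tôi đọc", "bạn đọc sách",
             "chúng_tôi đọc", "họ đọc sách", "tôi đọc sách"]
alignments = ["0-0 1-1 2-2", "0-0 1-1", "0-0 1-1 2-2",
              "0-0 1-1", "0-0 1-1 2-2", "0-0 1-2 2-2"]

pair_counts, src_counts = link_frequencies(src_sents, tgt_sents, alignments)
flagged = suspicious_links(pair_counts, src_counts)
print(flagged)
```

In this toy corpus the only flagged link is the deliberately wrong one, because "read" is linked to "đọc" far more often than to "sách". A real system would need larger counts and language-specific patterns, such as the error types mentioned above, before treating a rare link as an error.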