Many existing length-based Chinese-English sentence alignment methods measure sentence length in bytes. In this paper, we examine the effectiveness of six different ways of computing sentence length, which take the number of verbs, nouns, adjectives, content words, bytes, and all words in a sentence, respectively, as its length. Most previous methods are found to be memory-consuming and inefficient, so this paper proposes an alignment method that saves memory and time by grouping sentences for alignment. Our experimental results show that taking all words into account when computing sentence length can further enhance alignment performance, yielding 99.01% precision and 99.5% recall.
Published in: Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on (Volume 4)
Date of Conference: 18-20 Oct. 2008
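To make the abstract's idea concrete, the following is a minimal sketch, not the authors' method, of length-based alignment driven by a pluggable sentence-length measure. The six measures mirror the list in the abstract, but the one-letter POS tag set, the definition of content words (nouns, verbs, adjectives, adverbs), the UTF-8 byte encoding, the bead priors, and the parameters `c` and `s2` are all hypothetical placeholders. The alignment step is a Gale and Church-style dynamic program over 1-1, 1-0, 0-1, 2-1, 1-2, and 2-2 beads. It fills a full (n+1) × (m+1) table, which is the kind of memory cost the paper's sentence-grouping scheme is meant to reduce; that grouping scheme is not reproduced here.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A tagged token is (word, POS tag). The one-letter tags are a hypothetical
# stand-in for whatever POS tagger is actually used.
Token = Tuple[str, str]
NOUN, VERB, ADJ, ADV = "N", "V", "A", "D"
CONTENT_TAGS = {NOUN, VERB, ADJ, ADV}  # assumption: content words = N/V/A/adverbs


def _count_tags(tags):
    return lambda sent: sum(1 for _, tag in sent if tag in tags)


# Six ways of measuring sentence length, following the list in the abstract.
MEASURES: Dict[str, Callable[[Sequence[Token]], int]] = {
    "verbs": _count_tags({VERB}),
    "nouns": _count_tags({NOUN}),
    "adjectives": _count_tags({ADJ}),
    "content_words": _count_tags(CONTENT_TAGS),
    # Bytes of the tokens themselves (whitespace ignored); UTF-8 is an assumption.
    "bytes": lambda sent: sum(len(w.encode("utf-8")) for w, _ in sent),
    "all_words": lambda sent: len(sent),
}


def sentence_length(sent: Sequence[Token], measure: str) -> int:
    return MEASURES[measure](sent)


# Allowed beads (source count, target count) with placeholder prior probabilities.
BEAD_PRIORS = {(1, 1): 0.89, (1, 0): 0.005, (0, 1): 0.005,
               (2, 1): 0.045, (1, 2): 0.045, (2, 2): 0.011}


def bead_cost(l1: int, l2: int, c: float, s2: float) -> float:
    """-log P(|delta|) for a Gale-Church-style normalized length difference."""
    if l1 == 0 and l2 == 0:
        return 0.0
    mean = (l1 + l2 / c) / 2.0
    delta = (l2 - l1 * c) / math.sqrt(mean * s2)
    p = math.erfc(abs(delta) / math.sqrt(2.0))  # two-tailed standard normal tail
    return -math.log(max(p, 1e-300))            # guard against log(0)


def align(src_lens: List[int], tgt_lens: List[int],
          c: Optional[float] = None, s2: float = 6.8):
    """Return minimum-cost beads as ([source indices], [target indices])."""
    n, m = len(src_lens), len(tgt_lens)
    if c is None:  # heuristic: expected target/source length ratio from the totals
        c = (sum(tgt_lens) / sum(src_lens) if sum(src_lens) else 0.0) or 1.0
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in BEAD_PRIORS.items():  # relax every bead type
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = bead_cost(sum(src_lens[i:ni]), sum(tgt_lens[j:nj]), c, s2)
                total = cost[i][j] + step - math.log(prior)
                if total < cost[ni][nj]:
                    cost[ni][nj], back[ni][nj] = total, (i, j)
    beads, i, j = [], n, m
    while (i, j) != (0, 0):  # trace back from the end of both texts
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]
```

For example, `sentence_length([("我", "R"), ("喜欢", "V"), ("苹果", "N")], "bytes")` gives 15 under UTF-8, while `"all_words"` gives 3. With source lengths `[10, 20, 5, 15]` and target lengths `[10, 20, 20]`, `align` pairs the last two source sentences with the last target sentence as a 2-1 bead. Swapping the measure only changes the length lists fed to `align`, which is what lets the six measures be compared under one alignment procedure.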