Close category search window
 

Bitext Dependency Parsing With Auto-Generated Bilingual Treebank

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

8 Author(s)
Wenliang Chen ; Dept. of Human Language Technol., Inst. for Infocomm Res., Singapore, Singapore ; Kazama, J. ; Min Zhang ; Tsuruoka, Y.
more authors

This paper proposes a method to improve the accuracy of bilingual texts (bitexts) dependency parsing by using an auto-generated bilingual treebank created with the help of statistical machine translation (SMT) systems. Previous bitext parsing methods use human-annotated bilingual treebanks that are costly and troublesome to obtain. In the proposed method, we use an auto-generated bilingual treebank to train the parsing models. First, an SMT system is used to translate a monolingual treebank into the target language; then, a monolingual parser for the target language is used to parse the translated sentences. Since the auto-translated sentences and auto-parsed trees in the auto-generated bilingual treebank are far from perfect, the bilingual constraints are not sufficiently reliable. To overcome this problem, we propose a method to verify the reliability of the constraints using a large amount of target monolingual and bilingual unannotated data. Finally, we design a set of effective bilingual features for parsing models on the basis of the verified constraints. We conduct the experiments using a standard test data. The experimental results show that our bitext parser significantly outperforms monolingual parsers. Moreover, our method is still able to provide improvement when we use a larger monolingual treebank containing over 50 000 sentences. We also test the proposed method with different SMT systems and the results show that our method is very robust to the noise. In particular, the proposed method can be used in a purely monolingual setting with the help of SMT. That is, it does not need the human translation of the test set as previous methods do.

Published in:
Audio, Speech, and Language Processing, IEEE Transactions on  (Volume:20 ,  Issue: 5 )

Date of Publication: July 2012

Need Help?


IEEE Advancing Technology for Humanity About IEEE Xplore | Contact | Help | Terms of Use | Nondiscrimination Policy | Site Map | Privacy & Opting Out of Cookies

A not-for-profit organization, IEEE is the world's largest professional association for the advancement of technology.
© Copyright 2013 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.