Automatic Construction of English-Vietnamese Parallel Corpus through Web Mining

Author(s):
Dang, V.B. ; Fac. of Inf. Technol., Univ. of Natural Sci., Ho Chi Minh City ; Bao-Quoc Ho

Parallel corpora have become an essential resource for multilingual natural language processing, and a large volume of parallel text is now available on the Internet. In this paper, we propose a simple but reliable method for constructing an English-Vietnamese parallel corpus through Web mining. Our system automatically downloads Web pages from a given domain and detects the parallel ones, producing a parallel corpus of clean text that is well aligned at the paragraph level. The proposed technique can easily be applied to other language pairs. Experiments show promising results.

Published in:

Research, Innovation and Vision for the Future, 2007 IEEE International Conference on

Date of Conference:

5-9 March 2007