Using the Web corpus to translate the queries in cross-lingual information retrieval

Authors:
Junlin Zhang (Open Syst. & Chinese Inf. Process. Center, Chinese Acad. of Sci., Beijing, China); Le Sun; Jinming Min

Accurate cross-language information retrieval (CLIR) requires that query terms be translated correctly. In this paper, we propose a new method for Web-corpus-based query translation that consists of two steps: (1) translation candidate extraction and (2) translation selection. In translation candidate extraction, we submit the source-language query to a search engine to retrieve target-language corpus data from the Web. The candidate translations are expected to appear in both the titles and the query-biased summaries of the retrieved documents, so we compute the common substrings of different title pairs (or title-summary pairs) to pin down the possible translations. In translation selection, we choose the translation(s) from the candidates with a ranking function that combines substring frequency, inverse translation frequency, and a top-result-preferred factor. Experimental results indicate that the top-3 inclusion rate of translations is 75.57% and that our method is also very effective in the CLIR task.
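
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the ranking formula, so the way the three factors are combined here (a simple product), the logarithmic form of the inverse translation frequency, the reciprocal-rank weighting for the top-result preference, and all function and parameter names (`longest_common_substring`, `extract_candidates`, `rank_candidates`, `queries_per_candidate`, `total_queries`) are assumptions made for illustration. The sketch also keeps only the single longest common substring per pair, which is one possible reading of "intersection substrings".

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def extract_candidates(snippets, min_len=2):
    """Step 1 (sketch): intersect every pair of titles/summaries.

    `snippets` is assumed to be ordered by search rank, best first.
    Returns (pair counts per candidate, accumulated rank weight per candidate).
    """
    counts = Counter()
    rank_weight = defaultdict(float)
    for (i, s), (j, t) in combinations(enumerate(snippets), 2):
        cand = longest_common_substring(s, t).strip()
        if len(cand) >= min_len:
            counts[cand] += 1
            # Assumed top-result preference: reciprocal rank of both sources.
            rank_weight[cand] += 1.0 / (i + 1) + 1.0 / (j + 1)
    return counts, rank_weight


def rank_candidates(snippets, queries_per_candidate, total_queries):
    """Step 2 (sketch): order candidates by an assumed combined score.

    `queries_per_candidate` maps a candidate to the number of distinct
    source queries that produced it (hypothetical statistic standing in
    for the paper's inverse translation frequency).
    """
    counts, rank_weight = extract_candidates(snippets)
    scores = {}
    for cand, freq in counts.items():
        n_q = queries_per_candidate.get(cand, 1)
        itf = math.log(1 + total_queries / n_q)  # assumed IDF-like form
        scores[cand] = freq * itf * rank_weight[cand]
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `rank_candidates(titles, {}, 10)` on a rank-ordered list of result titles returns the candidate strings from most to least preferred. In a real setting the source-language query itself would also recur across titles, so a filter that keeps only target-language substrings would likely be needed before ranking; that filter is omitted here.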

Published in:

Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on

Date of Conference:

30 Oct.-1 Nov. 2005