
Cross language information retrieval based on LDA


3 Author(s)
Ai Wang (Key Laboratory of Complex Systems and Intelligence Science, Chinese Academy of Sciences, Beijing, China); YaoDong Li; Wei Wang

This paper proposes an LDA-based cross-language retrieval model that does not rely on word-by-word translation of the query or the documents. Instead, a parallel corpus is used to estimate a cross-language LDA (Latent Dirichlet Allocation) model. Given a parallel corpus in two languages, English and Chinese, we assume that a topic variable Z in LDA can generate both an English token and a Chinese token. The model can therefore be extended easily to multilingual information retrieval as long as a multilingual parallel corpus is available. On CNKI datasets, the proposed LDA-based cross-language retrieval model was compared with three popular retrieval models: an LDA-based monolingual document model, a monolingual TF-IDF retrieval model, and a cross-lingual Latent Semantic Indexing retrieval model. Experimental results show that the proposed model is effective and performs well.
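The abstract describes the model only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes one common reading of "a topic Z generates both an English and a Chinese token": each aligned English/Chinese document pair shares a single topic-proportion vector θ, while each topic k keeps separate word distributions φ_k^en and φ_k^zh (the polylingual topic model formulation). Parameters are estimated by collapsed Gibbs sampling, and a Chinese document is scored for an English query by the topic-based query likelihood log P(q|d) = Σ_w log Σ_k φ_k^en(w) θ_d(k), where θ_d is inferred by folding the document in with φ fixed. All function names, hyperparameters (`alpha`, `beta`, `iters`), and the toy corpus are hypothetical.

```python
import math
import random
from collections import defaultdict


def train_bilingual_lda(pairs, num_topics, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampler for a bilingual LDA sketch (assumed formulation).

    pairs: list of (english_tokens, chinese_tokens) aligned document pairs.
    Both halves of a pair share one topic-count vector; each language has
    its own topic-word counts. Returns phi[lang][k][word] = P(word | topic k).
    """
    rng = random.Random(seed)
    K, langs = num_topics, (0, 1)
    vocab = [sorted({w for p in pairs for w in p[l]}) for l in langs]
    V = [len(v) for v in vocab]
    n_dk = [[0] * K for _ in pairs]                               # topic counts per pair
    n_kw = [[defaultdict(int) for _ in range(K)] for _ in langs]  # word counts per topic, per language
    n_k = [[0] * K for _ in langs]                                # token totals per topic, per language

    def update(d, l, w, k, delta):
        n_dk[d][k] += delta
        n_kw[l][k][w] += delta
        n_k[l][k] += delta

    # Random initial topic assignment for every token in both languages.
    z = [[[rng.randrange(K) for _ in p[l]] for l in langs] for p in pairs]
    for d, p in enumerate(pairs):
        for l in langs:
            for w, k in zip(p[l], z[d][l]):
                update(d, l, w, k, +1)

    for _ in range(iters):
        for d, p in enumerate(pairs):
            for l in langs:
                for i, w in enumerate(p[l]):
                    update(d, l, w, z[d][l][i], -1)  # remove current assignment
                    weights = [(n_dk[d][t] + alpha) * (n_kw[l][t][w] + beta)
                               / (n_k[l][t] + V[l] * beta) for t in range(K)]
                    z[d][l][i] = rng.choices(range(K), weights=weights)[0]
                    update(d, l, w, z[d][l][i], +1)  # add resampled assignment

    return [[{w: (n_kw[l][k][w] + beta) / (n_k[l][k] + V[l] * beta) for w in vocab[l]}
             for k in range(K)] for l in langs]


def infer_theta(tokens, phi_l, alpha=0.1, iters=50, seed=0):
    """Fold in an unseen document with phi fixed; returns its topic proportions."""
    rng = random.Random(seed)
    K = len(phi_l)
    tokens = [w for w in tokens if w in phi_l[0]]  # drop out-of-vocabulary words
    z = [rng.randrange(K) for _ in tokens]
    n_k = [0] * K
    for k in z:
        n_k[k] += 1
    for _ in range(iters):
        for i, w in enumerate(tokens):
            n_k[z[i]] -= 1
            weights = [(n_k[t] + alpha) * phi_l[t][w] for t in range(K)]
            z[i] = rng.choices(range(K), weights=weights)[0]
            n_k[z[i]] += 1
    total = len(tokens) + K * alpha
    return [(n_k[t] + alpha) / total for t in range(K)]


def query_log_likelihood(query, theta, phi_q):
    """log P(q | d) = sum_w log sum_k P(w | k) * theta_d(k), skipping OOV words."""
    score = 0.0
    for w in query:
        if w in phi_q[0]:
            score += math.log(sum(phi_q[k][w] * theta[k] for k in range(len(theta))))
    return score


def rank_documents(query, docs, phi):
    """Rank target-language docs (phi[1]) for a source-language query (phi[0])."""
    thetas = [infer_theta(doc, phi[1]) for doc in docs]
    scores = [query_log_likelihood(query, th, phi[0]) for th in thetas]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical toy parallel corpus (English tokens, Chinese tokens).
pairs = [
    (["stock", "market", "price"], ["股票", "市场", "价格"]),
    (["market", "price", "trade"], ["市场", "价格", "贸易"]),
    (["football", "match", "team"], ["足球", "比赛", "球队"]),
    (["team", "match", "goal"], ["球队", "比赛", "进球"]),
]
phi = train_bilingual_lda(pairs, num_topics=2)
zh_docs = [["股票", "价格", "市场"], ["足球", "球队", "比赛"]]
order, scores = rank_documents(["football", "team"], zh_docs, phi)
print(order, scores)
```

Because both languages index the same K topics, an English query can be scored against a Chinese document's inferred θ without translating either side; extending to more languages would only add another φ per topic. LDA-based retrieval is often combined with a smoothed term-based document model in practice, which this sketch deliberately omits.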

Published in:

IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS 2009), Volume 3

Date of Conference:

20-22 Nov. 2009