Using words to represent document contents and queries in information retrieval raises two problems: ambiguity, and different words that represent the same concept. Both can be addressed with query expansion. We analysed the effect on retrieval performance of query expansion, word sense disambiguation (WSD), iterated relevance feedback, and several retrieval variations. In this paper, WSD is implemented in Lucene using query expansion with a thesaurus and relevance feedback. The Extended Lesk algorithm was re-implemented to disambiguate the query using WordNet. Expansion terms were limited to at most 20 words, chosen from three sources of candidates: sense information for the disambiguated query, co-occurring terms, and the most frequent terms as ranked by Kullback-Leibler distance. We iterated the process to find the best number of expansion iterations. We found that applying WSD to the query can make the search process up to 161 times longer in the worst case. Query expansion using disambiguated sense information did not affect performance much, whereas using information from relevance feedback did. This experiment provides a better understanding of the effect of WSD on information retrieval system performance.
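As a rough illustration of the Kullback-Leibler term selection step mentioned above, the following Python sketch ranks candidate expansion terms from feedback documents. It assumes the commonly used per-term score p_fb(t) * log(p_fb(t) / p_coll(t)), where p_fb and p_coll are term probabilities in the feedback documents and in the whole collection; the abstract does not state the exact formula, so this scoring, the function name `kld_expansion_terms`, and the pre-tokenised list-of-lists input are all assumptions and not the paper's Lucene implementation.

```python
import math
from collections import Counter


def kld_expansion_terms(feedback_docs, collection_docs, max_terms=20):
    """Rank candidate expansion terms by a per-term KL-divergence score.

    Hypothetical helper: documents are lists of already-tokenised terms.
    The cap of 20 terms mirrors the limit stated in the abstract.
    """
    fb_counts = Counter(t for doc in feedback_docs for t in doc)
    coll_counts = Counter(t for doc in collection_docs for t in doc)
    fb_total = sum(fb_counts.values())
    coll_total = sum(coll_counts.values())

    scores = {}
    for term, count in fb_counts.items():
        coll_count = coll_counts.get(term, 0)
        if coll_count == 0:
            # Skip terms unseen in the collection (no smoothing in this sketch).
            continue
        p_fb = count / fb_total
        p_coll = coll_count / coll_total
        scores[term] = p_fb * math.log(p_fb / p_coll)

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_terms]]


if __name__ == "__main__":
    collection = [
        ["bank", "river", "water"],
        ["bank", "money", "loan"],
        ["tree", "leaf", "water"],
        ["money", "market", "stock"],
    ]
    feedback = [collection[1], collection[3]]  # documents judged relevant
    print(kld_expansion_terms(feedback, collection))
```

Terms that are proportionally more frequent in the feedback documents than in the collection score highest, which is why such a score is a plausible way to pick relevance-feedback terms; a real system would also need stop-word handling and smoothing.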
Date of Conference: 17-19 July 2011