Skip to Main Content
A user query for information retrieval (IR) applications may not contain the most appropriate terms (words) as actually intended by the user. This is usually referred to as the term mismatch problem and is a crucial research issue in IR. Using the notion of relevance, we provide a comprehensive theoretical analysis of a parametric query vector, which is assumed to represent the information needs of the user. A lexical association function has been derived analytically using the system relevance criteria. The derivation is further justified using an empirical evidence from the user relevance criteria. Such analytical derivation as presented in this paper provides a proper mathematical framework to the query expansion techniques, which have largely been heuristic in the existing literature. By using the generalized retrieval framework, the proposed query representation model is equally applicable to the vector space model (VSM), Okapi best matching 25 (Okapi BM25), and Language Model (LM). Experiments over various data sets from TREC show that the proposed query representation gives statistically significant improvements over the baseline Okapi BM25 and LM as well as other well-known global query expansion techniques. Empirical results along with the theoretical foundations of the query representation confirm that the proposed model extends the state of the art in global query expansion.
Knowledge and Data Engineering, IEEE Transactions on (Volume:24 , Issue: 12 )
Date of Publication: Dec. 2012