Skip to Main Content
Semantic smoothing, which incorporates synonym and sense information into the language models, is effective and potentially significant to improve retrieval performance. Previously implemented semantic smoothing models such as the translation model have shown good experimental results. However, these models are unable to incorporate contextual information. To overcome this limitation, we propose a novel context-sensitive semantic smoothing method that decomposes a document into a set of weighted context-sensitive topic signatures and then maps those topic signatures into query terms. The language model with such a context- sensitive semantic smoothing is referred to as the topic signature language model. In detail, we implement two types of topic signatures, depending on whether ontology exists in the application domain. One is the ontology-based concept and the other is the multiword phrase. The mapping probabilities from each topic signature to individual terms are estimated through the EM algorithm. Document models based on topic signature mapping are then derived. The new smoothing method is evaluated on the TREC 2004/ 2005 Genomics Track with ontology-based concepts, as well as the TREC Ad Hoc Track (Disks 1, 2, and 3) with multiword phrases. Both experiments show significant improvements over the two-stage language model, as well as the language model with context- insensitive semantic smoothing.