The vector space model is one of the most common information retrieval (IR) methods for text document search. Similarity for query matching is usually measured by the cosine of the angle, or the Euclidean distance, between the query vector and each document vector. Because the vector space model starts from a term-by-document matrix, it discards from the outset any information about how the query terms relate to one another within a document. This paper presents a modified vector space model for measuring similarity between a query and a document when responding to a multi-term query. Keywords receive more weight when the query terms occur adjacent to one another in the document. Thus, when a document contains adjacent query terms, its vector will typically move closer to the query vector, indicating stronger relevance between the query and the document.
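The idea can be illustrated with a minimal sketch. The abstract does not give the paper's weighting formula, so everything specific here is an assumption made for illustration: raw term counts as weights, a fixed additive `boost` for each ordered pair of adjacent query terms found in the document, whitespace tokenization, and the helper names `tokenize`, `cosine`, `doc_vector`, and `score`.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def cosine(u, v):
    # Cosine of the angle between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def doc_vector(doc_tokens, query_tokens, vocab, boost=0.0):
    # Start from raw term counts, as in a plain term-by-document matrix.
    counts = Counter(doc_tokens)
    weights = {term: float(counts[term]) for term in vocab}
    if boost:
        # Hypothetical adjacency rule: whenever two consecutive query terms
        # also appear consecutively (same order) in the document, add
        # `boost` to the weight of both terms.
        query_pairs = set(zip(query_tokens, query_tokens[1:]))
        for pair in zip(doc_tokens, doc_tokens[1:]):
            if pair in query_pairs:
                weights[pair[0]] += boost
                weights[pair[1]] += boost
    return [weights[term] for term in vocab]


query = tokenize("vector space")
docs = {
    "adjacent": tokenize("the vector space model ranks documents"),
    "scattered": tokenize("the space of each vector ranks documents"),
}
vocab = sorted(set(query).union(*docs.values()))
query_counts = Counter(query)
query_vec = [float(query_counts[term]) for term in vocab]


def score(name, boost):
    # Cosine similarity between the query and one of the sample documents.
    return cosine(query_vec, doc_vector(docs[name], query, vocab, boost))


for name in docs:
    print(name, round(score(name, 0.0), 3), round(score(name, 1.0), 3))
```

In this toy setup both documents contain both query terms, but only the first contains them side by side, so only its vector is pulled toward the query vector when the boost is applied. The size of the boost and whether term order should matter are design choices the abstract leaves open.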
Published in:
Computational Intelligence and Data Mining, 2009. CIDM '09. IEEE Symposium on
Date of Conference: March 30 to April 2, 2009