Computer users now store large numbers of text documents, which creates a need for fast and precise document search. The goal of Information Retrieval (IR) models is to provide users with the documents that satisfy their information needs. The core of such models is the document representation used for indexing. Traditional IR models rely on the frequency of query terms. Their disadvantage is that they consider only the terms that appear in the query and ignore similar terms. This paper proposes a topic-based indexing approach that represents the topics associated with documents. Documents are modeled using clustering algorithms based on natural language processing. The result of this proposal is a document-topic matrix that denotes the importance of each topic within each document. Similarly, each query over the documents is converted into a vector of topics. A similarity measure can then be applied between this vector and the document matrix to retrieve the most relevant documents.
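
The retrieval step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the topic clusters in `TOPICS` are hand-written stand-ins for the output of the clustering step, the topic weight is assumed to be the fraction of a text's tokens that fall in a topic, and cosine similarity is assumed as the similarity measure. All function and variable names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical topic clusters. In the proposed approach these would come
# from a clustering algorithm; here they are hand-written for illustration.
TOPICS = {
    "vehicles": {"car", "automobile", "truck", "engine"},
    "animals": {"cat", "dog", "horse", "pet"},
}
TOPIC_NAMES = sorted(TOPICS)  # fixed column order: ["animals", "vehicles"]


def topic_vector(text):
    """Map a text to topic weights (assumed: fraction of tokens per topic)."""
    tokens = text.lower().split()
    counts = Counter()
    for token in tokens:
        for name, terms in TOPICS.items():
            if token in terms:
                counts[name] += 1
    total = len(tokens) or 1  # avoid division by zero for empty text
    return [counts[name] / total for name in TOPIC_NAMES]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, documents):
    """Return document indices sorted by decreasing similarity to the query."""
    matrix = [topic_vector(d) for d in documents]  # document-topic matrix
    q = topic_vector(query)                        # query as a topic vector
    scores = [cosine(q, row) for row in matrix]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)


DOCS = [
    "the truck has a large engine",
    "my dog chased the cat",
]
RANKING = rank("automobile", DOCS)
```

In this sketch the query "automobile" ranks the first document highest even though that word never appears in it, because both map to the same topic. This is the behavior that pure term-frequency matching would miss.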