Most common text retrieval techniques rely on statistical analysis of a term, whether a word or a phrase. Statistical analysis of term frequency, however, captures the importance of a term within a document only. To achieve a more accurate analysis, the underlying representation should identify terms that capture the semantics of the text. Such a representation can capture the terms that present the concepts of a sentence, which in turn leads to discovering the topic of the document. A new concept-based representation, called the Conceptual Ontological Graph (COG), is introduced, in which a concept can be either a word or a phrase and depends entirely on the sentence semantics. The aim of the proposed representation is to extract the most important terms in a sentence and in a document with respect to the meaning of the text. The COG representation analyzes each term at both the sentence and the document levels, unlike the classical approach, which analyzes terms at the document level only. First, the proposed representation identifies the terms that contribute to the sentence semantics. Then, each term is selected based on its position within the COG representation. Lastly, the selected terms are associated with their documents as features for indexing prior to text retrieval. The COG representation can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that hold the key concepts representing the sentence meaning. Extensive experiments using the proposed COG representation are conducted on different text retrieval datasets. Experimental results demonstrate a substantial improvement in text retrieval quality with the COG representation over traditional techniques. The evaluation relies on two quality measures, bpref and P(10). Both measures improve when the COG representation is used.
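
For readers unfamiliar with the two quality measures, the following is a minimal sketch of how they are commonly computed for a single query; it is not taken from the paper. P(10) is read here as precision over the top 10 retrieved documents, and bpref follows the commonly cited definition in which each retrieved relevant document is penalized by the fraction of judged non-relevant documents ranked above it (capped at R, the number of judged relevant documents), with unjudged documents ignored. The function names, the document identifiers, and the choice of this particular bpref variant are assumptions made for illustration.

```python
def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k ranked documents that are judged relevant."""
    top_k = ranking[:k]
    return sum(1 for doc in top_k if doc in relevant) / k


def bpref(ranking, relevant, nonrelevant):
    """bpref for one query (assumed variant, see the text above).

    ranking     -- list of document ids, best first
    relevant    -- set of ids judged relevant
    nonrelevant -- set of ids judged non-relevant
    Documents in neither set are unjudged and are skipped.
    """
    r = len(relevant)
    if r == 0:
        return 0.0
    nonrel_seen = 0   # judged non-relevant documents seen so far
    total = 0.0
    for doc in ranking:
        if doc in relevant:
            # Penalize by the judged non-relevant documents ranked above,
            # counting at most the first r of them.
            total += 1.0 - min(nonrel_seen, r) / r
        elif doc in nonrelevant:
            nonrel_seen += 1
    # Relevant documents that were never retrieved contribute zero.
    return total / r


# Hypothetical toy run: d5 is unjudged.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d4"}
nonrelevant = {"d2", "d3"}
p10 = precision_at_k(ranking, relevant, k=10)
bp = bpref(ranking, relevant, nonrelevant)
```

Because bpref uses only judged documents, it is often preferred when relevance judgments are incomplete, whereas P(10) reflects what a user sees on the first page of results.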