The World Wide Web has brought information to its users' fingertips. However, most documents on the web are machine-readable but not machine-understandable, so retrieving relevant information remains a difficult task. In the traditional text representation approach, high-frequency keywords serve as the terms that represent a text. This approach has two main drawbacks. First, a word's frequency is not directly related to its importance. Second, individual words are ambiguous. To address these shortcomings of the keyword-based method, this paper presents a phrase-based text representation approach that uses rule-based natural language processing (NLP) techniques. Key-phrases are extracted from text documents through partial parsing. Phrases reduce the ambiguity that words have when considered in isolation, which makes the indexing terms more meaningful. The approach aims to improve retrieval effectiveness in this way.
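The abstract does not state the specific parsing rules. The following is a minimal sketch of rule-based partial parsing (chunking) for key-phrase extraction. It makes three assumptions that do not come from the paper:

- The input has already been part-of-speech tagged.
- The tag names `ADJ` and `NOUN` are used.
- Chunks follow the hypothetical pattern "zero or more adjectives followed by one or more nouns."

```python
from typing import List, Tuple

# Assumed input: (word, tag) pairs from some POS tagger.
# The tag names "ADJ" and "NOUN" are hypothetical, chosen for illustration.
TaggedToken = Tuple[str, str]


def extract_phrases(tagged: List[TaggedToken]) -> List[str]:
    """Return chunks matching the assumed pattern ADJ* NOUN+.

    This is a partial (shallow) parse: it groups words into phrases
    without building a full syntax tree for the sentence.
    """
    phrases: List[str] = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        # Consume any run of adjectives starting at position i.
        while j < n and tagged[j][1] == "ADJ":
            j += 1
        k = j
        # Consume the run of nouns that must follow.
        while k < n and tagged[k][1] == "NOUN":
            k += 1
        if k > j:
            # At least one noun matched: emit the whole chunk as one index term.
            phrases.append(" ".join(word.lower() for word, _ in tagged[i:k]))
            i = k
        else:
            # No noun here, so no chunk starts at i; try the next token.
            i += 1
    return phrases


if __name__ == "__main__":
    sentence = [("Partial", "ADJ"), ("parsing", "NOUN"), ("improves", "VERB"),
                ("retrieval", "NOUN"), ("effectiveness", "NOUN"), (".", "PUNCT")]
    print(extract_phrases(sentence))  # ['partial parsing', 'retrieval effectiveness']
```

Each chunk is indexed as one term rather than as separate keywords. This illustrates how phrases can carry more context than isolated words. A real system would use a richer rule set than this single pattern.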