People increasingly want to extract a variety of information from online texts. As more and more text becomes available online, there is an emerging need for systems that extract information automatically from text corpora. One of the principal challenges of information extraction is the efficient customization of a system to a new domain. Adapting an information extraction system to a new domain entails constructing a new set of extraction rules, and many recent information extraction systems have ignored the tedious and time-consuming nature of that process. This paper proposes an alternative approach that generates candidate extraction rules from an untagged text corpus using the Link Grammar Parser and filters them into the final extraction rules using WordNet and linguistic patterns. The proposed method not only reduces the time and effort required to create an appropriate training corpus but also obviates the need to examine many candidate extraction rules, so that the system can be ported easily to a different domain.
Date of Conference: 24-26 Jan. 2007