It has been shown that annotating prominent text patterns in documents with appropriate types can benefit many applications. Most conventional tools for automatic text annotation extract named entities from text and label them with types such as person, location, and date. However, such entity type information is usually terse and mostly limited to a small set of broad categories. In this paper, we address this problem by presenting an approach that extracts global evidence from documents to improve named entity recognition. We also propose an unsupervised, generalized classification approach that automatically collects training data from the Web and classifies text patterns into more fine-grained categories. Experimental results on data from the NTCIR-2 information retrieval task show the feasibility of the proposed approaches for search.
Date of Conference: 19-22 Sept. 2005