This paper describes a web-based system for retrieving imaged documents from a digital library. First, each imaged document is preprocessed off-line to extract its word objects. Each word object is then represented by a string known as its feature code, and these strings are used to construct a feature code file for the corresponding document. On the web interface side, the user enters a set of query words and chooses whether to combine them with an "AND" or an "OR" operation. On receiving the user's request, the system processes each query word and combines the results according to the chosen operation. Each query word is first looked up in an index table that stores previously queried words. If a match is found, the results are retrieved directly from the index table and held temporarily for subsequent merging. This speeds up searching and makes the system an incremental intelligence system. Otherwise, the system converts the query word to a feature code string and uses a partial word matching approach to search the pre-generated feature code files. Preliminary experiments on imaged student theses provided by our digital library show that the proposed system is efficient and promising for document image retrieval, and thus has potential applications in digital libraries.
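The query flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the system's actual implementation: `toy_feature_code` is a hypothetical placeholder (the abstract does not define the real feature code), partial word matching is approximated by a plain substring test, and the names `Retriever`, `search_word`, and `query` are invented for this sketch. Storing the results of newly searched words back into the index table is likewise an assumption about how the table is populated.

```python
from typing import Callable, Dict, Iterable, List, Set


def toy_feature_code(word: str) -> str:
    """Hypothetical stand-in for the feature code (NOT the paper's encoding).

    Maps each letter to a coarse shape class: ascender, descender, or other.
    """
    ascenders, descenders = set("bdfhklt"), set("gjpqy")
    return "".join(
        "A" if ch in ascenders else "D" if ch in descenders else "x"
        for ch in word.lower()
    )


class Retriever:
    """Illustrative query pipeline: index-table lookup, then feature-code search."""

    def __init__(
        self,
        code_files: Dict[str, List[str]],
        encode: Callable[[str], str] = toy_feature_code,
    ) -> None:
        # code_files maps a document id to the feature codes of its word objects.
        self.code_files = code_files
        self.encode = encode
        # Index table: previously queried word -> ids of matching documents.
        self.index_table: Dict[str, Set[str]] = {}

    def search_word(self, word: str) -> Set[str]:
        key = word.lower()
        if key in self.index_table:
            # Word was queried before: reuse the stored result.
            return set(self.index_table[key])
        code = self.encode(key)
        # Assumption: "partial word matching" approximated by substring search.
        hits = {
            doc_id
            for doc_id, codes in self.code_files.items()
            if any(code in c for c in codes)
        }
        # Assumption: new results are stored for future queries.
        self.index_table[key] = hits
        return set(hits)

    def query(self, words: Iterable[str], op: str = "AND") -> Set[str]:
        results = [self.search_word(w) for w in words]
        if not results:
            return set()
        if op == "AND":
            return set.intersection(*results)
        if op == "OR":
            return set.union(*results)
        raise ValueError("op must be 'AND' or 'OR'")


# Usage: build toy feature code files for two hypothetical documents.
docs = {"thesis1": ["image", "retrieval"], "thesis2": ["digital", "library"]}
code_files = {d: [toy_feature_code(w) for w in ws] for d, ws in docs.items()}
retriever = Retriever(code_files)
and_hits = retriever.query(["image", "library"], op="AND")
or_hits = retriever.query(["image", "library"], op="OR")
```

In this sketch the second call never touches the feature code files, because both words are already in the index table after the first call. A coarse placeholder code like this one would produce many false matches on real text; it is here only to make the control flow concrete.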