I. Introduction
In the move towards a paperless office, a large number of printed documents are being digitized and archived in image databases, digital libraries, and internet applications, with the intention of preserving them for long-term use, serving large groups of people, and facilitating e-governance applications [3]. Such a huge collection of document images poses a challenge when searching for relevant documents, and searching is an important and frequently used operation. Since the archived documents are in image form, existing text processing (searching) algorithms cannot operate on them, which necessitates a facility for searching relevant documents directly in image form [3], [4].

In the literature, two important techniques have been proposed to address this issue, based on the concepts of Digital Image Processing (DIP) and Document Image Retrieval (DIR) [5]. The first approach uses digital image processing techniques that analyze the text areas in the document image and convert them into machine-readable ASCII text, thus making the text searchable with simple text processing algorithms. DIP techniques employ suitable text segmentation algorithms and subsequently use Optical Character Recognition (OCR) to bring the text contents into an editable ASCII form. However, OCR-based techniques are very sensitive to the noise and degradation present in the document image [3], [4], and OCR performance depends largely on the quality of the input image and on the segmentation algorithm applied. Because of these limitations, a DIR technique known as keyword spotting (or word spotting) was introduced in [6] and later improved by many researchers, as reported in [5]. Keyword spotting is an OCR-less approach that locates specified keywords in the document image and works on the principle of image matching [6], [5].
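To make the image matching principle concrete, the following is a minimal sketch of OCR-less keyword spotting. It is an illustration only and is not the method of [6] or of any technique surveyed in [5]: it assumes small grayscale or binary images stored as nested lists, and it swaps in plain normalized cross-correlation over a sliding window as the matching criterion. The function name spot_keyword, the helper _centered, and the threshold value are hypothetical choices made for this example.

```python
from math import sqrt


def _centered(pixels):
    """Subtract the mean so that matching ignores overall brightness."""
    mean = sum(pixels) / len(pixels)
    return [p - mean for p in pixels]


def spot_keyword(page, template, threshold=0.99):
    """Slide the keyword image over the page image and return a list of
    (row, col, score) for every window whose normalized cross-correlation
    with the keyword image is at least `threshold` (assumed value)."""
    th, tw = len(template), len(template[0])
    t = _centered([p for row in template for p in row])
    t_norm = sqrt(sum(v * v for v in t))
    hits = []
    if t_norm == 0:
        return hits  # a flat keyword image carries no pattern to match
    for r in range(len(page) - th + 1):
        for c in range(len(page[0]) - tw + 1):
            w = _centered([page[r + i][c + j]
                           for i in range(th) for j in range(tw)])
            w_norm = sqrt(sum(v * v for v in w))
            if w_norm == 0:
                continue  # blank background window, nothing to compare
            score = sum(a * b for a, b in zip(w, t)) / (w_norm * t_norm)
            if score >= threshold:
                hits.append((r, c, score))
    return hits


# Toy data: a 3x3 "keyword" image pasted twice into a blank 8x10 "page".
template = [[1, 0, 1],
            [0, 1, 0],
            [1, 0, 1]]
page = [[0] * 10 for _ in range(8)]
for top, left in [(1, 1), (4, 6)]:
    for i in range(3):
        for j in range(3):
            page[top + i][left + j] = template[i][j]

hits = spot_keyword(page, template)
print([(r, c) for r, c, _ in hits])  # [(1, 1), (4, 6)]
```

No character is recognized at any point: the keyword is located purely by comparing pixel patterns. Exhaustive pixel-level correlation of this kind is only a toy, since it tolerates neither changes in scale or font nor heavy degradation of the document image.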