Optical character recognition (OCR) is the electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. OCR systems are available for various languages and scripts, such as English, Chinese and Arabic, but no commercial system is available for Odia script. We have taken a step towards developing an OCR system for the Odia language. OCR is popular for its potential applications in banks, library automation, post offices, defense organizations and language processing. Line and word segmentation is one of the important steps of an OCR system: the accuracy of word and character recognition depends directly on the correctness of text-line and word segmentation. In this paper we propose a robust method for segmenting individual text lines from a printed Odia document image. Each segmented text line is the input to the word segmentation method, which produces segmented words. The proposed method uses both foreground and background information and is based on the intensities of the pixels in the document. We have tested our method on scanned Odia documents as well as some multi-script documents and obtained encouraging results.
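The abstract does not give the algorithm itself, so the following is only a minimal sketch of one common intensity-based approach, projection profiles, and not the authors' method. It assumes a grayscale image stored as a list of rows with 0 for black ink and 255 for white paper. The names `ink_runs`, `segment_lines` and `segment_words`, the `threshold` of 128 and the `min_gap` parameter are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows; assumed 0 = ink, 255 = paper


def ink_runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) pairs of runs that contain ink, end exclusive.

    Background gaps shorter than min_gap do not split a run.
    """
    runs = []
    start = None  # index where the current run began, if any
    gap = 0       # length of the background gap seen inside the run
    for i, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The run ended where the gap began.
                runs.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close a run that reaches the end, dropping any trailing gap.
        runs.append((start, len(profile) - gap))
    return runs


def segment_lines(image: Image, threshold: int = 128) -> List[Tuple[int, int]]:
    """Split the page into text lines using the count of ink pixels per row."""
    profile = [sum(1 for p in row if p < threshold) for row in image]
    return ink_runs(profile)


def segment_words(image: Image, line: Tuple[int, int],
                  threshold: int = 128, min_gap: int = 2) -> List[Tuple[int, int]]:
    """Split one text line into words using the count of ink pixels per column.

    min_gap is the smallest background width treated as a word boundary,
    so that narrower gaps between characters do not split a word.
    """
    top, bottom = line
    width = len(image[0]) if image else 0
    profile = [
        sum(1 for r in range(top, bottom) if image[r][c] < threshold)
        for c in range(width)
    ]
    return ink_runs(profile, min_gap)
```

In this sketch the foreground (ink counts) locates the text, while the background (runs of empty rows or columns) decides where to cut. A real page would likely need skew correction and a `min_gap` tuned to the font size, neither of which is covered here.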
Date of Conference: 26-28 July 2012