Skip to Main Content
Preprocessing in handwritten text OCR involves line, word and character segmentation. This paper deals with text line identification of handwritten Indian scripts, especially of Bangla, as well as English, Hindi, Malayalam, etc. Here, a new dual method based on interdependency between text-line and inter-line gap is proposed. The method draws curves simultaneously through the text and inter-line gap points found from strip-wise histogram peaks and inter-peak valleys. The curves start from left and move right while one type of points guides the curve of other type so that the curves do not intersect. Then these curves are allowed to iteratively evolve so that the text-line curves cross more character strokes while inter-line curves cross less character strokes and yet keep the curves as straight as possible. After several iterations, the curves stabilize and define the final text-lines and inter-line gaps. The approach works well on text of different scripts with various geometric layouts, including poetry.
Date of Conference: 26-29 July 2009