This paper proposes a model-based text line segmentation algorithm for machine-printed document images. The model is a geometric configuration of the interline spaces rather than of the text lines themselves. An objective function is defined over this model, and maximizing it yields the optimal solution. The primary advantage of the interline space model is that it is script-independent. The model is also versatile: it handles both horizontally and vertically written documents and infers the reading order. Experiments on document images in Latin, Korean, Chinese, and Japanese scripts support these advantages and show that the method tolerates noise.
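
To make the notion of an interline space concrete, the following is a minimal illustrative sketch. It is not the paper's geometric model or its objective function, which the abstract does not specify. It uses a plain projection profile to find ink-free bands between text lines in a binary image. The function name `interline_gaps`, the `axis` parameter, and the 0/1 raster representation are assumptions introduced here for illustration only.

```python
from typing import List, Tuple


def interline_gaps(image: List[List[int]],
                   axis: str = "horizontal") -> List[Tuple[int, int]]:
    """Return (start, end) index ranges, end exclusive, of ink-free bands
    that lie between two text lines.

    image: binary raster as a list of rows, 1 = ink, 0 = background.
    axis:  "horizontal" for horizontally written text (gaps are runs of
           empty rows), "vertical" for vertically written text (gaps are
           runs of empty columns).

    Hypothetical helper: a projection-profile heuristic, not the
    model-based method described in the abstract.
    """
    if axis == "vertical":
        # Transpose so that columns become rows and the same scan applies.
        image = [list(col) for col in zip(*image)]

    # Amount of ink in each row (or column, after transposing).
    profile = [sum(row) for row in image]

    gaps: List[Tuple[int, int]] = []
    start = None
    for i, ink in enumerate(profile):
        if ink == 0 and start is None:
            start = i                      # an empty band begins
        elif ink != 0 and start is not None:
            gaps.append((start, i))        # the empty band ends
            start = None
    if start is not None:
        gaps.append((start, len(profile)))

    # Keep only bands with text on both sides; drop page margins.
    return [(s, e) for s, e in gaps if s > 0 and e < len(profile)]


page = [
    [1, 1, 0, 1],  # text line
    [0, 0, 0, 0],  # interline space
    [0, 0, 0, 0],
    [1, 0, 1, 1],  # text line
    [0, 0, 0, 0],  # interline space
    [0, 1, 1, 0],  # text line
]
print(interline_gaps(page))
```

Because the scan looks only at where ink is absent, it makes no assumption about character shapes, which is the intuition behind a script-independent approach. A simple profile like this fails on skewed or noisy pages, which is where a model with an explicit objective function would be expected to do better.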