Robust Text Line Segmentation for Historical Manuscript Images Using Color and Texture | IEEE Conference Publication | IEEE Xplore

Abstract:

In this paper, we present a novel text line segmentation method for historical manuscript images. We use a pyramidal approach: at the first level, pixels are classified into text, background, decoration, and out-of-page; at the second level, text regions are split into text line and non-text line. Color and texture features based on Local Binary Patterns and Gabor Dominant Orientation are used for classification. Redundant and irrelevant features are removed by a modified Fast Correlation-Based Filter (FCBF) feature selection algorithm. Finally, the text line segmentation results are refined by a smoothing post-processing procedure. Unlike projection-profile or connected-component methods, the proposed algorithm does not use any script-specific knowledge and is applicable to color images. The proposed algorithm is evaluated on three historical manuscript image datasets of diverse nature, achieving an average precision of 91% and recall of 84%. Experiments also show that the proposed algorithm is robust to changes in writing style, page layout, and image noise.
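The abstract names a modified Fast Correlation-Based Filter (FCBF) but does not say what the modification is. As a rough illustration only, the sketch below implements the standard, unmodified FCBF in pure Python, assuming the color and texture features have already been discretized: FCBF ranks features by symmetric uncertainty, SU(X, Y) = 2·IG(X | Y) / (H(X) + H(Y)), computed on discrete values. The function names, the `delta` relevance threshold, and the toy data are illustrative assumptions, not the authors' code or settings.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy H(X), in bits, of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """H(X | Y): entropy of x within each group of equal y, weighted by group size."""
    n = len(y)
    total = 0.0
    for y_val, count in Counter(y).items():
        group = [xi for xi, yi in zip(x, y) if yi == y_val]
        total += (count / n) * entropy(group)
    return total


def symmetric_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y)); 0 = independent, 1 = fully predictive."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0.0:
        return 0.0  # both variables constant: no information either way
    info_gain = hx - conditional_entropy(x, y)
    return 2.0 * info_gain / (hx + hy)


def fcbf(features: List[Sequence[Hashable]], labels: Sequence[Hashable],
         delta: float = 0.0) -> List[int]:
    """Standard (unmodified) FCBF on discretized features; returns kept feature indices."""
    # Relevance step: keep features with SU(F_i, C) >= delta, most relevant first.
    # list.sort is stable even with reverse=True, so ties keep their original order.
    candidates = [(symmetric_uncertainty(f, labels), i) for i, f in enumerate(features)]
    candidates = [(su, i) for su, i in candidates if su >= delta]
    candidates.sort(key=lambda t: t[0], reverse=True)

    selected: List[int] = []
    # Redundancy step: keep the best remaining feature F_p, then discard every
    # lower-ranked F_q with SU(F_p, F_q) >= SU(F_q, C) as redundant.
    while candidates:
        _, p = candidates[0]
        selected.append(p)
        candidates = [(su_q, q) for su_q, q in candidates[1:]
                      if symmetric_uncertainty(features[p], features[q]) < su_q]
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: label = a OR b; d is independent of the label;
    # the last feature is an exact copy of a.
    a = [0, 0, 1, 1, 0, 0, 1, 1]
    b = [0, 1, 0, 1, 0, 1, 0, 1]
    d = [0, 0, 0, 0, 1, 1, 1, 1]
    labels = [x | y for x, y in zip(a, b)]
    print(sorted(fcbf([a, b, d, list(a)], labels, delta=0.1)))  # [0, 1]
```

On the toy data, the irrelevant feature d is dropped by the relevance threshold and the copy of a is dropped as redundant, leaving a and b. Note that redundancy is judged relative to relevance: a feature is discarded only when an already-kept feature carries at least as much information about it as it carries about the class.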
Date of Conference: 24-28 August 2014
Date Added to IEEE Xplore: 06 December 2014
Electronic ISBN: 978-1-4799-5209-0
Print ISSN: 1051-4651
Conference Location: Stockholm, Sweden

I. Introduction

Nowadays, a large number of historical documents have been digitized and made available to the public. With the increasing availability of computers and text-based software, the analysis of such documents has reached a new level, opening new lines of inquiry in digital humanities research. Historical Document Image Analysis and Recognition (HDIAR) methods are now widely used to enable computers to recognize the textual content of documents. The goal of HDIAR is the automatic extraction of the information contained in a document; this covers the actual textual (and pictorial) information as well as writer identification, word spotting, and metadata.
