Handwritten text line segmentation on real-world data presents significant challenges that cannot be overcome by any single technique. Given the diversity of approaches and the recent advances in ensemble-based combination for pattern recognition problems, it is possible to improve segmentation performance by combining the outputs of different line-finding methods. In this paper, we propose a novel graph clustering-based approach to combine the outputs of an ensemble of text line segmentation algorithms. A weighted undirected graph is constructed with nodes corresponding to connected components and edges connecting pairs of connected components. Text line segmentation is then posed as the problem of minimum-cost partitioning of the nodes in the graph such that each cluster corresponds to a unique line in the document image. Experimental results on a challenging Arabic field dataset show that the ensemble method achieves a relative gain of 18% in F1 score over the best individual method within the ensemble.
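The graph construction described above can be illustrated with a minimal sketch. The abstract does not specify the edge weights or the partitioning cost, so everything here is an assumption made for illustration: edge weights are taken to be the fraction of ensemble members that place two connected components on the same line, and the minimum-cost partitioning is replaced by a much simpler stand-in (thresholding the weights and taking the connected groups that remain). The function names `coassociation_weights` and `cluster_by_threshold` are hypothetical.

```python
from itertools import combinations


def coassociation_weights(labelings):
    """Build edge weights for a graph whose nodes are connected components.

    labelings[m][i] is the line label that method m assigns to component i.
    ASSUMPTION: the weight of edge (i, j) is the fraction of methods that
    put components i and j on the same line. The paper may weight edges
    differently.
    """
    n = len(labelings[0])
    weights = {}
    for i, j in combinations(range(n), 2):
        agree = sum(1 for labels in labelings if labels[i] == labels[j])
        weights[(i, j)] = agree / len(labelings)
    return weights


def cluster_by_threshold(n, weights, threshold=0.5):
    """Group nodes joined by edges whose weight exceeds the threshold.

    ASSUMPTION: this union-find grouping is a simple stand-in for the
    minimum-cost graph partitioning named in the abstract, not the
    authors' method.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), w in weights.items():
        if w > threshold:
            parent[find(i)] = find(j)

    # Relabel roots as consecutive line indices in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Toy example: three hypothetical methods labelling five components.
ensemble = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],  # this method disagrees about component 2
    [0, 0, 0, 1, 1],
]
weights = coassociation_weights(ensemble)
combined = cluster_by_threshold(5, weights)
print(combined)
```

In this toy input, two of the three methods place component 2 with components 0 and 1, so the majority grouping is kept in the combined result. A real system would derive the components from the document image and would need a partitioning criterion that handles conflicting evidence more carefully than a fixed threshold.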