A clustering-based approach to separating text from mixed text/graphics documents is presented. The approach starts from the grouping of connected components. Clustering is employed at three critical stages to improve the efficiency and effectiveness of the grouping: before the grouping, before orientation estimation, and after orientation estimation. Because the estimated orientation is highly accurate, not only overgrouping but also most undergrouping cases could be handled successfully.
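As a rough illustration of the first stage described above (extracting connected components and clustering them before grouping), the following minimal sketch may help. The abstract does not state which features or clustering algorithm are used, so everything here is an assumption: the 4-connectivity, the use of bounding-box height as the only feature, the gap-based one-dimensional clustering, and the names `connected_components`, `cluster_by_height`, and `max_gap` are all hypothetical and are not taken from the paper.

```python
from collections import deque


def connected_components(grid):
    """Return bounding boxes (top, left, bottom, right) of the
    4-connected foreground regions in a binary grid."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from the first unseen foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def cluster_by_height(boxes, max_gap=2):
    """Assumed stand-in for the pre-grouping clustering: sort components
    by bounding-box height and start a new cluster whenever two
    consecutive heights differ by more than max_gap."""
    ordered = sorted(boxes, key=lambda b: b[2] - b[0] + 1)
    clusters = []
    last_height = None
    for box in ordered:
        height = box[2] - box[0] + 1
        if last_height is not None and height - last_height <= max_gap:
            clusters[-1].append(box)
        else:
            clusters.append([box])
        last_height = height
    return clusters


# Toy page: two small character-like blobs and one large graphic-like blob.
grid = [[0] * 12 for _ in range(8)]
for y in range(0, 2):
    for x in (0, 1, 3, 4):
        grid[y][x] = 1
for y in range(2, 8):
    for x in range(7, 12):
        grid[y][x] = 1

boxes = connected_components(grid)
clusters = cluster_by_height(boxes)
```

On this toy input the two small components fall into one cluster and the large component into another, so only the small cluster would be passed on as text candidates. The later stages named in the abstract (orientation estimation and the clustering steps around it) are not sketched here.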
Published in: Proceedings of the 13th International Conference on Pattern Recognition, 1996 (Volume 3)
Date of Conference: 25-29 Aug 1996