Page segmentation and image content classification are important steps in automatic document image processing, with applications that include mixed-type document image compression, form and check reading, and mail sorting. The authors first propose an enhanced page segmentation approach based on background thinning. They then present a hierarchical approach for classifying the segmented sub-images into one of two categories: text or picture. The approach combines a cross-correlation method, the Kolmogorov complexity measure (A.N. Kolmogorov, 1965), and a neural network classifier to achieve both efficiency and high accuracy. The approach has been tested on a number of mixed-type document images with good results.
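Kolmogorov complexity is not computable exactly, so it is commonly approximated by the length of a compressed encoding of the data. The sketch below illustrates that general idea only and is not the authors' implementation: it assumes zlib as the stand-in compressor and scores a binary sub-image (one byte per pixel, an assumed representation) with a hypothetical helper, `compression_complexity`. Regular regions compress well and get a low score; noisy regions compress poorly and get a high one.

```python
import random
import zlib


def compression_complexity(block):
    """Approximate the Kolmogorov complexity of a binary sub-image.

    Assumption: complexity is estimated by zlib-compressed size, which is an
    upper-bound proxy, not the exact (uncomputable) Kolmogorov complexity.

    block: list of rows, each row a list of 0/1 pixel values.
    Returns compressed bytes per raw byte (lower means more regular).
    """
    # Serialize the block as one byte per pixel.
    raw = bytes(pixel for row in block for pixel in row)
    if not raw:
        raise ValueError("empty block")
    return len(zlib.compress(raw, 9)) / len(raw)


if __name__ == "__main__":
    # A uniform block is highly regular; a random block is not.
    uniform = [[0] * 64 for _ in range(64)]
    rng = random.Random(0)
    noisy = [[rng.randint(0, 1) for _ in range(64)] for _ in range(64)]
    print(compression_complexity(uniform) < compression_complexity(noisy))
```

A score like this would be one feature among several; in the approach described above, the final text/picture decision is made by a neural network classifier rather than by a fixed threshold on this value.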
Published in: Proceedings of the 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing
Date of Conference: 2001