The Mixed Raster Content (MRC) standard (ITU-T T.44) specifies a framework for document compression that can dramatically improve the compression/quality tradeoff compared with traditional lossy image compression algorithms. The key to MRC's performance is the separation of the document into foreground and background layers, with the assignment of each pixel to a layer represented as a binary mask. In this paper, we propose a novel multiscale segmentation scheme based on the sequential application of two algorithms. The first algorithm, Cost Optimized Segmentation (COS), is a blockwise segmentation algorithm formulated in a global cost optimization framework. The second algorithm, Connected Component Classification (CCC), refines the initial segmentation by classifying feature vectors of connected components using a Markov random field (MRF) model. The combined COS/CCC segmentation algorithms are then incorporated into a multiscale framework in order to improve the segmentation accuracy for text of varying size.
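The role of the binary mask in the MRC layer model can be illustrated with a minimal sketch. It is not the COS/CCC method proposed here: the function `segment_by_threshold` is a hypothetical stand-in that uses a fixed global threshold, and filling each layer with a single mean gray level is a simplifying assumption made only to keep the example short. All names below are illustrative and do not come from ITU-T T.44.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def segment_by_threshold(image: Image, threshold: int = 128) -> Image:
    """Hypothetical stand-in for a real segmenter such as COS/CCC.

    Marks pixels darker than `threshold` as foreground (1), others as
    background (0). A fixed global threshold is an assumption for this sketch.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def layer_mean(image: Image, mask: Image, label: int) -> int:
    """Integer mean of the pixels whose mask value equals `label` (0 if none)."""
    values = [
        pixel
        for image_row, mask_row in zip(image, mask)
        for pixel, m in zip(image_row, mask_row)
        if m == label
    ]
    return sum(values) // len(values) if values else 0


def decompose(image: Image, mask: Image) -> Tuple[Image, Image]:
    """Build flat foreground and background layers from the masked pixels.

    Each layer is filled with one mean gray level (a simplification); a real
    encoder would compress each layer with a coder suited to its content.
    """
    height, width = len(image), len(image[0])
    fg_level = layer_mean(image, mask, 1)
    bg_level = layer_mean(image, mask, 0)
    foreground = [[fg_level] * width for _ in range(height)]
    background = [[bg_level] * width for _ in range(height)]
    return foreground, background


def reconstruct(mask: Image, foreground: Image, background: Image) -> Image:
    """Per pixel, take the foreground where mask is 1, else the background."""
    return [
        [f if m else b for m, f, b in zip(mask_row, fg_row, bg_row)]
        for mask_row, fg_row, bg_row in zip(mask, foreground, background)
    ]
```

In this sketch the mask alone decides which layer each reconstructed pixel is drawn from, which is why segmentation accuracy directly affects the quality of the decoded document.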