Abstract:
We study document layout analysis based on two-dimensional context-free grammars. We first identify a subclass of these grammars that suffices to describe document structure and whose productions follow a mechanism that induces regular languages in the case of one-dimensional productions. We then show that the properties of such grammars can be exploited to implement a very fast top-down parser. Experimental results are reported for PDF documents, which were chosen as the test domain because our motivation is the development of digital document access methods for people with disabilities, in which the retrieval of structural information plays an important role.
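The abstract does not spell out the parser, so the following is only an illustrative sketch of the one-dimensional idea, not the authors' method. It assumes a toy grammar in which each production's right-hand side is a sequence of quantified symbols, which makes it a regular expression. A vertical region is parsed top-down by sorting its layout elements from top to bottom, matching their label string against the production, and recursing into the sub-regions the match delimits. The labels T, H, P, the nonterminals Page and Section, and the names Box, Node, GRAMMAR, to_regex and parse are all hypothetical. Horizontal productions and genuinely two-dimensional cuts are not modelled.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Box:
    label: str  # hypothetical terminal class: "T" title, "H" heading, "P" paragraph
    top: float  # vertical position on the page (smaller = higher)

@dataclass
class Node:
    symbol: str
    boxes: list                       # layout elements covered by this node
    children: list = field(default_factory=list)

# Hypothetical one-dimensional (vertical) productions. Each right-hand side is a
# sequence of (symbol, quantifier) pairs, so it describes a regular language.
GRAMMAR = {
    "Page": [("T", ""), ("Section", "*")],
    "Section": [("H", ""), ("P", "+")],
}

def to_regex(symbol):
    """Inline a non-recursive production into a single regular expression."""
    if symbol not in GRAMMAR:  # terminal: one single-character label
        return re.escape(symbol)
    return "".join(f"(?:{to_regex(s)}){q}" for s, q in GRAMMAR[symbol])

def parse(symbol, boxes):
    """Top-down parse of a region: sort boxes top to bottom, match the
    production's regex, then recurse into the sub-regions it delimits."""
    boxes = sorted(boxes, key=lambda b: b.top)
    labels = "".join(b.label for b in boxes)
    node = Node(symbol, boxes)
    if symbol not in GRAMMAR:  # terminal already matched by the parent
        return node
    rhs = GRAMMAR[symbol]
    # One capturing group per right-hand-side item; inner groups are non-capturing.
    pattern = "".join(f"((?:{to_regex(s)}){q})" for s, q in rhs)
    m = re.fullmatch(pattern, labels)
    if m is None:
        raise ValueError(f"{symbol} does not derive {labels!r}")
    for i, (s, _) in enumerate(rhs, start=1):
        start, end = m.span(i)
        # Split a repeated item into consecutive occurrences
        # (sufficient for this toy grammar, not a general method).
        for occ in re.finditer(to_regex(s), labels[start:end]):
            node.children.append(parse(s, boxes[start + occ.start():start + occ.end()]))
    return node

page = [Box("P", 40), Box("T", 0), Box("H", 20), Box("P", 30), Box("H", 50), Box("P", 60)]
tree = parse("Page", page)
print([c.symbol for c in tree.children])  # ['T', 'Section', 'Section']
```

Because each right-hand side is regular, a single regular-expression match decides how a region is split. No search over alternative derivations is needed at that level, which gives an intuition for why such a restricted grammar class can support a fast top-down parser.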
Date of Conference: 09-15 November 2017
Date Added to IEEE Xplore: 29 January 2018
Electronic ISSN: 2379-2140