Abstract:
We study document layout analysis based on two-dimensional context-free grammars. We first identify a subclass of these grammars that is sufficient for describing document structure, in which productions follow a mechanism that induces regular languages in the case of one-dimensional productions. We then show that the properties of such grammars can be exploited to implement a very fast top-down parser. Experimental results are reported for PDF documents, chosen as the test domain because our motivation is the development of digital document access methods for people with disabilities, in which the retrieval of structural information plays an important role.
Date of Conference: 09-15 November 2017
Date Added to IEEE Xplore: 29 January 2018
Electronic ISSN: 2379-2140
Faculty of Electrical Engineering, Czech Technical University, Karlovo náměstí 13, Czech Republic
Faculty of Engineering, Ibaraki University, Hitachi, Ibaraki, Japan