Abstract:
The problem of automatic table detection has always been a great topic of debate in the field of Document Analysis and Recognition (DAR). Digital documents are efficient ...Show MoreMetadata
Abstract:
The problem of automatic table detection has always been a great topic of debate in the field of Document Analysis and Recognition (DAR). Digital documents are efficient than their printed counterparts for storage, maintenance and republishing. Being a non-textual object of a document, tables prevent OCR system to digitize a document perfectly and distorts layout and structure of digitized documents. There is no available algorithm or method which solves this problem for all possible types of tables. This paper tackles the problem of table detection and retention by proposing a bi-modular approach based on structural information of tables. This structural information includes bounding lines, row/column separators and space between columns. Through analysis of these properties, our experiments on a dataset of above 600 images consisting of more than 829 tables have detected 90% of the table correctly.
Date of Conference: 21-23 December 2017
Date Added to IEEE Xplore: 12 March 2018
ISBN Information: