Abstract:
We consider the Document Understanding problem of segmenting tables in rows. We propose a method that first enumerates virtual row separator candidates and then select th...Show MoreMetadata
Abstract:
We consider the Document Understanding problem of segmenting tables in rows. We propose a method that first enumerates virtual row separator candidates and then select the correct ones thanks to a classification task, solved using supervised structured machine learning. Interestingly, the task is the joint-classification of virtual separators and real text lines. We describe and tested several alternative candidate generation methods and report the results of our experiment for each, on two different types of registry books from the 19th century.
Date of Conference: 20-25 September 2019
Date Added to IEEE Xplore: 03 February 2020
ISBN Information: