Abstract:
This paper describes a method for extracting relevant tokens of entity from semi-structured administrative documents. This method is used for mislabeling correction by em...Show MoreMetadata
Abstract:
This paper describes a method for extracting relevant tokens of entity from semi-structured administrative documents. This method is used for mislabeling correction by employing the entity tokens physically close in a document. Firstly, the entities are labeled. Secondly, each entity is modeled by a tokens structure graph in which the nodes represent the tokens and the arcs represent the distances. A clustering algorithm is then applied to incrementally concatenate the relevant tokens of entities and ignore the noisy parts. The obtained results with a dataset of real invoices are reported in experimental section.
Date of Conference: 09-12 October 2016
Date Added to IEEE Xplore: 09 February 2017
ISBN Information: