Abstract:
Document analysis has been a longstanding topic of research because of its impact on a wide range of practical applications. The sheer number of document formats and domains makes this task particularly difficult. Existing solutions parse documents of various formats, but the problem of extracting structure for different document types remains. Dedoc is an open-source library that extracts document content into a unified representation and structures it according to the specific domain of the document. The system architecture makes it scalable, i.e. handlers for new document formats and structure types can be added.
Published in: 2023 Ivannikov Ispras Open Conference (ISPRAS)
Date of Conference: 04-05 December 2023
Date Added to IEEE Xplore: 29 April 2024