Abstract:
Document classification, the task of automatically classifying documents based on their content and structure, is an important problem in document image processing and analysis. This work proposes a novel approach that uses logistic regression to classify document images as structured or unstructured. The approach extracts several features from document images, including Haralick, BRISQUE, and local binary pattern (LBP) features, and models the relationship between these features and the document class using logistic regression. The performance of logistic regression is compared with that of multiple linear regression, which is commonly used as a baseline model for document classification. The experiments show that logistic regression outperforms multiple linear regression in classification accuracy and provides more interpretable results. The approach was evaluated on nine classes from the RVL-CDIP dataset, where it achieves an accuracy of 94%, significantly higher than the 85% achieved by the multiple linear regression model.
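
As a rough illustration of the kind of pipeline the abstract describes, the following Python sketch computes a local binary pattern histogram for a grey-level image and fits either a logistic regression or a linear regression baseline by batch gradient descent. It is not the authors' implementation: the function names, the list-of-rows image format, the learning rate, the epoch count, and the 0.5 decision threshold are all assumptions made for illustration, and the Haralick and BRISQUE features are omitted for brevity.

```python
import math


def lbp_histogram(img):
    """Normalized 256-bin histogram of basic 8-neighbour LBP codes.

    `img` is a list of equal-length rows of grey levels
    (a hypothetical input format chosen for this sketch).
    """
    # Neighbour offsets, clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0.0] * 256
    count = 0
    # Only interior pixels have all eight neighbours.
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright.
                if img[r + dr][c + dc] >= img[r][c]:
                    code |= 1 << bit
            hist[code] += 1.0
            count += 1
    return [h / count for h in hist] if count else hist


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit(X, y, logistic=True, lr=0.5, epochs=2000):
    """Batch gradient descent; returns (weights, bias).

    logistic=True minimizes log loss (logistic regression);
    logistic=False minimizes squared error (linear regression baseline).
    Both share the same gradient form: (prediction - label) * feature.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = (sigmoid(z) if logistic else z) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x, logistic=True):
    """Return 1 (e.g. structured) or 0 (e.g. unstructured)."""
    w, b = model
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    score = sigmoid(z) if logistic else z
    return 1 if score >= 0.5 else 0
```

In such a setup, each document image would contribute one row of `X` (for example, its `lbp_histogram` output, possibly concatenated with other features) and one 0/1 label in `y`; calling `fit` with `logistic=True` and `logistic=False` on the same data gives the two models being compared.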
Published in: 2023 Second International Conference on Augmented Intelligence and Sustainable Systems (ICAISS)
Date of Conference: 23-25 August 2023
Date Added to IEEE Xplore: 22 September 2023