A Comparative Study on Document Images Classification using Logistic Regression and Multiple Linear Regressions | IEEE Conference Publication | IEEE Xplore



Abstract:

In document image processing and analysis, document classification is an important task: the goal is to classify documents automatically based on their content and structure. This work proposes a novel approach that uses logistic regression to classify document images as structured or unstructured. The approach extracts several kinds of features from each document image, namely Haralick texture features, BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator) features, and local binary patterns (LBP), and models the relationship between these features and the document class with logistic regression. Its performance is compared with that of multiple linear regression, which is commonly used as a baseline model for document classification. The approach was evaluated on documents from nine classes of the RVL-CDIP dataset. In these experiments, logistic regression outperforms multiple linear regression in classification accuracy and yields more interpretable results: it reaches an accuracy of 94%, compared with 85% for the multiple linear regression models.
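The abstract's central comparison, logistic regression versus multiple linear regression used as a binary structured/unstructured classifier, can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract gives no training details, so everything here is an assumption. The two synthetic features are hypothetical stand-ins for standardised Haralick/BRISQUE/LBP values, both models are fit by plain batch gradient descent, and the helper names (`make_data`, `fit`, `linear_score`, `logistic_predict`) are illustrative only. The sketch does not reproduce the paper's 94% and 85% figures.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear_score(w, x):
    # w holds one weight per feature followed by the intercept.
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]

def logistic_predict(w, x):
    # Logistic regression: the linear score mapped to a probability in [0, 1].
    return sigmoid(linear_score(w, x))

def fit(X, y, predict, lr=0.1, epochs=1000):
    """Batch gradient descent.

    With predict=linear_score this minimises the mean squared error
    (multiple linear regression); with predict=logistic_predict it minimises
    the mean log-loss (logistic regression). Both gradients have the
    form (prediction - target) * feature.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def make_data(n_per_class, rng):
    # Hypothetical synthetic features, NOT real document-image data:
    # class 0 = "unstructured", class 1 = "structured".
    X, y = [], []
    for label, mu in ((0, -1.5), (1, 1.5)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)])
            y.append(label)
    return X, y

def accuracy(w, predict, X, y):
    # Both models' outputs are thresholded at 0.5 to obtain a class label.
    hits = sum((predict(w, xi) >= 0.5) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(y)

rng = random.Random(0)
X_train, y_train = make_data(100, rng)
X_test, y_test = make_data(100, rng)

w_lin = fit(X_train, y_train, linear_score)
w_log = fit(X_train, y_train, logistic_predict)
acc_lin = accuracy(w_lin, linear_score, X_test, y_test)
acc_log = accuracy(w_log, logistic_predict, X_test, y_test)
print(f"multiple linear regression accuracy: {acc_lin:.2f}")
print(f"logistic regression accuracy:        {acc_log:.2f}")
```

On such cleanly separated toy data the two models may reach similar accuracy. The difference the sketch does make visible is in the outputs: the logistic model always returns a value in [0, 1] that can be read as a class probability, whereas the linear model's scores can fall outside that range and are only meaningful after thresholding.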
Date of Conference: 23-25 August 2023
Date Added to IEEE Xplore: 22 September 2023
Conference Location: Trichy, India

