Bangla Document Classification Based on Machine Learning and Explainable NLP | IEEE Conference Publication | IEEE Xplore

Abstract:

Technological advancement has made massive amounts of digital text accessible, but text is of little use until it is organized. A high-quality, representative corpus of a language is essential for research in computational linguistics and natural language processing (NLP). Bangla NLP research is still in its infancy because of the dearth of high-quality public corpora. This paper presents a newly built corpus consisting of 1,30,307 documents covering 10 categories, collected from 11 websites and containing 2,94,80,828 tokens, of which 17,59,085 are unique. Seven supervised machine learning methods are explored in this work. Furthermore, Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) are used to explain the performance of the different models. The results show that Random Forest (RF), Decision Tree (DT), and Support Vector Machine (SVM) outperform the other models. The RF classifier achieves the highest accuracy, 99.91%, which is better than the existing state-of-the-art methods.
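
The token and unique-token figures quoted in the abstract are corpus-level counts. As a minimal sketch of how such counts can be computed, the snippet below tallies total and unique tokens over a list of documents. The abstract does not say which tokenizer the authors used, so whitespace splitting is an assumption here, and the function name `corpus_stats` and the two toy sentences are hypothetical, not drawn from the corpus.

```python
from collections import Counter


def corpus_stats(documents):
    """Return (total_tokens, unique_tokens) for an iterable of text documents."""
    counts = Counter()
    for text in documents:
        # Assumption: whitespace tokenization; the paper's tokenizer is not stated.
        counts.update(text.split())
    return sum(counts.values()), len(counts)


# Hypothetical toy documents for illustration only.
docs = ["আমি ভাত খাই", "আমি বই পড়ি"]
total, unique = corpus_stats(docs)
print(total, unique)
```

A real pipeline would likely also normalize punctuation and Unicode before counting, which changes both figures.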
Date of Conference: 07-09 December 2023
Date Added to IEEE Xplore: 13 February 2024
Conference Location: Khulna, Bangladesh
