Gold Dataset for the Evaluation of Bangla Stemmer | IEEE Conference Publication | IEEE Xplore

Gold Dataset for the Evaluation of Bangla Stemmer


Abstract:

Stemming is a preprocessing task in Natural Language Processing (NLP) that normalizes inflected words representing the same concept to their original word. It is a form of text normalization that transforms an inflected word into its root form, and it has a great impact on many applications in NLP and Natural Language Understanding. One of the biggest challenges in Bangla stemming is collecting the rules associated with Bangla word stemming and a standard dataset for testing accuracy. This paper presents a summary of the rules associated with Bangla stemming and a gold-standard dataset that helps test stemming algorithms. We verify the rules and, using our rules and corpus, test existing Bangla stemming algorithms; we find that the proposed corpus makes a difference when evaluating existing techniques.
Date of Conference: 17-19 December 2021
Date Added to IEEE Xplore: 16 March 2022
Conference Location: Khulna, Bangladesh
