Abstract:
Stemming is a preprocessing task in Natural Language Processing (NLP) that normalizes inflected words representing the same concept as the original word. Stemming is a text normalization process that transforms an inflected word into its root form. It has a significant impact on many applications in NLP and Natural Language Understanding. One of the biggest challenges in Bangla stemming is collecting the rules associated with Bangla word stemming and a standard dataset for testing accuracy. This paper presents a summary of the rules associated with Bangla stemming and a gold-standard corpus that helps test stemming algorithms. We verify the rules and, using the rules and the corpus, test existing Bangla stemming algorithms; we find that the proposed corpus makes a difference with existing techniques.
Published in: 2021 5th International Conference on Electrical Information and Communication Technology (EICT)
Date of Conference: 17-19 December 2021
Date Added to IEEE Xplore: 16 March 2022