Gold Dataset for the Evaluation of Bangla Stemmer

Abstract:

Stemming is a preprocessing task in Natural Language Processing (NLP) that normalizes inflected words representing the same concept to the original word. It is a text normalization process that transforms an inflected word into its root form, and it has a strong impact on many applications in NLP and Natural Language Understanding. Two of the biggest challenges in Bangla stemming are collecting the rules associated with Bangla word stemming and obtaining a standard dataset for testing accuracy. This paper presents a summary of the rules associated with Bangla stemming and a gold standard corpus that helps test stemming algorithms. We verify the rules, and using the rules and corpus we test existing Bangla stemming algorithms and find that the proposed corpus makes a difference with existing techniques.
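
As a rough illustration of how a gold standard corpus can be used to test a stemmer, the sketch below scores a stemmer by the fraction of words whose output matches the gold stem. Everything in it is an assumption made for illustration: the names `toy_stem`, `stemming_accuracy`, `TOY_SUFFIXES`, and `TOY_GOLD` are hypothetical, the three suffixes are not the rule set summarized in the paper, and the four word pairs are not taken from the paper's dataset. The abstract does not state which metric the authors use, so exact-match accuracy is only one plausible choice.

```python
from typing import Callable, Iterable, Tuple

# Illustrative suffixes only; NOT the rule set summarized in the paper.
TOY_SUFFIXES = ("গুলো", "ের", "রা")


def toy_stem(word: str) -> str:
    """Strip the longest matching toy suffix, keeping a non-empty stem."""
    for suffix in sorted(TOY_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def stemming_accuracy(
    stemmer: Callable[[str], str],
    gold_pairs: Iterable[Tuple[str, str]],
) -> float:
    """Return the fraction of (word, gold_stem) pairs the stemmer gets exactly right."""
    pairs = list(gold_pairs)
    if not pairs:
        raise ValueError("gold_pairs must contain at least one pair")
    correct = sum(1 for word, gold in pairs if stemmer(word) == gold)
    return correct / len(pairs)


# Hypothetical gold pairs for illustration; not drawn from the paper's corpus.
TOY_GOLD = [
    ("ছেলেরা", "ছেলে"),
    ("বইগুলো", "বই"),
    ("মানুষের", "মানুষ"),
    ("করেছিল", "কর"),  # verb inflection the toy suffix list does not cover
]

if __name__ == "__main__":
    print(f"accuracy = {stemming_accuracy(toy_stem, TOY_GOLD):.2f}")
```

Any stemmer that maps a string to a string can be passed in place of `toy_stem`, so several algorithms can be compared on the same gold pairs.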

Published in: 2021 5th International Conference on Electrical Information and Communication Technology (EICT)

Date of Conference: 17-19 December 2021

Date Added to IEEE Xplore: 16 March 2022

DOI: 10.1109/EICT54103.2021.9733662

Conference Location: Khulna, Bangladesh