Evaluation of Different Classifiers for Sinhala POS Tagging


Abstract:

This paper presents a comparative evaluation of three state-of-the-art classifiers for Sinhala Part-of-Speech (POS) tagging. POS tagger models based on Support Vector Machines (SVM), Hidden Markov Models (HMM) and Conditional Random Fields (CRF) are generated and tested using different combinations of a corpus of news articles and a corpus of official government documents. Because CRF is applied to Sinhala POS tagging here for the first time, its best feature set is derived experimentally. To further improve tagging accuracy, a majority-voting ensemble tagger is created from the three individual taggers. This ensemble tagger achieved higher accuracy than any individual tagger. The two domains used in this study (news and official government documents) differ noticeably in writing style and vocabulary. Building domain-specific POS taggers is time-consuming and costly, particularly for low-resource languages, because of the overhead of creating and manually tagging domain-specific corpora. Therefore, this study also evaluates the feasibility and effectiveness of training and testing these machine learning techniques on corpora from different domains.
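
The majority-voting ensemble mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how votes are combined when all three taggers disagree, so the `fallback` parameter below is a hypothetical tie-break rule, and the function name and the tag labels in the example are placeholders rather than the paper's tagset or implementation.

```python
from collections import Counter


def majority_vote(predictions, fallback=0):
    """Combine per-token tags from several taggers by majority vote.

    predictions: list of tag sequences, one per tagger, all of equal length.
    fallback: index of the tagger whose tag is used when no tag wins a
              strict majority (assumed tie-break rule, not from the paper).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all taggers must tag the same number of tokens")
    combined = []
    # zip(*predictions) yields, for each token, the tags proposed by every tagger
    for tags in zip(*predictions):
        tag, count = Counter(tags).most_common(1)[0]
        # Accept the top tag only if more than half of the taggers agree on it
        combined.append(tag if count > len(tags) / 2 else tags[fallback])
    return combined


# Placeholder outputs of three taggers for the same three-token sentence
svm_tags = ["NN", "VB", "JJ"]
hmm_tags = ["NN", "NN", "RB"]
crf_tags = ["NN", "VB", "DT"]
ensemble_tags = majority_vote([svm_tags, hmm_tags, crf_tags])
```

With three taggers, a tag wins whenever at least two agree; the fallback only applies to tokens where all three disagree, as with the third token above.
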
Date of Conference: 30 May 2018 - 01 June 2018
Date Added to IEEE Xplore: 30 July 2018
Conference Location: Moratuwa, Sri Lanka