Abstract:
Geez is an ancient language that serves as a liturgical language in Ethiopia and Eritrea and has attracted academic attention in Ethiopia and abroad. Despite its historical significance, computational resources for natural language processing (NLP) in Geez are scarce. To address this gap, this study develops a deep learning-based part-of-speech (POS) tagger for Geez. POS tagging is the task of labeling each word in a text with its grammatical category. A manually annotated dataset of 4981 sentences, containing 30K words of which 11K are unique, is collected and used for training and evaluation. The dataset is preprocessed through tokenization, sequencing, and sequence padding. Two experiments are conducted with LSTM, BiLSTM, GRU, and BiGRU models. The BiLSTM model performs best, with an accuracy of 94.5% on the 70-15-15 split and 95.01% on the 80-10-10 split. These findings suggest that deep learning models can identify parts of speech in Geez and can therefore support the development of NLP tools and resources for low-resource languages. Future studies may explore more sophisticated architectures and techniques to further improve accuracy on complex and diverse datasets.
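The preprocessing steps named in the abstract (tokenization, sequencing, and sequence padding) and the 80-10-10 split can be illustrated with a minimal sketch in plain Python. This is not the authors' pipeline. The toy corpus, the tag names, the whitespace tokenizer, the reserved `<pad>` and `<unk>` indices, and the helpers `build_index`, `encode`, and `split` are all assumptions made for illustration, as is the reading of 80-10-10 as train/validation/test proportions.

```python
import random

PAD = "<pad>"  # assumption: index 0 is reserved for padding
UNK = "<unk>"  # assumption: index 1 is reserved for unseen tokens


def build_index(items):
    """Map each distinct item to an integer id, after the reserved ids."""
    index = {PAD: 0, UNK: 1}
    for item in items:
        if item not in index:
            index[item] = len(index)
    return index


def encode(sequence, index, max_len):
    """Sequencing and padding: convert tokens to ids, then truncate or
    right-pad with the PAD id so every sequence has length max_len."""
    ids = [index.get(tok, index[UNK]) for tok in sequence[:max_len]]
    return ids + [index[PAD]] * (max_len - len(ids))


def split(examples, train=0.8, val=0.1, seed=0):
    """Shuffle, then cut into train/validation/test portions.
    The remainder after train and val goes to the test portion."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Hypothetical toy corpus: placeholder tokens and tags, not real Geez data.
corpus = [
    ("w1 w2 w3", "NOUN VERB NOUN"),
    ("w2 w4", "VERB ADJ"),
    ("w5 w1 w2 w3 w4", "ADV NOUN VERB NOUN ADJ"),
]

# Tokenization: whitespace splitting is an assumption made for this sketch.
sentences = [(words.split(), tags.split()) for words, tags in corpus]

word_index = build_index(w for ws, _ in sentences for w in ws)
tag_index = build_index(t for _, ts in sentences for t in ts)
max_len = max(len(ws) for ws, _ in sentences)

# One padded id sequence per sentence, for words (X) and for tags (y).
X = [encode(ws, word_index, max_len) for ws, _ in sentences]
y = [encode(ts, tag_index, max_len) for _, ts in sentences]
```

Padding words and tags to the same length keeps each input position aligned with its label, which is what a sequence model such as a BiLSTM needs in order to emit one tag per token. In practice the padded positions would typically be masked so that they do not count toward the reported accuracy.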
Published in: 2023 International Conference on Information and Communication Technology for Development for Africa (ICT4DA)
Date of Conference: 26-28 October 2023
Date Added to IEEE Xplore: 06 November 2023