Developing a machine learning-based grade level classifier for Filipino children’s literature | IEEE Conference Publication | IEEE Xplore

Developing a machine learning-based grade level classifier for Filipino children’s literature


Abstract:

Reading is an essential part of children's learning, and matching reading materials to the proper readability level helps ensure effective comprehension. We present our efforts to develop a baseline model that automatically identifies the readability of children's and young adult books written in Filipino using machine learning algorithms. For this study, we processed 258 picture books published by Adarna House Inc. In contrast to older readability formulas, which rely on static attributes such as the number of words, sentences, and syllables, we explored other textual features. Count vectors, Term Frequency-Inverse Document Frequency (TF-IDF), n-grams, and character-level n-grams were extracted to train models using three major machine learning algorithms: Multinomial Naïve Bayes, Random Forest, and K-Nearest Neighbors. A combination of K-Nearest Neighbors and Random Forest via a voting-based classification mechanism produced the best-performing model, with an average training accuracy of 0.822 and an average validation accuracy of 0.74. Analysis of the top 10 most useful features for each algorithm shows that they have one thing in common when identifying readability levels: the use of Filipino stop words. The performance of other classifiers and features was also explored.
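
To make the feature types named in the abstract concrete, the following is a minimal, standard-library sketch of character-level n-grams, word n-grams, TF-IDF weighting, and majority (hard) voting. It is not the authors' implementation: the function names, the toy sentences, the unsmoothed idf formula, and the choice of hard voting are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of a lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n=2):
    """Return all overlapping word n-grams, each joined by a space."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs_terms):
    """Compute TF-IDF weights for a list of term lists.

    Assumes tf = count / document length and idf = log(N / df).
    Libraries often use smoothed variants, so their values will differ.
    """
    n_docs = len(docs_terms)
    df = Counter()
    for terms in docs_terms:
        df.update(set(terms))  # document frequency: count each term once per doc
    weights = []
    for terms in docs_terms:
        counts = Counter(terms)
        total = len(terms) or 1  # avoid division by zero for empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def majority_vote(predictions):
    """Hard voting: return the label predicted most often."""
    return Counter(predictions).most_common(1)[0][0]


# Toy Filipino sentences, invented for illustration (not from the corpus).
docs = ["ang bata ay masaya", "ang aso ay mabilis tumakbo"]
weights = tfidf([d.split() for d in docs])
```

In this toy example, words that occur in both sentences (such as "ang" and "ay") receive a TF-IDF weight of zero under the unsmoothed formula, while words unique to one sentence receive a positive weight. A count-vector representation would instead keep the raw counts of every term.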
Date of Conference: 15-17 November 2019
Date Added to IEEE Xplore: 19 March 2020
Conference Location: Shanghai, China
