Abstract:
Automated Essay Scoring (AES) is the task of grading written essays automatically, without human intervention. This study compares the performance of three AES models that use different text embedding methods: Global Vectors for Word Representation (GloVe), Embeddings from Language Models (ELMo), and Bidirectional Encoder Representations from Transformers (BERT). We used two evaluation metrics: Quadratic Weighted Kappa (QWK) and a novel metric, "robustness", which quantifies a model's ability to detect adversarial essays, created by modifying normal essays to make them less coherent. We found that (1) the BERT-based model achieved the greatest robustness, followed by the GloVe-based and then the ELMo-based model, and (2) fine-tuning the embeddings improves QWK but lowers robustness. These findings can inform the choice of model, and whether to fine-tune it, based on how much emphasis an AES program places on properly grading adversarial essays.
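The abstract's primary metric, Quadratic Weighted Kappa, measures agreement between a model's scores and a human rater's, penalizing disagreements by the squared distance between the two scores. A minimal pure-Python sketch of the standard QWK formula follows; the essay scores used in the example are hypothetical, not taken from the paper.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """QWK = 1 - (weighted observed disagreement / weighted expected disagreement)."""
    n = len(rater_a)
    num_labels = max_score - min_score + 1

    # Observed confusion matrix between the two raters.
    observed = [[0] * num_labels for _ in range(num_labels)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score histograms; their outer product gives expected agreement by chance.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_labels)) for j in range(num_labels)]

    numer = denom = 0.0
    for i in range(num_labels):
        for j in range(num_labels):
            # Quadratic weight: squared score distance, normalized to [0, 1].
            weight = (i - j) ** 2 / (num_labels - 1) ** 2
            numer += weight * observed[i][j]
            denom += weight * hist_a[i] * hist_b[j] / n
    return 1.0 - numer / denom

# Hypothetical scores on a 1-4 scale: human ratings vs. AES model predictions.
human = [2, 3, 4, 4, 1, 3]
model = [2, 3, 3, 4, 1, 2]
print(round(quadratic_weighted_kappa(human, model, 1, 4), 3))  # → 0.846
```

A QWK of 1.0 means perfect agreement with the human rater and 0.0 means agreement no better than chance, so the quadratic weighting makes large scoring errors far more costly than adjacent-score disagreements.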
Published in: 2020 IEEE REGION 10 CONFERENCE (TENCON)
Date of Conference: 16-19 November 2020
Date Added to IEEE Xplore: 22 December 2020