Abstract:
Machine Reading Comprehension (MRC) is a crucial task in natural language processing that enables machines to extract the minimal text span from a reading passage that answers a given question. Its importance lies in advancing a machine's capability to understand and process human language, which is essential for applications ranging from digital assistants to information retrieval systems. The availability of annotated datasets such as SQuAD, together with improvements in modeling techniques, has recently driven significant progress in MRC. However, a critical limitation of traditional MRC approaches, particularly for non-English languages, is the assumption that every question has a corresponding answer in the reading passage. This assumption does not align with real-world scenarios, where some questions are unanswerable from the given text even though plausible-looking answers may appear in it. In this study, we evaluate the performance of several transformer-based models on Arabic MRC tasks that include both answerable and unanswerable questions. These models are fine-tuned on an Arabic-translated version of the SQuADv2.0 dataset. Our investigation features a detailed analysis of the results, categorized by question types and tools. Notably, AraBERTv2.0-base stands out as the most effective model, achieving an exact match score of 84.20% and an F1-score of 82.5%. The results also highlight AraBERTv2.0-base's distinct strength in accurately answering answerable questions, which sets it apart from other models that are better at identifying unanswerable questions.
Published in: 2023 Tenth International Conference on Social Networks Analysis, Management and Security (SNAMS)
Date of Conference: 21-24 November 2023
Date Added to IEEE Xplore: 02 January 2024