Abstract:
The paper's topic is ‘Artificial intelligence in sociolinguistics’. This paper presents an investigation into building a deep learning model for Arabic Dialect Identification from speech. The aim is to establish an effective system capable of distinguishing Arabic dialects in audio samples from 19 countries. This work matters because the linguistic diversity characteristic of the Arabic language makes dialect identification and processing particularly difficult. The model is created through fine-tuning. It is trained on over 66,000 YouTube audio files drawn from the ADI-17 MIT dataset and from our own collection for Bahrain and Tunisia. The test set consists of over 15,000 audio files from the ADI-17 dataset and our collection. The large scale and diversity of the data ensure extensive test conditions. The fine-tuned classification model achieves an accuracy of 86%, which rises to 95% when a prediction counts as correct whenever the true country is among the model's top 3 predicted countries. Error analysis has shown that identifying the Arabic dialect is challenging even for human testers. These results are very promising for producing a robust system for Arabic Dialect Identification from speech.
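The two reported figures (86% overall, 95% "when considering the top 3 countries") read as top-1 and top-3 accuracy. A minimal sketch of how such a top-k score can be computed, assuming the classifier outputs one score per country for each audio file. The function name `top_k_accuracy` and the toy scores below are hypothetical illustrations, not the authors' evaluation code or data.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1) -> float:
    """Fraction of samples whose true class index is among the k highest-scoring classes.

    Hypothetical helper: assumes `scores[i][c]` is the model's score for country c
    on audio file i, and `labels[i]` is the index of the true country.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores; sorted() is stable, so ties keep index order.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    # Toy example: 3 audio files, 4 hypothetical country classes.
    scores = [
        [0.70, 0.20, 0.05, 0.05],  # true class 0 ranked first
        [0.10, 0.50, 0.30, 0.10],  # true class 2 ranked second
        [0.40, 0.30, 0.20, 0.10],  # true class 3 ranked last
    ]
    labels = [0, 2, 3]
    print(f"top-1: {top_k_accuracy(scores, labels, k=1):.2f}")  # top-1: 0.33
    print(f"top-3: {top_k_accuracy(scores, labels, k=3):.2f}")  # top-3: 0.67
```

Top-k accuracy can only stay the same or increase as k grows, which is consistent with the 95% top-3 figure exceeding the 86% top-1 figure: a confusion between closely related dialects still counts as a hit once the true country is anywhere in the shortlist.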
Date of Conference: 24-25 April 2024
Date Added to IEEE Xplore: 26 July 2024