Abstract:
The fields of Artificial Intelligence (AI) and Human-Computer Interaction (HCI) have grown significantly in the last decade. Speech Recognition (SR), and more specifically Speech Emotion Recognition (SER), is still a growing field, with quite a few academic institutions and private companies doing research. Currently, SER is not specifically geared toward African languages. This paper shows how to create an Afrikaans speech corpus to train a Neural Network (NN). Speech samples are extracted from streamed broadcasts on a local Afrikaans YouTube channel. Care is taken that the “Creative Commons Attribution license (reuse allowed)” is always adhered to; where the Creative Commons license is not available, authorization has been obtained. The speech clips are saved in .wav format. The emotions captured are Anger, Anticipation, Disgust, Joy, Sadness, Surprise, Fear and Trust. All data is anonymized. The recorded clips are verified by a second independent party and, if required, verified again by another party, to ensure that the categorization is correct. The result is an Afrikaans speech corpus of roughly 800 speech clips. Finally, a Long Short-Term Memory (LSTM) network is applied to the dataset, and the new Afrikaans corpus yielded a detection accuracy of 58%, and 74% with transfer learning.
Published in: 2022 2nd International Conference on Robotics, Automation and Artificial Intelligence (RAAI)
Date of Conference: 09-11 December 2022
Date Added to IEEE Xplore: 10 April 2023