Abstract:
Emotion recognition is a prominent topic in speech data analysis because of its usefulness in identifying human emotion. Emotion is an inherent human trait exhibited through attitude, behavior, words, speech, and gestures. Speech Emotion Detection refers to identifying an individual's conscious or unconscious mental and psychological state, such as anger, fear, or happiness, as a response to their surroundings. Our research focuses on distinguishing emotions such as anger, happiness, sadness, fear, disgust, surprise, and neutrality by extracting features from raw Bangla audio recordings. We use the extracted features to train three boosting classifiers, namely AdaBoost, Gradient Boost, and XGBoost. Finally, we use the data to train an MLP, a CNN, and a CNN-LSTM-based model. Across the two Bangla speech emotion recognition (SER) datasets, SUBESCO and KBES, we record accuracies of 94.32% with CNN-LSTM and 73.7% with XGBoost. A quantitative comparison with the current state-of-the-art results for KBES shows that our scores are the best. A qualitative comparison with the many studies performed on the SUBESCO dataset suggests that our CNN-LSTM model provides the best trade-off between complexity and performance.
Date of Conference: 23-24 October 2024
Date Added to IEEE Xplore: 25 November 2024