ScSer: Supervised Contrastive Learning for Speech Emotion Recognition using Transformers


Abstract:

Emotion recognition from speech is a challenging task and an active area of research in effective Human-Computer Interaction (HCI). Though many deep learning and machine learning approaches have been proposed to tackle the problem, they fall short both in accuracy and in learning robust representations that are agnostic to changes in voice. Additionally, there is a lack of sufficient labelled speech data for larger models. To overcome these issues, we propose supervised contrastive learning with transformers for the task of speech emotion recognition (ScSer) and evaluate it on different standard datasets. Further, we experiment with the supervised contrastive setting using different augmentations from the WavAugment library as well as some custom augmentations. Finally, we propose a custom augmentation, random cyclic shift, with which ScSer outperforms other competitive methods, producing a state-of-the-art accuracy of 96% on the RAVDESS dataset with 7600 samples (Big-Ravdess) and a 2-4% boost over other wav2vec-based methods.
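The abstract does not detail the proposed random cyclic shift augmentation, but a plausible minimal sketch, assuming it rotates the raw waveform by a random sample offset (wrapping the tail around to the front), is:

```python
# Hypothetical sketch of a "random cyclic shift" waveform augmentation.
# The exact formulation is not given in the abstract; this assumes a
# 1-D NumPy array of audio samples and a uniformly random offset.
import numpy as np

def random_cyclic_shift(waveform: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Cyclically rotate the waveform by a random number of samples."""
    shift = int(rng.integers(0, len(waveform)))
    return np.roll(waveform, shift)
```

A cyclic shift preserves the full sample content (and hence the spectral energy) of the clip while changing its temporal alignment, which would encourage representations invariant to where the emotional cue occurs in the utterance.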
Date of Conference: 28-31 July 2022
Date Added to IEEE Xplore: 30 August 2022
Conference Location: Melbourne, Australia
