Speech Emotion Recognition Based on Large-Scale Automatic Speech Recognizer

Abstract:

This paper proposes a novel speech emotion recognition (SER) method that fully leverages the architecture of Whisper, a large-scale automatic speech recognition (ASR) model. Conventional SER models built on a pre-trained speech encoder may fail to capture linguistic content because their decoders are too simple. Our proposed method addresses this shortcoming by adopting Whisper's decoder, which conventional SER discards, to exploit its language modeling capability. The method introduces special tokens corresponding to the target emotions and then fine-tunes the entire Whisper model. We also propose a training scheme suited to Whisper, named serialized multi-task learning (SerialMTL), which uses various kinds of speech information as context for the target SER task. In SerialMTL, the model first predicts subtask tokens, such as transcription and gender tokens, and then estimates the emotion token. An advantage of the proposed method is that its model structure remains simple even when new subtasks are added. Experimental results show that our model based on the entire Whisper achieves better SER performance than the conventional model, and that SerialMTL training with ASR and gender recognition subtasks improves it further.
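The abstract gives no implementation details, so the following is only a minimal sketch, in plain Python with a toy vocabulary, of how SerialMTL-style decoder targets could be assembled: special emotion tokens are appended to the vocabulary, subtask outputs (transcript, then gender) precede the emotion token, and the emotion is read off the decoded sequence. The emotion and gender token strings, the transcript-then-gender ordering, and all helper names (`extend_vocab`, `build_serial_target`, `teacher_forcing_pair`, `extract_emotion`) are hypothetical; only `<|startoftranscript|>` and `<|endoftext|>` are Whisper's own special tokens. The sketch does not load Whisper and is not the authors' code.

```python
from typing import Dict, List, Optional, Tuple

# Whisper's own start/end special tokens.
SOT = "<|startoftranscript|>"
EOT = "<|endoftext|>"
# Hypothetical placeholders: the abstract does not name the tokens the authors used.
EMOTION_TOKENS = ["<|neutral|>", "<|happy|>", "<|angry|>", "<|sad|>"]
GENDER_TOKENS = ["<|female|>", "<|male|>"]


def extend_vocab(vocab: Dict[str, int], new_tokens: List[str]) -> Dict[str, int]:
    """Append new special tokens to a toy vocabulary with contiguous ids 0..n-1."""
    extended = dict(vocab)
    for tok in new_tokens:
        if tok not in extended:
            extended[tok] = len(extended)
    return extended


def build_serial_target(words: List[str], gender: Optional[str], emotion: str) -> List[str]:
    """SerialMTL-style target: subtask outputs first, emotion token last.

    Assumed order (not specified in the abstract): transcript, then gender.
    Passing gender=None drops that subtask; the sequence format is unchanged.
    """
    if emotion not in EMOTION_TOKENS:
        raise ValueError(f"unknown emotion token: {emotion}")
    seq = [SOT] + list(words)
    if gender is not None:
        if gender not in GENDER_TOKENS:
            raise ValueError(f"unknown gender token: {gender}")
        seq.append(gender)
    seq += [emotion, EOT]
    return seq


def teacher_forcing_pair(ids: List[int]) -> Tuple[List[int], List[int]]:
    """Standard next-token setup: the decoder sees ids[:-1] and predicts ids[1:]."""
    return ids[:-1], ids[1:]


def extract_emotion(decoded: List[str]) -> Optional[str]:
    """Return the last emotion token in a decoded sequence, or None if absent."""
    for tok in reversed(decoded):
        if tok in EMOTION_TOKENS:
            return tok
    return None


if __name__ == "__main__":
    base = {SOT: 0, EOT: 1, "i": 2, "am": 3, "fine": 4}
    vocab = extend_vocab(base, EMOTION_TOKENS + GENDER_TOKENS)
    target = build_serial_target(["i", "am", "fine"], "<|female|>", "<|happy|>")
    ids = [vocab[t] for t in target]
    dec_in, labels = teacher_forcing_pair(ids)
    print(target)
    print(dec_in, labels)
    print(extract_emotion(target))
```

In this sketch, adding a subtask only means inserting more tokens before the emotion token; no extra output head is introduced, which illustrates the structural simplicity the abstract refers to.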

Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference: 06-11 April 2025

Date Added to IEEE Xplore: 07 March 2025

DOI: 10.1109/ICASSP49660.2025.10889314

Conference Location: Hyderabad, India