Abstract:
Speech emotion recognition (SER) technology has recently become a trend in a broader field and has achieved remarkable recognition performance using deep learning techniques. However, the recognition performance obtained using end-to-end learning directly from the raw audio waveform still hardly exceeds that based on hand-crafted acoustic descriptors. Instead of relying solely on the raw waveform or acoustic descriptors for SER, we propose an acoustic space augmentation network, termed the Dual-Complementary Acoustic Embedding Network (DCaEN), that combines knowledge-based features with a raw waveform embedding learned under a novel complementary constraint. DCaEN includes representations from eGeMAPS acoustic features and the raw waveform by specifying a negative cosine distance loss that explicitly constrains the raw waveform embedding to differ from the eGeMAPS space. Our experimental results demonstrate improved emotion-discriminative power on the IEMOCAP database, achieving 59.31% accuracy in four-class emotion recognition. Our analysis also shows that the learned raw waveform embedding of DCaEN converges close to a reverse mirroring of the original eGeMAPS space.
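The complementary constraint described above can be sketched as minimizing the cosine similarity between the two embeddings, which pushes the raw waveform embedding away from the eGeMAPS representation. The following is a minimal illustration in plain NumPy; the function name and the exact formulation (minimizing raw cosine similarity rather than, e.g., its shifted variant) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def complementary_cosine_loss(wave_emb, egemaps_emb, eps=1e-8):
    """Hypothetical sketch of a negative-cosine-distance constraint:
    minimizing this value drives the raw-waveform embedding away from
    the eGeMAPS embedding (toward its reverse mirror)."""
    num = float(np.dot(wave_emb, egemaps_emb))
    den = float(np.linalg.norm(wave_emb) * np.linalg.norm(egemaps_emb)) + eps
    return num / den  # cosine similarity in [-1, 1]

# Toy check: identical vectors give ~1 (high loss, undesired alignment);
# opposite vectors give ~-1 (loss minimized, "reverse mirroring").
a = np.array([1.0, 0.0])
print(round(complementary_cosine_loss(a, a), 4))   # 1.0
print(round(complementary_cosine_loss(a, -a), 4))  # -1.0
```

Minimizing this term alongside the emotion classification loss would, under these assumptions, encourage the two embedding spaces to carry complementary rather than redundant information, consistent with the reverse-mirroring behavior reported in the analysis.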
Published in: 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII)
Date of Conference: 03-06 September 2019
Date Added to IEEE Xplore: 09 December 2019