
Audio Captioning Based on Combined Audio and Semantic Embeddings


Abstract:

Audio captioning is a recently proposed task for automatically generating a textual description of a given audio clip. Most existing approaches use the encoder-decoder model without semantic information. In this study, we propose a bi-directional Gated Recurrent Unit (BiGRU) model based on the encoder-decoder architecture that uses both audio and semantic embeddings. To obtain semantic embeddings, we extract subject-verb embeddings from the subjects and verbs of the audio captions. At the testing stage, we use a Multilayer Perceptron classifier to predict the subject-verb embeddings of test audio clips. To extract audio features, in addition to log Mel energies, we use a pretrained audio neural network (PANN) as a feature extractor, applied for the first time to the audio captioning task in order to explore the usability of audio embeddings in this setting. We combine the audio embeddings and semantic embeddings to feed the BiGRU-based encoder-decoder model. We then evaluate our model on two audio captioning datasets: Clotho and AudioCaps. Experimental results show that the proposed BiGRU-based deep model significantly outperforms state-of-the-art results across different evaluation metrics, and that the inclusion of semantic information enhances captioning performance.
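The pipeline described above can be sketched as follows. This is a minimal, hypothetical illustration of the fusion step only: an MLP maps an audio embedding (e.g., a PANN clip embedding) to a predicted subject-verb embedding, and the two are concatenated to form the input fed to the encoder-decoder. All dimensions, weights, and the `mlp_predict` helper are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_predict(x, W1, b1, W2, b2):
    """Hypothetical MLP standing in for the paper's classifier that
    predicts a subject-verb embedding from an audio embedding."""
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

# Illustrative dimensions (not taken from the paper)
audio_dim, hidden_dim, sem_dim = 64, 32, 16
W1, b1 = rng.normal(size=(audio_dim, hidden_dim)), np.zeros(hidden_dim)
W2, b2 = rng.normal(size=(hidden_dim, sem_dim)), np.zeros(sem_dim)

# Batch of 2 clip-level audio embeddings (e.g., PANN features)
audio_emb = rng.normal(size=(2, audio_dim))

# Predicted subject-verb embeddings for the test clips
sem_emb = mlp_predict(audio_emb, W1, b1, W2, b2)

# Combined embedding fed to the BiGRU-based encoder-decoder
fused = np.concatenate([audio_emb, sem_emb], axis=1)
print(fused.shape)  # (2, 80)
```

At training time the subject-verb embeddings would come from the captions themselves, with the MLP trained to reproduce them so that the same fusion can be applied when no caption is available.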
Date of Conference: 02-04 December 2020
Date Added to IEEE Xplore: 22 January 2021
Conference Location: Naples, Italy
