Conferences >ICASSP 2025 - 2025 IEEE Inter...

SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer

Download PDF
Download References
Request Permissions
Save to
Alerts

Abstract:

In this paper, we introduce SoloAudio, a novel diffusion-based generative model for target sound extraction (TSE). Our approach trains latent diffusion models on audio, r...Show More

Metadata

Abstract:

In this paper, we introduce SoloAudio, a novel diffusion-based generative model for target sound extraction (TSE). Our approach trains latent diffusion models on audio, replacing the previous U-Net backbone with a skip-connected Transformer that operates on latent features. SoloAudio supports both audio-oriented and language-oriented TSE by utilizing a CLAP model as the feature extractor for target sounds. Furthermore, SoloAudio leverages synthetic audio generated by state-of-the-art text-to-audio models for training, demonstrating strong generalization to out-of-domain data and unseen sound events. We evaluate this approach on the FSD Kaggle 2018 mixture dataset and real data from AudioSet, where SoloAudio achieves the state-of-the-art results on both in-domain and out-of-domain data, and exhibits impressive zero-shot and few-shot capabilities. Source code¹ and demos² are released.

Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference: 06-11 April 2025

Date Added to IEEE Xplore: 07 March 2025

ISBN Information:

ISSN Information:

DOI: 10.1109/ICASSP49660.2025.10890066

Conference Location: Hyderabad, India

Contents

References is not available for this document.

SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer

Abstract:

Metadata

Abstract:

ISSN Information:

References

IEEE Account

Purchase Details

Profile Information

Need Help?

SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer

Alerts

Abstract:

Metadata

Abstract:

ISSN Information:

Authors

Figures

References

Keywords

Metrics

Footnotes

References

IEEE Account

Purchase Details

Profile Information

Need Help?