Abstract:
In recent years, the significance of pre-trained transformer audio models has been increasingly recognized. However, existing pre-trained transformer audio models are based on single-channel audio and cannot be directly applied to multi-channel audio for Sound Event Localization and Detection (SELD) tasks. To address this issue, we propose SELD-SSAST, a novel model based on the single-channel Self-Supervised Audio Spectrogram Transformer (SSAST). Specifically, we first introduce a fusion feature that enables SSAST to learn the features unique to SELD effectively. Second, we feed the multi-channel audio features into a single SSAST module, which learns temporal information across channels through channel-mixing. Finally, to enable SSAST to learn the relationships between multi-channel audio features, we propose a Convolutional Cross Attention (CCA) module to replace the Transformer’s self-attention, together with an intensity vector (IV) enhanced module that learns the differences between channel features. Our experiments show that SELD-SSAST improves performance by 23.5% and 20.2% over the baseline on two datasets, respectively. Moreover, at the same data scale, SELD-SSAST outperforms state-of-the-art (SOTA) methods on both datasets.
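The intensity vector (IV) mentioned in the abstract is a standard directional feature in SELD systems. As a point of reference, a minimal sketch of the common active-IV computation for first-order Ambisonics (FOA) input is shown below; this reflects the usual formulation in the SELD literature, not necessarily the paper’s specific IV-enhanced module, and the function name and normalization choice are illustrative assumptions:

```python
import numpy as np

def intensity_vector(stft_foa, eps=1e-8):
    """Active intensity vector from first-order Ambisonics STFTs.

    stft_foa: complex array of shape (4, T, F) with the W, X, Y, Z
    channel spectrograms. Returns a real array of shape (3, T, F).
    The normalization by per-bin energy (a common choice, assumed
    here) makes the feature encode direction rather than loudness.
    """
    w, x, y, z = stft_foa
    # Re{W* . [X, Y, Z]} gives the active acoustic intensity per bin.
    iv = np.real(np.conj(w)[None] * np.stack([x, y, z]))  # (3, T, F)
    # Per-bin energy used for normalization; eps avoids division by zero.
    energy = np.abs(w) ** 2 + (np.abs(x) ** 2 + np.abs(y) ** 2 + np.abs(z) ** 2) / 3.0 + eps
    return iv / energy[None]
```

Features of this kind are typically stacked with the multi-channel log-mel spectrograms to form the fused input the abstract refers to.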
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025