Cross Time-Frequency Transformer for Temporal Action Localization

Abstract:

Most modern approaches to temporal action localization (TAL) focus on time-domain information and neglect the advantages of information from other domains. How to effectively exploit information from different domains, and the interactions between them, remains an attractive yet challenging issue in TAL. In this paper, we propose a novel cross time-frequency Transformer model (TFFormer) for TAL. A dual-branch network architecture is designed to capture time and frequency features at multiple scales, using a multi-scale Transformer in the time branch and the DB1 Discrete Wavelet Transform (DWT) in the frequency branch. To fuse these features from different domains, we propose a cross time-frequency attention mechanism that includes a time pathway and a frequency pathway, enhancing the interaction between the temporal and frequency features. Furthermore, a gated control mechanism is designed to aggregate features from different scales, characterizing their respective contributions. We also design a new regression loss function for locating the time boundaries. Extensive experiments were carried out on four challenging benchmark datasets, including two third-person datasets and two first-person datasets. TFFormer achieves an average mAP of 23.2% on Ego4D and 25.6% on EPIC-Kitchens 100, outperforming the previous state of the art by a large margin. It also obtains competitive results on ActivityNet v1.3 and THUMOS14, with average mAPs of 36.2% and 67.8%, respectively. We also conducted extensive ablation studies to validate the effectiveness of each component of the proposed method.
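To make the frequency branch more concrete, the following is a minimal sketch of a multi-level DB1 (Haar) DWT applied along the temporal axis of a single feature channel. It is not the authors' implementation: the function names `haar_dwt_1d` and `haar_pyramid` and the sample feature values are hypothetical, and the abstract does not specify how the wavelet coefficients are fed into the network. The sketch only shows that each level halves the temporal resolution while separating a low-frequency (approximation) band from a high-frequency (detail) band.

```python
import math


def haar_dwt_1d(signal):
    """Single-level DB1 (Haar) DWT of an even-length sequence.

    Returns (approximation, detail) coefficients, each half the input length.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)  # orthonormal scaling
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_pyramid(signal, levels):
    """Apply the Haar DWT repeatedly to the approximation band.

    Returns one (approx, detail) pair per level; the input length must be
    divisible by 2 ** levels.
    """
    pyramid = []
    current = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt_1d(current)
        pyramid.append((approx, detail))
        current = approx
    return pyramid


# Hypothetical per-frame values of one feature channel (8 time steps).
features = [1.0, 1.0, 2.0, 4.0, 4.0, 4.0, 0.0, 2.0]
for level, (approx, detail) in enumerate(haar_pyramid(features, 2), start=1):
    print(level, len(approx), len(detail))
```

Because the Haar filters are orthonormal, the combined energy of the approximation and detail bands at each level equals the energy of that level's input, so no information is lost in the decomposition. Sign conventions for the detail band vary between libraries.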

Published in: IEEE Transactions on Circuits and Systems for Video Technology (Volume: 34, Issue: 6, June 2024)

Page(s): 4625 - 4638

Date of Publication: 23 October 2023

DOI: 10.1109/TCSVT.2023.3326692
