Collaborative Audio-Visual Event Localization Based on Sequential Decision and Cross-Modal Consistency | IEEE Conference Publication | IEEE Xplore

Abstract:

We focus on the audio-visual event (AVE) localization task: locating the video segments that contain an AVE and identifying their event categories. Because different event-relevant video segments often describe different aspects of an AVE, they can complement each other. However, current approaches model AVE localization as a sequential classification process, in which event-relevant video segments cannot collaborate effectively with each other. We therefore propose Collaborative Segments Decision (CSD), which enables collaboration among event-relevant video segments by modeling AVE localization as a sequential decision process. In addition, to realize collaboration between cross-modal features, we propose Consistent Feature Propagation (CFP), which exploits their consistency over time. Combining these two components yields the Collaborative Decision Network (CDN). Experimental results show that CDN outperforms baseline methods in both fully and weakly supervised settings.
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023
Conference Location: Rhodes Island, Greece
