Dual-modality Seq2Seq Network for Audio-visual Event Localization | IEEE Conference Publication | IEEE Xplore

Dual-modality Seq2Seq Network for Audio-visual Event Localization


Abstract:

Audio-visual event localization requires identifying an event that is both visible and audible in a video, at either the frame level or the video level. To address this task, we propose a deep neural network named Audio-Visual sequence-to-sequence dual network (AVSDN). By jointly taking audio and visual features at each time segment as inputs, the proposed model learns global and local event information in a sequence-to-sequence manner, and it can be realized in either fully supervised or weakly supervised settings. Empirical results confirm that the proposed method performs favorably against recent deep learning approaches in both settings.
Date of Conference: 12-17 May 2019
Date Added to IEEE Xplore: 17 April 2019
Conference Location: Brighton, UK
