Enhancing Audio-Visual Question Answering with Missing Modality via Trans-Modal Associative Learning


Abstract:

We present a novel method for Audio-Visual Question Answering (AVQA) in real-world scenarios where one modality (audio or visual) can be missing. Inspired by human cognitive processes, we introduce a Trans-Modal Associative (TMA) memory that recalls the missing modal information (i.e., a pseudo modal feature) by establishing associations between the available modal features and textual cues. During the training phase, we employ a Trans-Modal Recalling (TMR) loss to guide the TMA memory toward generating a pseudo modal feature that closely matches the real modal feature. This allows our method to answer questions robustly even when one modality is missing during inference. We believe that our approach, which copes effectively with missing modalities, can be broadly applied to a variety of multimodal applications.
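The recall-and-match idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify the memory architecture or the form of the TMR loss, so this uses a generic key-value memory addressed by softmax attention and a mean-squared-error stand-in for the recalling loss. The names `recall_pseudo_feature`, `recalling_loss`, `keys`, and `values` are hypothetical and do not come from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def addressing_weights(available_feat, text_feat, keys):
    # Assumption: the query is the concatenation of the available
    # modal feature and the textual (question) feature.
    query = np.concatenate([available_feat, text_feat], axis=-1)
    scores = query @ keys.T / np.sqrt(keys.shape[1])
    return softmax(scores, axis=-1)  # (batch, slots)

def recall_pseudo_feature(available_feat, text_feat, keys, values):
    # Hypothetical recall step: a weighted read over memory value slots
    # yields a pseudo feature for the missing modality.
    weights = addressing_weights(available_feat, text_feat, keys)
    return weights @ values  # (batch, d_missing)

def recalling_loss(pseudo_feat, real_feat):
    # Assumed stand-in for the TMR loss: mean squared error between
    # the recalled pseudo feature and the real feature.
    return float(np.mean((pseudo_feat - real_feat) ** 2))

# Toy setup: audio is "missing", visual and text are available.
rng = np.random.default_rng(0)
batch, d_vis, d_txt, d_aud, slots = 4, 8, 6, 5, 16
visual = rng.normal(size=(batch, d_vis))
text = rng.normal(size=(batch, d_txt))
real_audio = rng.normal(size=(batch, d_aud))   # available only in training
keys = rng.normal(size=(slots, d_vis + d_txt)) # learnable in a real model
values = rng.normal(size=(slots, d_aud))       # learnable in a real model

weights = addressing_weights(visual, text, keys)
pseudo_audio = recall_pseudo_feature(visual, text, keys, values)
loss = recalling_loss(pseudo_audio, real_audio)
```

In a trained system of this kind, `keys` and `values` would be learned parameters updated by minimizing the loss while both modalities are present, and at inference `pseudo_audio` would be substituted for the absent audio feature before answering the question.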
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024
Conference Location: Seoul, Republic of Korea
