1. INTRODUCTION
In our daily lives, we often rely on both visual and auditory cues to answer questions [1]–[8]. To endow machines with this human-like perception capability in question answering, the Audio-Visual Question Answering (AVQA) task has emerged. This task requires machines to comprehend a question and answer it by jointly exploiting the audio-visual information relevant to the question text. Owing to this property, AVQA is closely tied to practical real-world applications, including autonomous navigation [9] and interactive education [10].