
Holistic Multi-Modal Memory Network for Movie Question Answering


Abstract:

Answering questions using multi-modal context is a challenging problem, as it requires a deep integration of diverse data sources. Existing approaches consider only a subset of all possible interactions among data sources during one attention hop. In this paper, we present a holistic multi-modal memory network (HMMN) framework that fully considers interactions between different input sources (multi-modal context and question) at each hop. In addition, to home in on relevant information, our framework takes answer choices into consideration during the context retrieval stage. Our HMMN framework effectively integrates information from the multi-modal context, question, and answer choices, enabling more informative context to be retrieved for question answering. Experimental results on the MovieQA and TVQA datasets validate the effectiveness of our HMMN framework. Extensive ablation studies show the importance of holistic reasoning and reveal the contributions of different attention strategies to model performance.
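
For readers unfamiliar with the term "attention hop", the following is a minimal sketch of one hop in a generic memory network. It is not the HMMN architecture, whose details the abstract does not give. It assumes dot-product scoring, a softmax over memory slots, and an additive query update; the names `attention_hop`, `memory`, and `query` are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_hop(query, memory):
    """One generic attention hop (assumed form, not the paper's model).

    Scores every memory slot against the query with a dot product,
    normalizes the scores with softmax, retrieves the weighted sum of
    the slots, and adds it to the query to form the next-hop query.
    """
    scores = [sum(q * m for q, m in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(query)
    retrieved = [
        sum(w * slot[i] for w, slot in zip(weights, memory))
        for i in range(dim)
    ]
    updated_query = [q + r for q, r in zip(query, retrieved)]
    return weights, updated_query


# Toy example: three context slots (e.g. frame or subtitle embeddings)
# and a two-dimensional question embedding. All values are made up.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
weights, next_query = attention_hop(query, memory)
```

Calling `attention_hop` again with `next_query` would give a second hop. In a multi-modal setting of the kind the abstract describes, the memory would hold both visual and textual slots, and the query could also incorporate answer-choice information, but how HMMN combines them is not specified here.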
Published in: IEEE Transactions on Image Processing (Volume: 29)
Page(s): 489 - 499
Date of Publication: 02 August 2019

PubMed ID: 31395548
