Answer Distillation Network With Bi-Text-Image Attention for Medical Visual Question Answering


Overall framework of our proposed BTIA-AD Net

Abstract:

Medical Visual Question Answering (Med-VQA) is a multimodal task that aims to obtain correct answers based on medical images and questions. When framed as a classification task, Med-VQA is typically more challenging for open-ended questions than for closed-ended questions because the former have a much larger answer space, so prediction accuracy on open-ended questions is generally lower. In this study, we design an answer distillation network with bi-text-image attention (BTIA-AD Net) to address this problem. We present an answer distillation network that refines the answers and converts an open-ended question into a multiple-choice question over a set of candidate answers. To fully utilize the candidate answer information from the answer distillation network, we propose a bi-text-image attention fusion module composed of self-attention and guided attention that automatically fuses image features, question representations, and candidate answer information, achieving intra-modal and inter-modal semantic interaction. Extensive experiments validate the effectiveness of BTIA-AD Net. The results show that our model efficiently compresses the answer space of open-ended tasks, improves answer accuracy, and achieves new state-of-the-art performance on the VQA-RAD dataset.
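The abstract describes intra-modal self-attention followed by inter-modal guided attention over image features, question representations, and distilled candidate answers. The sketch below is a minimal, hypothetical illustration of such a fusion step in PyTorch; the module names, dimensions, pooling, and scoring scheme are assumptions for illustration and are not the authors' implementation.

```python
# Hypothetical sketch of a bi-text-image attention fusion step:
# self-attention within each modality, then guided attention across
# modalities, followed by scoring of the distilled candidate answers.
import torch
import torch.nn as nn


class GuidedAttentionFusion(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        # Intra-modal self-attention for image regions and text tokens.
        self.img_self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt_self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Inter-modal guided attention: text (question + candidate answers)
        # attends over image regions, and vice versa.
        self.txt_guided_by_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_guided_by_txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)  # project fused features back to dim

    def forward(self, img_feats, question_feats, cand_answer_feats):
        # img_feats:         (B, R, dim) image region features
        # question_feats:    (B, Q, dim) question token representations
        # cand_answer_feats: (B, K, dim) one embedding per distilled candidate answer
        text = torch.cat([question_feats, cand_answer_feats], dim=1)

        # Intra-modal semantic interaction.
        img, _ = self.img_self_attn(img_feats, img_feats, img_feats)
        txt, _ = self.txt_self_attn(text, text, text)

        # Inter-modal semantic interaction (guided attention).
        txt_g, _ = self.txt_guided_by_img(txt, img, img)
        img_g, _ = self.img_guided_by_txt(img, txt, txt)

        # Pool both streams and score each candidate answer by similarity
        # to the fused multimodal representation.
        fused = self.proj(torch.cat([img_g.mean(dim=1), txt_g.mean(dim=1)], dim=-1))
        scores = torch.bmm(cand_answer_feats, fused.unsqueeze(-1)).squeeze(-1)
        return scores  # (B, K) logits over the distilled candidate answers
```

In this reading, answer distillation shrinks the open-ended answer space to K candidates per question, and the final prediction is simply the highest-scoring candidate.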
Published in: IEEE Access (Volume: 13)
Page(s): 16455 - 16465
Date of Publication: 21 January 2025
Electronic ISSN: 2169-3536