InVERGe: Intelligent Visual Encoder for Bridging Modalities in Report Generation | IEEE Conference Publication | IEEE Xplore

Abstract:

Medical image captioning plays an important role in modern healthcare, improving clinical report generation and aiding radiologists in detecting abnormalities and reducing misdiagnosis. The complex visual and textual data biases make this task more challenging. Recent advancements in transformer-based models have significantly improved the generation of radiology reports from medical images. However, these models require substantial computational resources for training and have been observed to produce unnatural language outputs when trained solely on raw image-text pairs. Our aim is to generate more detailed reports specific to images and to explain the reasoning behind the generated text through image-text alignment. Given the high computational demands of end-to-end model training, we introduce a two-step training methodology with an Intelligent Visual Encoder for Bridging Modalities in Report Generation (InVERGe) model. This model incorporates a lightweight transformer known as the Cross-Modal Query Fusion Layer (CMQFL), which utilizes the output from a frozen encoder to identify the most relevant text-grounded image embedding. This layer bridges the gap between the encoder and decoder, significantly reducing the workload on the decoder and enhancing the alignment between vision and language. Our experimental results, conducted using the MIMIC-CXR, Indiana University chest X-ray images, and CDD-CESM breast images datasets, demonstrate the effectiveness of our approach.

Code: https://github.com/labsroy007/InVERGe
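
The abstract describes the CMQFL only at a high level, so the following is a minimal illustrative sketch, not the authors' implementation. It assumes the layer behaves roughly like cross-attention in which a small set of query vectors attends over the patch embeddings produced by a frozen image encoder, yielding a few fused vectors that a decoder could consume. The function names (`softmax`, `cross_attend`), the dimensions, and the random stand-in data are all hypothetical; the sketch omits learned projections, multiple heads, and the text-grounding objective mentioned in the abstract.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, patch_embeddings):
    """Single-head scaled dot-product cross-attention (no learned projections).

    queries:          list of n_q vectors of dimension d (hypothetical query tokens)
    patch_embeddings: list of n_p vectors of dimension d (frozen encoder output)
    Returns (fused, weights): n_q fused vectors and the n_q x n_p attention weights.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    fused, all_weights = [], []
    for q in queries:
        # Similarity of this query to every image patch.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in patch_embeddings]
        weights = softmax(scores)
        # Weighted average of patch embeddings: the query's view of the image.
        out = [sum(w * v[j] for w, v in zip(weights, patch_embeddings))
               for j in range(d)]
        fused.append(out)
        all_weights.append(weights)
    return fused, all_weights


random.seed(0)
DIM, N_PATCHES, N_QUERIES = 8, 16, 4  # assumed toy sizes, not from the paper

# Stand-in for frozen encoder output; in practice these would come from a
# pretrained vision model whose weights are not updated.
frozen_patches = [[random.gauss(0, 1) for _ in range(DIM)]
                  for _ in range(N_PATCHES)]
# Stand-in for the query vectors; in a real layer these would be trained.
queries = [[random.gauss(0, 1) for _ in range(DIM)]
           for _ in range(N_QUERIES)]

fused, weights = cross_attend(queries, frozen_patches)
print(len(fused), len(fused[0]))
```

In this toy setup the decoder would receive 4 fused vectors instead of 16 patch embeddings, which illustrates how a bridging layer of this kind could reduce the decoder's workload. The attention weights also indicate which patches each query drew on, which is one plausible route to the image-text explanations the abstract mentions.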
Date of Conference: 17-18 June 2024
Date Added to IEEE Xplore: 27 September 2024
Conference Location: Seattle, WA, USA
