Synergy of GPT-3 Summarization and Vision-Encoder-Decoder for Chest X-ray Captioning


Abstract:

Accurately captioning chest X-ray images remains a central challenge in medical image analysis. This research introduces a novel image captioning approach tailored to a chest X-ray dataset, our chosen case study. At the core of the methodology is the use of transformers for preliminary data summarization: a transformer condenses the captions before training, eliminating redundant details while preserving critical information. We show empirically that this summarized representation improves the efficiency and accuracy of subsequent image captioning techniques. Specifically, we propose a Vision Encoder Decoder (ViT-) model for the image captioning task and integrate the GPT transformer for the preceding caption summarization. Our findings underscore the efficacy of this combination, with notable empirical improvements in performance metrics on the chest X-ray dataset compared to existing techniques. This study paves the way for further exploration of the synergistic application of transformers in medical image analysis and emphasizes the importance of effective data summarization in achieving robust captioning outcomes.
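As a rough illustration of the preprocessing step described in the abstract, the sketch below condenses each report caption before it is paired with its image for captioning training. The abstract does not give the prompts, model settings, or dataset layout, so everything here is an assumption: `dedupe_sentences` is a trivial stand-in that only drops repeated sentences (it is not GPT), and `build_training_pairs`, the file name, and the sample report are hypothetical. In practice the `summarize` argument would wrap a call to a language model.

```python
import re
from typing import Callable, Iterable, List, Tuple


def dedupe_sentences(report: str) -> str:
    """Hypothetical stand-in for the summarizer: drop repeated sentences.

    This is NOT the GPT-based summarization from the abstract; it only
    removes exact repeats (ignoring case and punctuation), keeping the
    first occurrence of each sentence in its original order.
    """
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", report.strip())
        if s.strip()
    ]
    seen = set()
    kept: List[str] = []
    for sentence in sentences:
        # Normalize so "No effusion." and "no effusion" count as the same.
        key = re.sub(r"\W+", " ", sentence.lower()).strip()
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence)
    return " ".join(kept)


def build_training_pairs(
    records: Iterable[Tuple[str, str]],
    summarize: Callable[[str], str] = dedupe_sentences,
) -> List[Tuple[str, str]]:
    """Pair each image path with its condensed caption.

    `summarize` is injected so a model-backed summarizer can replace
    the stand-in without changing the rest of the pipeline.
    """
    return [(image_path, summarize(report)) for image_path, report in records]


# Hypothetical record: (image path, raw report text).
records = [
    (
        "xray_001.png",
        "The heart is normal in size. No pleural effusion. "
        "The heart is normal in size.",
    )
]
pairs = build_training_pairs(records)
```

The resulting `pairs` list would then serve as the (image, target caption) data for training the captioning model; keeping the summarizer as a parameter makes it straightforward to compare training on raw versus condensed captions.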
Date of Conference: 06-09 August 2024
Date Added to IEEE Xplore: 12 September 2024
Conference Location: Kingston, ON, Canada
