Token-Mixer: Bind Image and Text in One Embedding Space for Medical Image Reporting


Abstract:

Medical image reporting, which focuses on automatically generating diagnostic reports from medical images, has garnered growing research attention. In this task, learning cross-modal alignment between images and reports is crucial. However, the exposure bias problem in autoregressive text generation poses a notable challenge, as the model is optimized by a word-level loss function using the teacher-forcing strategy. To this end, we propose a novel Token-Mixer framework that learns to bind image and text in one embedding space for medical image reporting. Concretely, Token-Mixer enhances the cross-modal alignment by matching image-to-text generation with text-to-text generation, which suffers less from exposure bias. The framework contains an image encoder, a text encoder, and a text decoder. In training, images and paired reports are first encoded into image tokens and text tokens, and these tokens are randomly mixed to form the mixed tokens. Then, the text decoder accepts image tokens, text tokens, or mixed tokens as prompt tokens and conducts text generation for network optimization. Furthermore, we introduce a tailored text decoder and an alternative training strategy that integrate well with our Token-Mixer framework. Extensive experiments across three publicly available datasets demonstrate that Token-Mixer successfully enhances the image-text alignment and thereby attains state-of-the-art performance. The code is available at https://github.com/yangyan22/Token-Mixer.
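
The token-mixing step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the tokens are mixed, so everything below is an assumption made for illustration: the function names `mix_tokens` and `choose_prompt`, the `text_ratio` parameter, the position-wise mixing scheme, and the requirement that both encoders emit the same number of tokens are all hypothetical and may differ from the authors' released code. Strings stand in for embedding vectors.

```python
import random


def mix_tokens(image_tokens, text_tokens, text_ratio=0.5, seed=None):
    """Position-wise random mix of two equal-length token sequences.

    Assumed scheme, not taken from the paper: at each position, keep the
    text token with probability `text_ratio`, otherwise the image token.
    """
    if len(image_tokens) != len(text_tokens):
        raise ValueError("this sketch assumes equal-length token sequences")
    rng = random.Random(seed)
    mixed = []
    for img_tok, txt_tok in zip(image_tokens, text_tokens):
        mixed.append(txt_tok if rng.random() < text_ratio else img_tok)
    return mixed


def choose_prompt(image_tokens, text_tokens, mode, seed=None):
    """Select the prompt for the text decoder: image, text, or mixed tokens."""
    if mode == "image":
        return list(image_tokens)
    if mode == "text":
        return list(text_tokens)
    if mode == "mixed":
        return mix_tokens(image_tokens, text_tokens, seed=seed)
    raise ValueError(f"unknown mode: {mode}")


# Toy tokens: strings stand in for embedding vectors.
image_tokens = ["i0", "i1", "i2", "i3"]
text_tokens = ["t0", "t1", "t2", "t3"]
mixed = choose_prompt(image_tokens, text_tokens, "mixed", seed=0)
```

In this sketch, a training loop would call `choose_prompt` with one of the three modes per step and feed the result to the decoder; at inference time only the `"image"` mode applies, since no report is available.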
Published in: IEEE Transactions on Medical Imaging (Volume: 43, Issue: 11, November 2024)
Page(s): 4017 - 4028
Date of Publication: 11 June 2024

PubMed ID: 38861436
