Multi-Task Paired Masking With Alignment Modeling for Medical Vision-Language Pre-Training

Abstract:

In recent years, the growing demand for medical imaging diagnosis has placed a significant burden on radiologists. As a solution, Medical Vision-Language Pre-training (Med-VLP) methods have been proposed to learn universal representations from medical images and reports, benefiting downstream tasks without requiring fine-grained annotations. However, existing methods have overlooked the importance of cross-modal alignment in joint image-text reconstruction, resulting in insufficient cross-modal interaction. To address this limitation, we propose a unified Med-VLP framework based on Multi-task Paired Masking with Alignment (MPMA), which integrates the cross-modal alignment task into the joint image-text reconstruction framework to achieve more comprehensive cross-modal interaction. In addition, a Global and Local Alignment (GLA) module is designed to help the self-supervised paradigm obtain semantic representations with rich domain knowledge. Furthermore, we introduce a Memory-Augmented Cross-Modal Fusion (MA-CMF) module that fully integrates visual information to assist report reconstruction and adequately fuses the multi-modal representations. Experimental results demonstrate that the proposed unified approach outperforms previous methods in all downstream tasks, including uni-modal, cross-modal, and multi-modal tasks.

Published in: IEEE Transactions on Multimedia (Volume: 26)

Page(s): 4706 - 4721

Date of Publication: 19 October 2023

DOI: 10.1109/TMM.2023.3325965
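
The abstract describes combining joint image-text reconstruction with a cross-modal alignment task, but this page gives no formulas. The following is only an illustrative sketch of how such a multi-task objective could be assembled, not the authors' implementation. Every function name, the masking ratio, the temperature, the loss weights, and the choice of mean squared error, negative log-likelihood, and a symmetric contrastive loss are assumptions made for illustration. The local alignment and memory-augmented fusion parts are not modeled.

```python
import math
import random


def random_mask(n_items, ratio, rng):
    """Pick indices to hide; a hypothetical stand-in for a masking scheme."""
    k = int(round(n_items * ratio))
    return sorted(rng.sample(range(n_items), k))


def masked_mse(pred, target, masked_idx):
    """Image-side reconstruction: mean squared error on masked patches only."""
    if not masked_idx:
        return 0.0
    return sum((pred[i] - target[i]) ** 2 for i in masked_idx) / len(masked_idx)


def masked_nll(correct_token_probs, masked_idx):
    """Text-side reconstruction: mean negative log-likelihood on masked tokens."""
    if not masked_idx:
        return 0.0
    return -sum(math.log(correct_token_probs[i]) for i in masked_idx) / len(masked_idx)


def cosine(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def global_alignment_loss(img_embs, txt_embs, temperature=0.1):
    """Symmetric contrastive loss; image k and report k form the matched pair."""
    n = len(img_embs)
    sims = [[cosine(i, t) / temperature for t in txt_embs] for i in img_embs]

    def cross_entropy(rows):
        total = 0.0
        for k, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_z - row[k]
        return total / n

    text_to_image = [list(col) for col in zip(*sims)]
    return 0.5 * (cross_entropy(sims) + cross_entropy(text_to_image))


def total_loss(img_rec, txt_rec, align, w_img=1.0, w_txt=1.0, w_align=1.0):
    """Weighted sum of the three task losses (weights are assumptions)."""
    return w_img * img_rec + w_txt * txt_rec + w_align * align


if __name__ == "__main__":
    rng = random.Random(0)
    patches = [0.2, 0.9, 0.4, 0.7]             # toy "image patches"
    reconstructed = [0.2, 0.8, 0.5, 0.7]       # toy decoder output
    img_mask = random_mask(len(patches), 0.5, rng)
    img_rec = masked_mse(reconstructed, patches, img_mask)
    txt_rec = masked_nll([0.9, 0.6, 0.8], [1])  # token 1 was masked
    align = global_alignment_loss([[1.0, 0.0], [0.0, 1.0]],
                                  [[1.0, 0.0], [0.0, 1.0]])
    print(total_loss(img_rec, txt_rec, align))
```

In this toy setup the alignment term is small when each image embedding is closest to its own report embedding and large when the pairs are swapped, which is the behavior a global alignment objective is meant to reward.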
