Abstract:
Multimodal relation extraction (MRE) is an emerging research field that combines techniques from natural language processing, computer vision, and machine learning to better understand and interpret multimodal data. However, current methods face two main issues. First, the auxiliary images may contain irrelevant content, causing the model to attend to unrelated entities when extracting visual features from the global image. Second, modality imbalance can arise during the multimodal fusion stage, where the encoder overly favors one modality's information over the other. Therefore, we propose an MRE method using Dual Fusion with Mutual Attention (DFMA). To address the first issue, we introduce the Pyramid Feature Extraction (PFE) module. To tackle the second issue, we propose the Multimodal Mutual Fusion (MMF) module. PFE dynamically generates prompt vectors to mitigate the impact of irrelevant information, while MMF, built on mutual attention, balances the fusion of textual and visual information. Experiments show that our approach achieves state-of-the-art performance on the multimodal neural relation extraction (MNRE) dataset.
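To make the mutual-attention idea behind MMF concrete, the following is a minimal illustrative sketch of bidirectional cross-attention fusion, not the authors' released DFMA/MMF code; the module name, feature dimensions, and pooling scheme are assumptions chosen for illustration.

# Hypothetical sketch: each modality attends over the other, so neither
# dominates the fused representation. Not the paper's implementation.
import torch
import torch.nn as nn

class MutualAttentionFusion(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # text queries attend over visual keys/values, and vice versa
        self.text_to_image = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_t = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (B, L_t, dim); image_feats: (B, L_v, dim)
        t_attends_v, _ = self.text_to_image(text_feats, image_feats, image_feats)
        v_attends_t, _ = self.image_to_text(image_feats, text_feats, text_feats)
        t_fused = self.norm_t(text_feats + t_attends_v)   # residual on the text stream
        v_fused = self.norm_v(image_feats + v_attends_t)  # residual on the image stream
        # pool each stream, concatenate, and project back to dim
        pooled = torch.cat([t_fused.mean(dim=1), v_fused.mean(dim=1)], dim=-1)
        return self.out(pooled)  # (B, dim)

if __name__ == "__main__":
    fuse = MutualAttentionFusion()
    text = torch.randn(2, 32, 768)   # e.g. token features from a text encoder
    image = torch.randn(2, 49, 768)  # e.g. projected patch features from a vision encoder
    print(fuse(text, image).shape)   # torch.Size([2, 768])

Because both cross-attention passes contribute symmetrically to the fused vector, this kind of design gives each modality an explicit path into the joint representation, which is the balancing effect the abstract attributes to MMF.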
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025