FRCL-MNER: A Finer Grained Rank-Based Contrastive Learning Framework for Multimodal NER | IEEE Journals & Magazine | IEEE Xplore

Abstract:

Multimodal named entity recognition (MNER) is an emerging field that aims to automatically detect named entities and classify their categories using input text together with auxiliary resources such as images. While previous studies have used object detectors to preprocess images and then fused textual semantics with the corresponding image features, these methods often overlook the finer-grained information available within each modality and may aggravate error propagation introduced by the predetection step. To address these issues, we propose a finer grained rank-based contrastive learning (FRCL) framework for MNER. The framework employs global-level contrastive learning to align multimodal semantic features and a Top-K rank-based mask strategy to construct positive–negative pairs, thereby learning finer-grained multimodal interaction representations. Experimental results on three well-known social media datasets show that our approach outperforms strong existing baselines and achieves an improvement of up to 1.54% on the Twitter2015 dataset. Extensive discussions further confirm the effectiveness of our approach. We will release the source code at https://github.com/augusyan/FRCL.
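The abstract names two components: global-level contrastive alignment of text and image features, and a Top-K rank-based mask that decides which pairs count as positives. The pure-Python sketch below illustrates one generic reading of each idea; it is not the FRCL implementation. The function names, the symmetric InfoNCE formulation, the temperature `tau = 0.07`, and the interpretation of the Top-K mask as "the K highest-similarity columns per row are positives, the rest are negatives" are all assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def l2_normalize(v: Vector) -> Vector:
    # Guard against zero vectors by falling back to a norm of 1.0.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_matrix(a: List[Vector], b: List[Vector]) -> Matrix:
    """Pairwise cosine similarities between rows of a and rows of b."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def log_softmax_row(row: Vector) -> Vector:
    # Numerically stable log-softmax (subtract the max before exp).
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def global_contrastive_loss(text: List[Vector], image: List[Vector],
                            tau: float = 0.07) -> float:
    """Assumed symmetric InfoNCE over a batch of equal size: text i and
    image i form the positive pair; all other pairings are negatives."""
    logits = [[s / tau for s in row] for row in cosine_matrix(text, image)]
    n = len(logits)
    t2i = -sum(log_softmax_row(logits[i])[i] for i in range(n)) / n
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    i2t = -sum(log_softmax_row(cols[i])[i] for i in range(n)) / n
    return (t2i + i2t) / 2


def top_k_mask(sim: Matrix, k: int) -> List[List[int]]:
    """Hypothetical rank-based mask: for each row (e.g. a text token), mark
    the k highest-scoring columns (e.g. image patches) as positives (1)
    and every other column as a negative (0)."""
    mask = []
    for row in sim:
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep = set(ranked[:k])
        mask.append([1 if j in keep else 0 for j in range(len(row))])
    return mask


def masked_contrastive_loss(sim: Matrix, mask: List[List[int]],
                            tau: float = 0.07) -> float:
    """Multi-positive InfoNCE: per row, average -log p over the positives
    selected by the mask, then average over rows."""
    total = 0.0
    for row, m in zip(sim, mask):
        logp = log_softmax_row([s / tau for s in row])
        pos = [lp for lp, keep in zip(logp, m) if keep]
        total += -sum(pos) / len(pos)
    return total / len(sim)


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0]]                  # toy text features
    patches = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]     # toy image features
    sim = cosine_matrix(tokens, patches)
    print(top_k_mask(sim, k=2))  # [[1, 0, 1], [0, 1, 1]]
    print(masked_contrastive_loss(sim, top_k_mask(sim, k=2)))
    print(global_contrastive_loss(tokens, patches[:2]))
```

In this toy setup, correctly paired text and image features yield a lower global loss than mismatched ones, and the Top-K mask lets a single token treat several related patches as positives rather than exactly one, which is one plausible way a rank-based mask could expose finer-grained cross-modal correspondences.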
Page(s): 1 - 15
Date of Publication: 10 February 2025

PubMed ID: 40031693
