CACRM: Cross-Attention Based Image-Text CrossModal Retrieval | IEEE Conference Publication | IEEE Xplore

CACRM: Cross-Attention Based Image-Text CrossModal Retrieval


Abstract:

Cross-modal retrieval aims to match an instance from one modality with an instance from another modality. Because the learned low-level features of different modalities are heterogeneous while their high-level semantics are related, it is difficult to learn the correspondence between them. Recently, fine-grained matching methods that aggregate the similarities of all possible region-word pairs have shown progress. However, local alignment is hard to achieve because of mutual interference between different regions within the modalities. To tackle this problem, a Cross Attention for Cross-Modal Retrieval Method (CACRM) is proposed. It constructs a Cross Attention Model (CAM) by introducing multiple independent Transformer modules that extract the interactive features between each image Region of Interest (ROI) and the text. The outputs of these Transformer modules are then used to construct a similarity matrix for fine-grained alignment. Finally, the global similarity is obtained by pooling the similarity matrix. In addition, to account for the imbalanced distribution of the sample data, weights are assigned to the image sub-regions according to the differences in their similarity scores. Extensive experiments are performed on the public datasets Flickr30k and MS-COCO and on a dam inspection log dataset. Experimental results show that the proposed method significantly surpasses previous methods, demonstrating its effectiveness.
Notes: As originally published text, pages or figures in the document were missing or not clearly visible. A corrected replacement file was provided by the authors.
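
The last stages described in the abstract (similarity matrix, pooling, and region weighting) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the abstract does not state the similarity measure, the pooling operator, or the weighting formula, so this sketch uses cosine similarity, max-pooling over words, and softmax weights over regions. The Transformer-based cross-attention step is not modeled; the region and word feature vectors are assumed to be given. The names `similarity_matrix`, `global_similarity`, and `temperature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(regions, words):
    """Region-by-word similarity matrix (rows: image regions, columns: words)."""
    return [[cosine(r, w) for w in words] for r in regions]


def softmax(xs, temperature=1.0):
    """Numerically stable softmax; a lower temperature sharpens the weights."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_similarity(regions, words, temperature=0.1):
    """Pool the similarity matrix into one image-text score.

    Assumed pooling (not from the paper): each region keeps its best-matching
    word, then regions are averaged with softmax weights so that regions with
    higher similarity scores contribute more.
    """
    sim = similarity_matrix(regions, words)
    region_scores = [max(row) for row in sim]
    weights = softmax(region_scores, temperature)
    return sum(w * s for w, s in zip(weights, region_scores))


# Toy features: two image regions and two words in a 2-D embedding space.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [1.0, 1.0]]
score = global_similarity(regions, words)
```

In this toy setup the first region matches the first word exactly, while the second region only partially matches the second word, so the weighted score lies between the two per-region scores and is pulled toward the stronger match.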
Date of Conference: 15-18 August 2022
Date Added to IEEE Xplore: 27 September 2022
Conference Location: Newark, CA, USA
