Text-to-Image Person Re-Identification Based on Multimodal Graph Convolutional Network


Abstract:

Text-to-image person re-identification (ReID) is a common subproblem in the fields of person re-identification and image-text retrieval. Recent approaches generally follow a dual-stream network structure that extracts image and text features. Because images and text do not interact deeply in this approach, it is difficult for the network to learn a highly semantic feature representation. In addition, for both image and text data, the feature extraction process is modeled in a regular way, such as using a Transformer to extract sequence embeddings. This type of modeling disregards the inherent relationships among multimodal input embeddings. This paper proposes a more flexible approach to mining multimodal data that uniformly treats the data as graphs, so that the extraction and interaction of multimodal information are accomplished by message passing between graph nodes. First, a unified multimodal feature extraction and fusion network is proposed based on the graph convolutional network, which enables the progression of multimodal information from ‘local’ to ‘global’. Second, an asymmetric multilevel alignment module, which focuses on more accurate ‘local’ information from a ‘global’ perspective, is proposed to progressively divide the multimodal information at each level. Last, a cross-modal representation matching strategy based on similarity distribution and mutual information is proposed to achieve cross-modal alignment. The proposed algorithm is simple and efficient, and test results on three public datasets (CUHK-PEDES, ICFG-PEDES and RSTPReID) show that it achieves performance at the state-of-the-art level.
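
To illustrate what message passing between graph nodes means in general, the following minimal sketch runs one round of mean aggregation on a toy graph. It is not the paper's network. The function name `message_passing_step`, the toy features, and the edge list are all hypothetical, and the learned weight matrix and nonlinearity that a graph convolutional layer would normally apply are left out for brevity.

```python
# Illustrative only: one round of mean-aggregation message passing.
# This is NOT the architecture from the paper; names and data are hypothetical.

def message_passing_step(features, edges):
    """Replace each node's features with the mean over itself and its neighbours."""
    n = len(features)
    # Each node starts with a self-loop so it keeps its own information.
    neighbours = {i: {i} for i in range(n)}
    for u, v in edges:  # edges are treated as undirected
        neighbours[u].add(v)
        neighbours[v].add(u)

    dim = len(features[0])
    updated = []
    for i in range(n):
        group = neighbours[i]
        updated.append(
            [sum(features[j][d] for j in group) / len(group) for d in range(dim)]
        )
    return updated


# Hypothetical toy graph: nodes 0 and 1 stand for image regions,
# node 2 stands for a text token linked to region 1.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
updated = message_passing_step(features, edges)
```

After one step, each node's features mix in information from its direct neighbours only. Repeating the step spreads information further across the graph, which is one generic way to read the progression from ‘local’ to ‘global’ mentioned in the abstract.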
Published in: IEEE Transactions on Multimedia (Volume: 26)
Page(s): 6025 - 6036
Date of Publication: 19 December 2023
