Journals & Magazines >IEEE Transactions on Multimedia >Volume: 26

Text-to-Image Person Re-Identification Based on Multimodal Graph Convolutional Network

Download PDF
Download References
Request Permissions
Save to
Alerts

Abstract:

Text-to-image person re-identification (ReID) is a common subproblem in the field of person re-identification and image-text retrieval. Recent approaches generally follow...Show More

Metadata

Abstract:

Text-to-image person re-identification (ReID) is a common subproblem in the field of person re-identification and image-text retrieval. Recent approaches generally follow the structure of a dual-stream network, extracting image and text features. There is no deep interaction between images and text in this approach, making it difficult for the network to learn a highly semantic feature representation. In addition, for both image data and text data, the feature extraction process is modeled in a regular way, such as using Transformer to extract sequence embeddings. However, this type of modeling disregards the inherent relationships among multimodal input embeddings. A more flexible approach to mining multimodal data, which uniformly treats the data as graphs, is proposed. In this way, the extraction and interaction of multimodal information are accomplished by means of messages passing between graph nodes. First, a unified multimodal feature extraction and fusion network is proposed based on the graph convolutional network, which enables the progression of multimodal information from ‘local’ to ‘global’. Second, an asymmetric multilevel alignment module, which focuses on more accurate ‘local’ information from a ‘global’ perspective, is proposed to progressively divide the multimodal information at each level. Last, a cross-modal representation matching strategy based on similarity distribution and mutual information is proposed to achieve cross-modal alignment. The proposed algorithm in this paper is simple and efficient, and the testing results on three public datasets (CUHK-PEDES, ICFG-PEDES and RSTPReID) show that it can achieve SOTA-level performance.

Published in: IEEE Transactions on Multimedia ( Volume: 26)

Page(s): 6025 - 6036

Date of Publication: 19 December 2023

ISSN Information:

DOI: 10.1109/TMM.2023.3344354

Funding Agency:

Contents

References is not available for this document.

Text-to-Image Person Re-Identification Based on Multimodal Graph Convolutional Network

Abstract:

Metadata

Abstract:

ISSN Information:

Funding Agency:

References

IEEE Account

Purchase Details

Profile Information

Need Help?

Text-to-Image Person Re-Identification Based on Multimodal Graph Convolutional Network

Alerts

Abstract:

Metadata

Abstract:

ISSN Information:

Funding Agency:

Authors

Figures

References

Citations

Keywords

Metrics

Footnotes

References

IEEE Account

Purchase Details

Profile Information

Need Help?