Abstract:
Large-scale fine-grained image retrieval aims to learn compact, discriminative feature representations by mining the subtle distinctions between visually similar objects. However, existing fine-grained image retrieval methods focus on strengthening attention to the discriminative regions within single images and barely exploit the high-order relational information between global features and local region features across different images. As a result, the over-fitting problem caused by complex personalized differences cannot be effectively solved. In addition, existing unconstrained vector quantization methods tend to assign unquantized feature vectors to a few major codewords, so they cannot effectively distinguish the quantized features or reduce redundant information. To address these issues, we propose a novel optimal transport quantization method based on cross-X semantic hypergraph learning for large-scale fine-grained image retrieval. Specifically, we first introduce a cross-layer multi-scale aggregation module to extract the global features and local region features. Subsequently, we build a semantic hypergraph to model the high-order correlations between the global features and local region features extracted from different layers, different scales, and different images, which alleviates the over-fitting problem of complex personalized differences by suppressing sample-level and background noise. Moreover, we introduce an error regularization term into the progressive asymmetric quantization loss to reduce the quantization errors and preserve the semantic similarity. Finally, we introduce code balance and uncorrelated constraints into the multi-codebook quantization framework to improve the utilization efficiency of codewords and reduce redundant information; the resulting constrained assignment can be approximated by solving an optimal transport problem.
Experimental results on several fine-grained image datasets demonstrate that the proposed method outperf...
Published in: IEEE Transactions on Circuits and Systems for Video Technology ( Early Access )
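
The abstract's idea of balancing codeword usage through optimal transport can be illustrated with a small sketch. The abstract does not say which solver or formulation the authors use, so everything below is an assumption made for illustration: the Sinkhorn iteration as the solver, the function name `sinkhorn_balanced_assignment`, the parameters `epsilon` and `n_iters`, and the toy one-dimensional features and codewords. It only shows how a uniform column marginal stops nearest-codeword assignment from piling features onto a few codewords.

```python
import math


def sinkhorn_balanced_assignment(cost, epsilon=0.2, n_iters=500):
    """Entropy-regularized transport plan with uniform marginals.

    cost[i][j] is the cost of assigning feature i to codeword j.
    Hypothetical helper for illustration; not the paper's implementation.
    """
    n, k = len(cost), len(cost[0])
    # Gibbs kernel; subtracting each row's minimum only rescales the row,
    # which the row scaling vector u absorbs, and it avoids underflow.
    kernel = [[math.exp(-(c - min(row)) / epsilon) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * k
    row_target, col_target = 1.0 / n, 1.0 / k
    for _ in range(n_iters):
        # Rescale rows so that each feature carries mass 1/n.
        u = [row_target / sum(kernel[i][j] * v[j] for j in range(k))
             for i in range(n)]
        # Rescale columns so that each codeword receives mass 1/k.
        v = [col_target / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(k)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]


# Toy data (made up): six scalar features and two scalar codewords.
features = [0.0, 0.1, 0.2, 0.3, 0.9, 1.0]
codewords = [0.2, 1.0]
cost = [[(f - c) ** 2 for c in codewords] for f in features]

# Unconstrained quantization: every feature picks its nearest codeword.
nearest = [min(range(len(codewords)), key=lambda j: row[j]) for row in cost]
nearest_counts = [nearest.count(j) for j in range(len(codewords))]

# Balanced soft assignment: every codeword receives the same total mass.
plan = sinkhorn_balanced_assignment(cost)
column_mass = [sum(row[j] for row in plan) for j in range(len(codewords))]

print("nearest-codeword counts:", nearest_counts)
print("balanced column mass:", column_mass)
```

In this toy setting the nearest-codeword rule is uneven across the two codewords, while the transport plan spreads the same total mass over both. A smaller `epsilon` makes the plan closer to a hard assignment but slows the convergence of the iteration.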