Abstract:
Object re-identification (ReID) is prone to errors under scale variation, illumination changes, complex backgrounds, and occlusion. To overcome these challenges, attention mechanisms are employed to focus on the object's characteristics, thereby extracting more discriminative features. This paper introduces a local-global vision transformer (LoGoViT) for object re-identification that learns a hierarchical representation from fine-grained (local) to general (global) context features. It comprises two components: (i) shift and shuffle operations that generate robust local features and (ii) a local-global module that aggregates the multi-level hierarchical features of an object. Extensive experiments show that our method achieves state-of-the-art performance on ReID benchmarks. We further investigate effective augmentation operations and discuss how the patch modifications improve the proposed model's generalization under occlusion. The source code is available at https://github.com/nguyenphan99/LoGoViT.
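The abstract does not spell out the shift and shuffle operations, so the following is only an illustrative sketch of one common way such patch rearrangement is done on a token sequence: a cyclic shift followed by a group-wise interleave, after which the sequence is cut into local groups. The function names (`shift_tokens`, `shuffle_tokens`, `local_groups`) and the parameters `shift` and `groups` are hypothetical and are not taken from the LoGoViT code; patch tokens are represented here by plain integers standing in for embedding vectors.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shift_tokens(tokens: Sequence[T], shift: int) -> List[T]:
    """Cyclically move the first `shift` tokens to the end of the sequence."""
    n = len(tokens)
    if n == 0:
        return []
    s = shift % n
    return list(tokens[s:]) + list(tokens[:s])


def shuffle_tokens(tokens: Sequence[T], groups: int) -> List[T]:
    """Interleave tokens: view them as a (groups, n // groups) grid in
    row-major order, then read the grid out column by column."""
    n = len(tokens)
    if groups <= 0 or n % groups != 0:
        raise ValueError("len(tokens) must be divisible by a positive `groups`")
    per_group = n // groups
    return [tokens[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def local_groups(tokens: Sequence[T], shift: int, groups: int) -> List[List[T]]:
    """Shift, shuffle, then split the sequence into `groups` equal chunks.
    Each chunk mixes patches from distant image regions (an assumption about
    why such mixing helps under occlusion, not a claim from the paper)."""
    mixed = shuffle_tokens(shift_tokens(tokens, shift), groups)
    per_group = len(mixed) // groups
    return [mixed[g * per_group:(g + 1) * per_group] for g in range(groups)]


if __name__ == "__main__":
    patches = list(range(8))          # stand-ins for 8 patch embeddings
    print(local_groups(patches, shift=2, groups=2))
```

With eight patches, a shift of 2, and two groups, each resulting group contains patches drawn from both halves of the original sequence rather than one contiguous strip.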
Published in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023
Index Terms (IEEE Keywords):
- Vision Transformer
- Object Re-identification
- Local Features
- Attention Mechanism
- Scale Variation
- Convolutional Neural Network
- Step Size
- Spatial Information
- Visual Features
- Global Features
- Receptive Field
- Input Sequence
- Patch Size
- Convolutional Neural Network Architecture
- Image Patches
- Transformer Architecture
- Transformer Layers
- Fine-grained Features
- Position Embedding
- CNN-based Approaches
- Re-identification Methods