Abstract:
Unforeseen appearance variation is a challenging factor for visual tracking. This paper provides a novel solution based on semantic data augmentation, which facilitates offline training of trackers for better generalization. We use knowledge obtained from existing samples to augment other samples in terms of diversity and hardness. First, we propose that the similarity matching space in Siamese-like models has class-agnostic transferability. Based on this, we design Latent Augmentation (LaAug), which transfers relevant variations and suppresses irrelevant ones between training similarity embeddings of different classes, so that the model can generalize across a more diverse semantic distribution. Then, we propose Semantic Interaction Mix (SIMix), which lets the moments of different feature samples interact in order to contaminate structure and texture attributes while retaining other semantic attributes. SIMix simulates occlusion and complements the training distribution with hard cases. Empirically, the mixed features, combined with adversarial perturbations, make the model more robust to external environmental disturbances. Experiments on six challenging benchmarks demonstrate that three representative tracking models, i.e., SiamBAN, TransT, and OSTrack, are consistently improved by incorporating the proposed methods, without extra parameters or inference cost.
Published in: IEEE Transactions on Multimedia (Volume: 27)
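
The abstract says that SIMix lets moments of different feature samples interact, but it does not state which moments are used or along which axes. As a rough illustration only, the sketch below assumes per-channel mean and standard deviation computed over spatial positions, and assumes the interaction is a plain swap of those statistics from one feature map to another. The function name `exchange_moments` and the `(C, H, W)` layout are hypothetical and are not taken from the paper.

```python
import numpy as np


def exchange_moments(feat_a, feat_b, eps=1e-5):
    """Re-style feat_a with the per-channel moments of feat_b.

    Both inputs are assumed to be feature maps of shape (C, H, W).
    This is a generic moment-exchange sketch, not the paper's SIMix.
    """
    # First- and second-order moments over the spatial axes of each channel.
    mu_a = feat_a.mean(axis=(1, 2), keepdims=True)
    sd_a = feat_a.std(axis=(1, 2), keepdims=True) + eps
    mu_b = feat_b.mean(axis=(1, 2), keepdims=True)
    sd_b = feat_b.std(axis=(1, 2), keepdims=True) + eps

    # Remove feat_a's own statistics, then impose feat_b's statistics.
    normalized = (feat_a - mu_a) / sd_a
    return normalized * sd_b + mu_b


rng = np.random.default_rng(0)
target_feat = rng.normal(0.0, 1.0, size=(4, 8, 8))  # stand-in for one sample
other_feat = rng.normal(3.0, 2.0, size=(4, 8, 8))   # stand-in for another sample
mixed = exchange_moments(target_feat, other_feat)
```

Under these assumptions, `mixed` keeps the normalized spatial layout of `target_feat` while taking on the channel statistics of `other_feat`. How the actual method decides which attributes to contaminate and which to retain is described in the paper, not in this sketch.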