Abstract:
Self-supervised multi-object trackers have tremendous potential as they enable learning from raw domain-specific data. However, their re-identification accuracy still fal...Show MoreMetadata
Abstract:
Self-supervised multi-object trackers have tremendous potential as they enable learning from raw domain-specific data. However, their re-identification accuracy still falls short compared to their supervised counterparts. We hypothesize that this drawback results from formulating self-supervised objectives that are limited to single frames or frame pairs. Such formulations do not capture sufficient visual appearance variations to facilitate learning consistent re-identification features for autonomous driving when the frame rate is low, or object dynamics are high. In this work, we propose a training objective that enables self-supervised learning of re-identification features from multiple sequential frames by enforcing consistent association scores across short and long timescales. We perform extensive evaluations that demonstrate that re-identification features trained from longer sequences significantly reduce ID switches on standard autonomous driving datasets compared to existing self-supervised learning methods, which are limited to training on frame pairs. Using our proposed SubCo loss function, we set the new state of the art among self-supervised methods and even perform on par with fully supervised learning methods.
Published in: IEEE Robotics and Automation Letters ( Volume: 8, Issue: 11, November 2023)