
From Pixels to Semantics: Self-Supervised Video Object Segmentation With Multiperspective Feature Mining


Abstract:

Existing self-supervised methods pose one-shot video object segmentation (O-VOS) as pixel-level matching to enable segmentation mask propagation across frames. However, the two tasks are not fully equivalent, since O-VOS relies more on semantic correspondence than on exact pixel matching. To remedy this issue, we explore a new self-supervised framework that integrates pixel-level correspondence learning with semantic-level adaptation. Pixel-level correspondence learning is performed through photometric reconstruction of adjacent RGB frames during offline training, while semantic-level adaptation operates at test time by enforcing bi-directional agreement of the predicted segmentation masks. In addition, we propose a new network architecture with a multi-perspective feature-mining mechanism that not only enhances reliable features but also suppresses noisy ones, facilitating more robust image matching. By training the network with the proposed self-supervised framework, we achieve state-of-the-art performance on widely adopted datasets, further closing the gap between self-supervised methods and their fully supervised counterparts.
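The pixel-level correspondence objective described in the abstract follows the common self-supervised recipe of "repainting" one frame from an adjacent frame through feature-space affinity and penalizing the photometric error. The sketch below illustrates that general idea only; it is not the authors' implementation, and the function name, tensor shapes, and temperature value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def photometric_reconstruction_loss(feat_ref, feat_tgt, rgb_ref, rgb_tgt, temperature=0.07):
    """Reconstruct the target frame's colors from the reference frame via
    feature-space affinity, then penalize the photometric (L1) error.

    feat_ref, feat_tgt: (B, C, H, W) dense features of two adjacent frames.
    rgb_ref, rgb_tgt:   (B, 3, H, W) the corresponding RGB frames.
    """
    B, C, H, W = feat_ref.shape

    # Flatten spatial dims and L2-normalize so dot products become cosine similarities.
    f_ref = F.normalize(feat_ref.flatten(2), dim=1)   # (B, C, HW)
    f_tgt = F.normalize(feat_tgt.flatten(2), dim=1)   # (B, C, HW)

    # Affinity from every target pixel to every reference pixel.
    affinity = torch.bmm(f_tgt.transpose(1, 2), f_ref) / temperature  # (B, HW_tgt, HW_ref)
    affinity = affinity.softmax(dim=-1)

    # Copy reference colors through the affinity to reconstruct the target frame.
    colors_ref = rgb_ref.flatten(2).transpose(1, 2)    # (B, HW_ref, 3)
    rgb_recon = torch.bmm(affinity, colors_ref)        # (B, HW_tgt, 3)
    rgb_recon = rgb_recon.transpose(1, 2).view(B, 3, H, W)

    return F.l1_loss(rgb_recon, rgb_tgt)
```

At inference, the same affinity can propagate the first-frame segmentation mask instead of colors, which is the mask-propagation setting the abstract refers to; the paper's test-time semantic adaptation additionally enforces bi-directional agreement between masks predicted in the forward and backward directions.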
Published in: IEEE Transactions on Image Processing ( Volume: 31)
Page(s): 5801 - 5812
Date of Publication: 02 September 2022

PubMed ID: 36054396
