Video Inpainting Localization With Contrastive Learning


Abstract:

Video inpainting techniques typically serve to restore damaged or missing regions in digital videos. However, such techniques may also be used illegally to remove important objects and create forged videos. This letter proposes a simple yet effective forensic scheme for Video Inpainting LOcalization with ContrAstive Learning (ViLocal). A 3D Uniformer encoder is applied to the video noise residual to learn effective spatiotemporal features. To enhance discriminative power, supervised contrastive learning is adopted to capture local regional inconsistency by separating the pristine and inpainted pixels. The pixel-wise inpainting localization map is produced by a lightweight convolutional decoder trained in two stages. To prepare enough training samples, we build a video object segmentation dataset (VOS2k5) of 2500 videos with pixel-level annotations for every frame. Extensive experimental results validate the superiority of ViLocal over state-of-the-art methods.
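
The abstract does not spell out the loss itself, so the following is only a minimal sketch of the generic supervised contrastive (SupCon) loss applied to pixel embeddings, not the formulation used in ViLocal. The function name, the temperature value, and the toy 2-D embeddings are all assumptions made for illustration. Embeddings that share a label (pristine or inpainted) are pulled together and embeddings with different labels are pushed apart.

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic SupCon loss over N pixel embeddings (illustrative only).

    features: (N, D) array of pixel embeddings, N >= 2.
    labels:   (N,) array, e.g. 0 = pristine, 1 = inpainted.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)

    # L2-normalize so that dot products are cosine similarities.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    z = features / np.maximum(norms, 1e-12)

    # Pairwise similarities scaled by the temperature.
    logits = z @ z.T / temperature
    # Subtract the row-wise maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)

    n = labels.shape[0]
    not_self = ~np.eye(n, dtype=bool)

    # Log-softmax over all other samples (the anchor itself is excluded).
    exp_logits = np.exp(logits) * not_self
    log_prob = logits - np.log(exp_logits.sum(axis=1, keepdims=True))

    # Positives: other samples that carry the same label as the anchor.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0  # skip anchors that have no positive partner

    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[valid] / pos_count[valid]
    return float(-mean_log_prob_pos.mean())


# Toy example: four "pixel" embeddings forming two tight clusters.
embeddings = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
aligned_labels = np.array([0, 0, 1, 1])  # labels agree with the clusters
mixed_labels = np.array([0, 1, 0, 1])    # labels cut across the clusters

loss_aligned = supervised_contrastive_loss(embeddings, aligned_labels)
loss_mixed = supervised_contrastive_loss(embeddings, mixed_labels)
```

In this toy case the loss is close to zero when the labels agree with the clusters and large when they cut across them, which is the behavior that encourages an encoder to separate pristine from inpainted pixels in feature space.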
Published in: IEEE Signal Processing Letters (Volume: 32)
Page(s): 611 - 615
Date of Publication: 08 January 2025
