Local Correspondence Network for Weakly Supervised Temporal Sentence Grounding


Abstract:

Weakly supervised temporal sentence grounding scales better and is more practical than fully supervised methods in real-world applications. However, most existing methods neither model fine-grained local correspondences between video and text well nor have effective supervision for correspondence learning, and therefore perform unsatisfactorily. To address these issues, we propose an end-to-end Local Correspondence Network (LCNet) for weakly supervised temporal sentence grounding. The proposed LCNet offers several merits. First, we represent video and text features hierarchically to model fine-grained video-text correspondences. Second, we design a self-supervised cycle-consistent loss to guide video-text matching. To the best of our knowledge, this is the first work to fully explore fine-grained correspondences between video and text for temporal sentence grounding using self-supervised learning. Extensive experiments on two benchmark datasets demonstrate that the proposed LCNet significantly outperforms existing weakly supervised methods.
Published in: IEEE Transactions on Image Processing (Volume: 30)
Page(s): 3252 - 3262
Date of Publication: 17 February 2021

PubMed ID: 33596176
