Comparative overview of our approach and previous work. (a) Previous work decomposes VCMR democratically into the two subtasks of video retrieval and moment retrieval. (b) Our...
Abstract:
The video corpus moment retrieval (VCMR) task aims to retrieve a specific moment from a large corpus of untrimmed videos. Because of its computational complexity, the task has been addressed by decomposing it into video retrieval and moment retrieval subtasks, each with a specialized head (i.e., democratic decomposition). However, this approach overlooks the interdependency between the subtasks, which is crucial for temporally fine-grained query-video alignment. To address this suboptimality, we propose an integrated learning framework, namely moment-aware video retrieval, that explicitly connects the two subtasks so that each benefits from the other's learning process. Furthermore, we employ a curriculum-based negative sampling strategy that gradually provides harder negative samples, enhancing the model's ability to discriminate between semantically similar negative videos. We empirically show that the proposed approach outperforms state-of-the-art methods on three benchmarks: TVR, ActivityNet, and DiDeMo. Notably, on TVR, our method achieves 10.04% in VCMR R@1 with tIoU=0.7, a 1.68% absolute improvement over prior work. It also shows consistent gains on ActivityNet (4.98% in VCMR R@1 with tIoU=0.5) and DiDeMo, validating the effectiveness of our integrated approach.
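For readers unfamiliar with the reported metric, the following is a minimal sketch of how VCMR R@1 at a tIoU threshold is commonly computed. It is not taken from the paper's code. The names `temporal_iou`, `vcmr_recall_at_1`, and the `(video_id, start, end)` tuple layout are illustrative assumptions. The sketch assumes a top-1 prediction counts as correct only when it comes from the ground-truth video and its temporal intersection-over-union with the ground-truth moment reaches the threshold.

```python
from typing import Sequence, Tuple

# Hypothetical layout for illustration: (video_id, start_sec, end_sec).
Moment = Tuple[str, float, float]


def temporal_iou(pred_start: float, pred_end: float,
                 gt_start: float, gt_end: float) -> float:
    """Temporal intersection-over-union of two time intervals."""
    inter = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - inter
    return inter / union if union > 0 else 0.0


def vcmr_recall_at_1(top1_predictions: Sequence[Moment],
                     ground_truths: Sequence[Moment],
                     tiou_threshold: float = 0.7) -> float:
    """Percentage of queries whose top-1 moment is a hit.

    Assumption: a hit requires the correct video AND tIoU >= threshold.
    """
    if not ground_truths:
        return 0.0
    hits = 0
    for (p_vid, p_s, p_e), (g_vid, g_s, g_e) in zip(top1_predictions,
                                                    ground_truths):
        if p_vid == g_vid and temporal_iou(p_s, p_e, g_s, g_e) >= tiou_threshold:
            hits += 1
    return 100.0 * hits / len(ground_truths)
```

Under this definition, a prediction localized perfectly in the wrong video scores zero, which is why the metric depends on both subtasks at once.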
Published in: IEEE Access (Volume: 13)