Motion-Aware Video Paragraph Captioning via Exploring Object-Centered Internal Knowledge | IEEE Conference Publication | IEEE Xplore

Abstract:

Video paragraph captioning aims to generate a fine-grained, coherent, and relevant paragraph for a video. Unlike images, where objects are static, the states of objects in videos change over time. This dynamic information can contribute to understanding the whole video content. Existing works rarely focus on modeling the changing states of objects in videos, so the activities that occur in videos are poorly or wrongly depicted in the generated paragraphs. To address this problem, we propose a novel Object State Tracking Network, which captures the temporal state changes of objects. However, because consecutive frames in a video are similar, the video information is redundant and noisy. We therefore further propose a semantic alignment mechanism that enables sentence information to refine the visual information. Extensive experiments on ActivityNet Captions demonstrate the effectiveness of our method.
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023
Conference Location: Rhodes Island, Greece
