Dense Video Captioning using BiLSTM Encoder | IEEE Conference Publication | IEEE Xplore

Dense Video Captioning using BiLSTM Encoder


Abstract:

Video captioning, which integrates visual information with natural language, has been widely researched, but captioning long untrimmed videos remains challenging because such videos contain multiple events and the model has to describe each one. To address this issue, this paper discusses work on dense video captioning, a newly emerging research subject that entails identifying temporal events in a video and creating a caption for each event. The proposed architecture comprises an event proposal module, an EfficientNet B7 network for feature extraction from sampled frames, and a BiLSTM encoder with an LSTM decoder for captioning. The BiLSTM encoder uses both past and future context from the video when generating captions. The model is trained and tested on the MSVD dataset, which has around 2000 videos and their corresponding captions. The proposed framework shows increased accuracy in video captioning, with a BLEU score of 0.78 and a METEOR score of 0.34.
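
To illustrate how a bidirectional LSTM encoder can draw on both past and future frames, the following is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the toy dimensions, the random weights, and the random vectors standing in for per-frame EfficientNet B7 features are all assumptions made for illustration, and no training is performed.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm_params(input_dim, hidden_dim, seed):
    """Hypothetical helper: random weights for the four LSTM gates."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        gate: {
            "W": matrix(hidden_dim, input_dim),   # input-to-hidden weights
            "U": matrix(hidden_dim, hidden_dim),  # hidden-to-hidden weights
            "b": [0.0] * hidden_dim,              # bias
        }
        for gate in ("i", "f", "g", "o")
    }


def affine(p, x, h):
    """Compute W x + U h + b for one gate."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(p["W"], p["U"], p["b"])
    ]


def lstm_step(params, x, h, c):
    """One standard LSTM cell update; returns the new (hidden, cell) state."""
    i = [sigmoid(v) for v in affine(params["i"], x, h)]    # input gate
    f = [sigmoid(v) for v in affine(params["f"], x, h)]    # forget gate
    g = [math.tanh(v) for v in affine(params["g"], x, h)]  # candidate cell
    o = [sigmoid(v) for v in affine(params["o"], x, h)]    # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params, frames, hidden_dim):
    """Run the cell over the sequence and collect the hidden state per step."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    states = []
    for x in frames:
        h, c = lstm_step(params, x, h, c)
        states.append(h)
    return states


def bilstm_encode(frames, fwd_params, bwd_params, hidden_dim):
    """Concatenate forward (past) and backward (future) states per frame."""
    forward = run_lstm(fwd_params, frames, hidden_dim)
    # Process the reversed sequence, then re-align the states with the frames.
    backward = run_lstm(bwd_params, frames[::-1], hidden_dim)[::-1]
    return [fw + bw for fw, bw in zip(forward, backward)]


# Toy sizes (assumed); real frame features would be much larger.
INPUT_DIM, HIDDEN_DIM, NUM_FRAMES = 4, 3, 5
fwd = make_lstm_params(INPUT_DIM, HIDDEN_DIM, seed=0)
bwd = make_lstm_params(INPUT_DIM, HIDDEN_DIM, seed=1)

rng = random.Random(42)
frames = [[rng.uniform(-1, 1) for _ in range(INPUT_DIM)] for _ in range(NUM_FRAMES)]
encoded = bilstm_encode(frames, fwd, bwd, HIDDEN_DIM)

# Replace only the last frame and re-encode.
altered = frames[:-1] + [[1.0] * INPUT_DIM]
encoded_altered = bilstm_encode(altered, fwd, bwd, HIDDEN_DIM)
```

In this sketch, changing only the last frame leaves the forward half of the first frame's encoding untouched but alters its backward half, which is the sense in which a bidirectional encoder exposes future context at every time step. In a captioning model of the kind the abstract describes, an LSTM decoder would then consume these per-frame encodings to generate words.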
Date of Conference: 27-29 May 2022
Date Added to IEEE Xplore: 15 July 2022
Conference Location: Belgaum, India
