Video Scene Searching using Modified Bi-Modal Transformer | IEEE Conference Publication | IEEE Xplore

Video Scene Searching using Modified Bi-Modal Transformer


Abstract:

It is often difficult to navigate long videos, such as movies, to find a particular scene. If a user wants to start the video from a particular scene, searching for it manually is tedious. To find any scene in a video, we propose a novel algorithm that takes a textual description of the scene as input and returns the scene that best matches it. The algorithm builds on video captioning. A modified version of the Bi-modal Transformer pre-processes the video to identify potential scenes and generates a description for each of them. These descriptions are stored, together with their timestamps, in a list of dictionaries. Later, when the user enters a description of the scene to be found, that input is compared with all the model-generated descriptions to find the best match and thus locate the scene. We also introduce a new evaluation metric that measures how well the model identifies the scenes present in a video. Experiments show that the modified Bi-modal Transformer generates scene descriptions with lower inference time without compromising the accuracy of the predicted scenes.
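
The retrieval step described in the abstract (captions stored with timestamps in a list of dictionaries, then compared against the user's query) can be illustrated with a minimal sketch. The abstract does not say how descriptions are compared, so the Jaccard token overlap used here is only an assumed stand-in, and the dictionary keys (`start`, `end`, `description`), the function names, and the sample captions are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_scene(query, scenes):
    """Return the scene dict whose description best matches the query.

    Jaccard overlap is an assumed similarity measure, not the paper's.
    Returns None when there are no scenes to search.
    """
    if not scenes:
        return None
    q = tokenize(query)
    return max(scenes, key=lambda s: jaccard(q, tokenize(s["description"])))


# Hypothetical model-generated captions with timestamps in seconds.
scenes = [
    {"start": 0.0, "end": 42.5, "description": "a man is driving a car on a highway"},
    {"start": 42.5, "end": 97.0, "description": "two people are talking in a kitchen"},
    {"start": 97.0, "end": 130.0, "description": "a dog is running on the beach"},
]

best = find_scene("a dog runs along the beach", scenes)
print(best["start"], best["end"])
```

A player could then seek to `best["start"]`. A word-overlap score treats "runs" and "running" as different words, so a practical system would likely need a more robust text similarity; that choice is outside what the abstract states.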
Date of Conference: 01-03 July 2022
Date Added to IEEE Xplore: 29 August 2022
Conference Location: Mumbai, India
