Abstract:
Generic Event Boundary Detection (GEBD) [1] is a crucial task in video analysis that aims to identify class-agnostic event boundaries. Traditional supervised and unsupervised methods for GEBD rely on expensive data annotation and time-consuming training, and often generalize poorly across diverse data distributions. In this paper, we introduce SAM-GEBD, a novel, zero-cost approach to GEBD in videos that leverages the Segment Anything Model (SAM). While SAM has shown impressive zero-shot capabilities across many domains and tasks, we repurpose it to address the challenge of GEBD. The proposed method involves two stages: a zero-cost method for computing a temporal residual Self-Similarity Matrix (SSM), and an algorithm for identifying event boundaries by decoding the SSM. Our method achieves an F1@0.05 score of 0.724 on Kinetics-GEBD and 0.38 on TAPOS, surpassing the current state-of-the-art unsupervised techniques [2], [1]. Additionally, we assess SAM-GEBD’s individual components by integrating them with neural methods to demonstrate their versatility.
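To make the two-stage idea concrete, the following is a minimal, generic sketch of a self-similarity matrix followed by a simple boundary decoder. It is not the SAM-GEBD algorithm: the abstract does not specify how the temporal residual SSM is computed or how it is decoded, so this sketch assumes plain cosine similarity over per-frame feature vectors (toy vectors stand in for SAM-derived features) and a block-contrast score with peak picking. All function names, the window size `k`, and the `threshold` value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def self_similarity_matrix(features):
    """Stage 1 (generic stand-in): pairwise cosine similarity between all frames."""
    n = len(features)
    return [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]


def boundary_scores(ssm, k=2):
    """Stage 2a (assumed decoder): contrast between within-block and cross-block similarity.

    For each candidate index t, compare the k frames before t with the k frames
    starting at t. A high score means each block is internally similar while the
    two blocks differ from each other.
    """
    n = len(ssm)
    scores = [0.0] * n
    for t in range(k, n - k + 1):
        past = range(t - k, t)
        future = range(t, t + k)
        within = [ssm[i][j] for i in past for j in past]
        within += [ssm[i][j] for i in future for j in future]
        cross = [ssm[i][j] for i in past for j in future]
        scores[t] = sum(within) / len(within) - sum(cross) / len(cross)
    return scores


def detect_boundaries(scores, threshold=0.5):
    """Stage 2b (assumed decoder): keep local maxima at or above a threshold."""
    boundaries = []
    for t in range(1, len(scores) - 1):
        is_peak = scores[t] > scores[t - 1] and scores[t] >= scores[t + 1]
        if is_peak and scores[t] >= threshold:
            boundaries.append(t)
    return boundaries


# Toy input: four frames of one "event" followed by four frames of another.
toy_features = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
toy_ssm = self_similarity_matrix(toy_features)
toy_scores = boundary_scores(toy_ssm, k=2)
toy_boundaries = detect_boundaries(toy_scores, threshold=0.5)
print(toy_boundaries)
```

On the toy input the score peaks at the frame where the feature vectors switch, which is the behavior any SSM-based decoder relies on: events appear as high-similarity blocks along the diagonal, and boundaries sit at the corners between blocks.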
Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024