Scene-Dependent Prediction in Latent Space for Video Anomaly Detection and Anticipation | IEEE Journals & Magazine | IEEE Xplore

Abstract:

Video anomaly detection (VAD) plays a crucial role in intelligent surveillance. However, an essential type of anomaly, the scene-dependent anomaly, has been overlooked. Moreover, the task of video anomaly anticipation (VAA) also deserves attention. To fill these gaps, we build a comprehensive dataset named NWPU Campus, which is the largest semi-supervised VAD dataset and the first dataset for scene-dependent VAD and VAA. We also introduce a novel forward-backward framework for scene-dependent VAD and VAA, in which the forward network solves VAD on its own and solves VAA jointly with the backward network. In particular, we propose a scene-dependent generative model in latent space for the forward and backward networks. First, we propose a hierarchical variational auto-encoder to extract scene-generic features. Next, we design a score-based diffusion model in latent space that refines these features to be more compact for the task and, together with a scene information auto-encoder, generates scene-dependent features, modeling the relationships between video events and scenes. Finally, we develop a temporal loss from key frames to constrain the motion consistency of video clips. Extensive experiments demonstrate that our method handles both scene-dependent anomaly detection and anticipation well, achieving state-of-the-art performance on the ShanghaiTech, CUHK Avenue, and proposed NWPU Campus datasets.
Page(s): 224 - 239
Date of Publication: 16 September 2024

PubMed ID: 39283792

I. Introduction

Video anomaly detection (VAD) is a critical task in video surveillance. Because anomalies are unbounded and rare, VAD is typically framed as a semi-supervised task, where only normal events without specific labels are available in the training data [1], [2]. Semi-supervised VAD has been studied for years, with the long-standing goal of training a one-class classifier that faithfully learns the normal data distribution while avoiding undesired generalization to anomalies. To this end, reconstruction-based [3], [4], [5] and prediction-based [6], [7], [8] deep learning methods have sprung up in recent years and made great strides.
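
To make the prediction-based setting concrete, the following is a minimal sketch of a scoring scheme commonly used with such methods: a model trained only on normal clips predicts the next frame, and frames that are predicted poorly receive a high anomaly score. It is not the method proposed in this paper. The function names `psnr` and `anomaly_scores`, the flat-list frame representation, and the per-video min-max normalization are all assumptions made for illustration.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio between two frames given as flat lists.

    Hypothetical helper: pixel values are assumed to lie in [0, max_val].
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    mse = max(mse, 1e-10)  # clamp so a perfect prediction stays finite
    return 10.0 * math.log10(max_val ** 2 / mse)


def anomaly_scores(psnrs):
    """Min-max normalize per-frame PSNR values over one video.

    A low PSNR (poor prediction) maps to a score near 1, a high PSNR
    (good prediction) to a score near 0.
    """
    lo, hi = min(psnrs), max(psnrs)
    if hi == lo:
        return [0.0 for _ in psnrs]  # no variation, nothing stands out
    return [1.0 - (v - lo) / (hi - lo) for v in psnrs]


if __name__ == "__main__":
    # Toy data: three 4-pixel "frames" and their assumed predictions.
    targets = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5], [0.9, 0.1, 0.9, 0.1]]
    preds = [[0.5, 0.5, 0.5, 0.4], [0.5, 0.5, 0.4, 0.4], [0.5, 0.5, 0.5, 0.5]]
    per_frame = [psnr(p, t) for p, t in zip(preds, targets)]
    print(anomaly_scores(per_frame))
```

In this toy run the third frame, whose prediction is furthest from the observed frame, receives the highest score. In practice the predictor would be a learned network, and the threshold applied to the scores is a design choice outside the scope of this sketch.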
