I. Introduction
Video anomaly detection (VAD) is a critical task in video surveillance. Because anomalies are unbounded and rare, VAD is typically formulated as a semi-supervised task in which only normal events, without specific labels, are available in the training data [1], [2]. Semi-supervised VAD has been studied for years, with the long-standing goal of training a one-class classifier that faithfully learns the distribution of normal data while avoiding undesired generalization to anomalies. To this end, reconstruction-based [3], [4], [5] and prediction-based [6], [7], [8] deep learning methods have emerged in recent years and made great strides.