Data Efficient Video Transformer for Violence Detection | IEEE Conference Publication | IEEE Xplore


Abstract:

In smart cities, detecting violent events is critical to public safety. Most prior studies on this topic use a 2D convolutional neural network (2D-CNN) to extract spatial features from each frame, followed by a recurrent neural network (RNN) variant to learn temporal features. Transformer networks, meanwhile, have achieved strong results in many areas, but their bottleneck is the need for large datasets to achieve good results. In this work, we propose a data-efficient video transformer (DeVTr) that uses a transformer network as the spatio-temporal learning method, with a pre-trained 2D-CNN as the embedding layer for the input data. The model was trained and tested on the Real-life violence dataset (RLVS) and achieved an accuracy of 96.25%. A comparison with previous techniques showed that the proposed method gives the best result among the compared studies for violence event detection.
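
The pipeline the abstract describes (embed each frame, then let attention mix information across frames) can be illustrated with a toy sketch. This is not the DeVTr implementation: the fixed random linear map standing in for the pre-trained 2D-CNN, the single attention head with identity projections, the mean pooling, the sigmoid head, and every name below (`embed_frame`, `self_attention`, `classify_clip`) are assumptions made purely for illustration, using only the Python standard library.

```python
import math
import random


def embed_frame(frame, weights):
    """Hypothetical stand-in for the pre-trained 2D-CNN embedding layer:
    a fixed linear map applied to the flattened pixels of one frame."""
    flat = [p for row in frame for p in row]
    return [sum(w * p for w, p in zip(w_row, flat)) for w_row in weights]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention over the frame tokens.
    Assumption: identity query/key/value projections, to keep the sketch short."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        attn = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(attn, tokens)) for d in range(dim)])
    return out


def classify_clip(frames, dim=4, seed=0):
    """Return a score in [0, 1] for a clip given as a list of 2D pixel grids.
    The weights are random and untrained, so the score itself is meaningless;
    only the data flow is of interest."""
    rng = random.Random(seed)
    n_pixels = len(frames[0]) * len(frames[0][0])
    embed_w = [[rng.uniform(-1, 1) for _ in range(n_pixels)] for _ in range(dim)]
    head_w = [rng.uniform(-1, 1) for _ in range(dim)]

    tokens = [embed_frame(f, embed_w) for f in frames]  # one token per frame
    mixed = self_attention(tokens)                      # mix information across frames
    pooled = [sum(t[d] for t in mixed) / len(mixed) for d in range(dim)]
    logit = sum(w * x for w, x in zip(head_w, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


# Usage: a toy clip of 5 frames, each a 3x3 grid of pixel intensities.
pixel_rng = random.Random(1)
clip = [[[pixel_rng.random() for _ in range(3)] for _ in range(3)] for _ in range(5)]
score = classify_clip(clip)
print(score)
```

In a real system the frame embedding would come from a pre-trained CNN and the attention weights would be learned. The sketch only shows why a pre-trained embedding reduces what the transformer must learn from scratch: the attention stage operates on compact per-frame vectors rather than raw pixels.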
Date of Conference: 17-18 July 2021
Date Added to IEEE Xplore: 10 September 2021
Conference Location: Purwokerto, Indonesia
