Data Efficient Video Transformer for Violence Detection

Abstract:

In smart cities, detecting violent events is critical to public safety. Most prior work on this topic uses a 2D convolutional neural network (2D-CNN) to extract spatial features from each frame, followed by a recurrent neural network (RNN) variant to learn temporal features. Transformer networks, meanwhile, have achieved strong results in many areas, but they typically need large datasets to perform well. In this work, we propose a data-efficient video transformer (DeVTr) that uses a transformer network for spatio-temporal learning, with a pre-trained 2D-CNN as the embedding layer for the input data. The model was trained and tested on the Real-life violence dataset (RLVS) and achieved an accuracy of 96.25%. A comparison with previous techniques shows that the proposed method gives the best result among the compared studies for violence event detection.
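
The pipeline described in the abstract (a 2D-CNN that embeds each frame, followed by a transformer over the resulting token sequence) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class names `FrameEmbedder` and `VideoTransformerSketch`, the small untrained CNN standing in for the pre-trained backbone, the class token, the learned positional embedding, and all layer sizes are assumptions chosen only to keep the example short and self-contained. The sketch also pools each frame to a single token, so its transformer mixes information across time only, which is a simplification of the spatio-temporal learning the abstract describes.

```python
import torch
import torch.nn as nn


class FrameEmbedder(nn.Module):
    """Hypothetical stand-in for the pre-trained 2D-CNN embedding layer.

    Assumption: a tiny untrained CNN is used here so the example runs
    without downloading weights; the paper uses a pre-trained network.
    """

    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool each frame to one 32-d vector
        )
        self.proj = nn.Linear(32, embed_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (N, 3, H, W) -> (N, embed_dim)
        x = self.features(frames).flatten(1)
        return self.proj(x)


class VideoTransformerSketch(nn.Module):
    """CNN frame embeddings followed by a transformer encoder (illustrative)."""

    def __init__(self, num_frames: int = 8, embed_dim: int = 64,
                 num_heads: int = 4, num_layers: int = 2):
        super().__init__()
        self.embedder = FrameEmbedder(embed_dim)
        # Assumption: a learned class token and positional embedding.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_frames + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=128, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, 1)  # one logit: violent vs. not

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        # Embed every frame independently, then restore the time axis.
        tokens = self.embedder(video.flatten(0, 1)).reshape(b, t, -1)
        cls = self.cls_token.expand(b, -1, -1)
        x = torch.cat([cls, tokens], dim=1) + self.pos_embed[:, : t + 1]
        x = self.encoder(x)
        # Classify from the class token's final representation.
        return self.head(x[:, 0]).squeeze(-1)  # (B,) logits


# Usage: two random 8-frame clips at 64x64 resolution.
model = VideoTransformerSketch().eval()
clip = torch.randn(2, 8, 3, 64, 64)
with torch.no_grad():
    logits = model(clip)
probs = torch.sigmoid(logits)  # per-clip probability of the positive class
```

In a data-efficient setting, one plausible choice is to keep the pre-trained CNN weights frozen, or fine-tune them with a small learning rate, so that most of the training signal goes to the transformer layers; whether DeVTr does this is not stated in the abstract.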

Published in: 2021 IEEE International Conference on Communication, Networks and Satellite (COMNETSAT)

Date of Conference: 17-18 July 2021

Date Added to IEEE Xplore: 10 September 2021

DOI: 10.1109/COMNETSAT53002.2021.9530829

Conference Location: Purwokerto, Indonesia