
Hybrid Vision Transformer and Convolutional Neural Network for Sports Video Classification



Abstract:

Engaging in sports is essential for maintaining our mental and physical well-being. Sports video libraries are expanding quickly, and as a result, automated classification is becoming necessary for many purposes, such as content-based recommendations, contextual advertising, and easy access and retrieval. Using Vision Transformers (ViTs) contributes to research on their effectiveness in video classification. ViTs are known for their ability to capture long-range spatial dependencies in data, and they efficiently extract high-level features, reducing the need for manual feature engineering. We can use a ViT and a CNN in parallel to create a hybrid model that adapts to the specific requirements of sports video classification: the ViT captures global context, while the CNN excels at detailed local features, improving feature extraction. Combining predictions from both models can enhance classification accuracy, and the diverse viewpoints and learning strategies of the two architectures improve classification in complex tasks. Running a ViT and a CNN in parallel is a form of ensemble learning, and ensemble methods are known to produce more reliable and accurate predictions by combining multiple models, making them a suitable choice for video classification. Using a ViT and VGG16 in parallel allows us to explore the applicability of state-of-the-art vision transformers in video classification tasks and contributes to ongoing research on their effectiveness and their integration with other architectures. We leverage the combined power of the Vision Transformer and the CNN to make the final classification decision.
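
The following is a minimal sketch of the parallel ViT and CNN hybrid described above, assuming PyTorch with torchvision's vgg16 and vit_b_16 backbones. The late-fusion rule (averaging the two branches' softmax outputs), the frame-level averaging over a video, and the class count are illustrative assumptions, not the paper's exact configuration.

# Sketch of a parallel ViT + VGG16 hybrid for sports video classification.
# Assumptions (not from the paper): PyTorch/torchvision backbones, frame-level
# inference with per-video averaging, and late fusion of softmax outputs.
import torch
import torch.nn as nn
from torchvision.models import vgg16, vit_b_16


class HybridViTVGG(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # CNN branch: VGG16 backbone with its final classifier layer replaced.
        self.cnn = vgg16(weights=None)
        self.cnn.classifier[6] = nn.Linear(4096, num_classes)
        # Transformer branch: ViT-B/16 with its classification head replaced.
        self.vit = vit_b_16(weights=None)
        self.vit.heads.head = nn.Linear(self.vit.heads.head.in_features, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 3, 224, 224) sampled video frames.
        cnn_logits = self.cnn(frames)
        vit_logits = self.vit(frames)
        # Late fusion: average the two branches' class probabilities.
        return 0.5 * (cnn_logits.softmax(dim=-1) + vit_logits.softmax(dim=-1))


if __name__ == "__main__":
    model = HybridViTVGG(num_classes=5)      # e.g., 5 sports categories (assumed)
    frames = torch.randn(4, 3, 224, 224)     # 4 frames sampled from one video
    frame_probs = model(frames)
    video_pred = frame_probs.mean(dim=0).argmax().item()  # average over frames
    print("Predicted class:", video_pred)

In this sketch each frame is scored independently and the per-frame probabilities are averaged to produce a single video-level decision; in practice the two branches could instead be trained jointly or fused with learned weights.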
Date of Conference: 23-25 November 2024
Date Added to IEEE Xplore: 16 January 2025
Conference Location: Guntur, India

