0 seconds of 0 secondsVolume 90%
Press shift question mark to access a list of keyboard shortcuts
Keyboard Shortcuts
Play/PauseSPACE
Increase Volume↑
Decrease Volume↓
Seek Forward→
Seek Backward←
Captions On/Offc
Fullscreen/Exit Fullscreenf
Mute/Unmutem
Seek %0-9
Live
00:00
00:00
00:00
The Soccer-CLIP framework integrates video and textual data by segmenting video sequences into patch representations for robust feature extraction, enhanced through tempo...
Abstract:
In the rapidly advancing field of computer vision, the application of multimodal models—specifically, vision-language frameworks—has shown substantial promise for complex...Show MoreMetadata
Abstract:
In the rapidly advancing field of computer vision, the application of multimodal models—specifically, vision-language frameworks—has shown substantial promise for complex tasks such as video-based action spotting. This paper introduces Soccer-CLIP, a vision-language model specially designed for soccer action spotting. Soccer-CLIP incorporates an innovative domain-specific prompt engineering strategy, leveraging large language models (LLMs) to refine textual representations for precise alignment with soccer-specific actions. Our model integrates both visual and textual features to enhance recognition accuracy of critical soccer events. With the temporal augmentation techniques devised for input videos, Soccer-CLIP builds upon existing methodologies to address the inherent challenges of temporally sparse event annotations within video sequences. Evaluations on the SoccerNet Action Spotting benchmark demonstrate that Soccer-CLIP outperforms previous state-of-the-art models, exploring the effectiveness of our model’s capacity to capture domain-specific contextual nuances. This work represents a significant advancement in automated sports analysis, providing a robust and adaptable framework for broader applications in video recognition and temporal action localization tasks.
0 seconds of 0 secondsVolume 90%
Press shift question mark to access a list of keyboard shortcuts
Keyboard Shortcuts
Play/PauseSPACE
Increase Volume↑
Decrease Volume↓
Seek Forward→
Seek Backward←
Captions On/Offc
Fullscreen/Exit Fullscreenf
Mute/Unmutem
Seek %0-9
Live
00:00
00:00
00:00
The Soccer-CLIP framework integrates video and textual data by segmenting video sequences into patch representations for robust feature extraction, enhanced through tempo...
Published in: IEEE Access ( Volume: 13)