Soccer-CLIP: Vision Language Model for Soccer Action Spotting

The Soccer-CLIP framework integrates video and textual data by segmenting video sequences into patch representations for robust feature extraction, enhanced through tempo...

Abstract:

In the rapidly advancing field of computer vision, multimodal models, specifically vision-language frameworks, have shown substantial promise for complex tasks such as video-based action spotting. This paper introduces Soccer-CLIP, a vision-language model designed specifically for soccer action spotting. Soccer-CLIP incorporates an innovative domain-specific prompt engineering strategy that leverages large language models (LLMs) to refine textual representations for precise alignment with soccer-specific actions. Our model integrates visual and textual features to improve recognition accuracy for critical soccer events. Using temporal augmentation techniques devised for input videos, Soccer-CLIP builds upon existing methodologies to address the inherent challenges of temporally sparse event annotations within video sequences. Evaluations on the SoccerNet Action Spotting benchmark demonstrate that Soccer-CLIP outperforms previous state-of-the-art models, which supports the model's capacity to capture domain-specific contextual nuances. This work represents a significant advancement in automated sports analysis, providing a robust and adaptable framework for broader applications in video recognition and temporal action localization.

Published in: IEEE Access (Volume: 13)

Page(s): 44354 - 44365

Date of Publication: 06 March 2025

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2025.3549293

Yoonho Shin

LG UPlus, Seoul, South Korea

Yoonho Shin received the B.S. and M.S. degrees in electrical engineering from Korea University, Seoul, South Korea, in 2019. He is currently pursuing the Ph.D. degree with the LG AI Graduate School. Since 2019, he has been a member of the Vision AI Team, LG UPlus. His research interests include sports action recognition and video generation models.

Sanghoon Park

LG UPlus, Seoul, South Korea

Sanghoon Park received the Ph.D. degree from the Department of Information and Communication, Gwangju Institute of Science and Technology (GIST), in 2008. From 2008 to 2017, he was a Senior Researcher with Samsung Thales Company Ltd. Since 2017, he has been with LG Uplus as a Team Leader of vision AI technology. His research interests include computer vision, VLM, LMM, and image/video generation and its applications.

Youngsub Han

LG UPlus, Seoul, South Korea

Youngsub Han received the Ph.D. degree in information technology from Towson University, MD, USA, in 2017. Since 2017, he has been the Head of AI technologies with LG Uplus. His research interests include speech, computer vision, and natural language processing, including LLMs and their applications.

Byoung-Ki Jeon

LG UPlus, Seoul, South Korea

Byoung-Ki Jeon received the B.S. degree in electronic engineering from Kyungpook National University, South Korea, in 1997, and the M.S. and Ph.D. degrees in electrical and computer engineering from POSTECH, Pohang, South Korea, in 1999 and 2005, respectively. Since 2020, he has been the Head of data and AI technologies with LG Uplus. His research interests include data engineering, data science, computer vision, and natu...

Soonyoung Lee

LG UPlus, Seoul, South Korea

LG AI Research, Seoul, South Korea

Soonyoung Lee received the B.S. degree in electrical engineering and the M.S. and Ph.D. degrees in electrical engineering and computer science from Seoul National University, in 2005, 2007, and 2012, respectively. From 2012 to 2020, he was with Samsung Electronics Company Ltd. Since 2021, he has been with LG AI Research as a Research Fellow. His research interests include computer vision and multimodal applications.

Byung Jun Kang

LG UPlus, Seoul, South Korea

LG AI Research, Seoul, South Korea

Byung Jun Kang received the B.S. degree from the Department of Software, Sangmyung University, South Korea, in 2004, and the M.S. and Ph.D. degrees from the Department of Computer Science, Sangmyung University, in 2006 and 2009, respectively. Since 2020, he has been a Research Fellow with LG AI Research. His research interests include computer vision, biometrics, and anomaly detection and its applications.