MLLM-TA: Leveraging Multimodal Large Language Models for Precise Temporal Video Grounding

IEEE Journals & Magazine | IEEE Xplore