Visual Prompt Tuning for Weakly Supervised Phrase Grounding | IEEE Conference Publication | IEEE Xplore

Visual Prompt Tuning for Weakly Supervised Phrase Grounding


Abstract:

Previous work on weakly supervised phrase grounding (WSG) relies heavily on object detectors to provide RoIs for localization. However, such methods cannot be applied effectively to real-world scenarios, largely because the detectors are trained on a limited set of categories. In this paper, we propose a refinement-based approach to WSG that fine-tunes a detector-free phrase grounding model with a visual prompt. This visual prompt is extracted from the text-related representations in CLIP. Furthermore, we combine the visual prompt with learnable features and then fine-tune the grounding network. Experimental results show that our method significantly outperforms state-of-the-art methods on the WSG task, demonstrating its effectiveness.
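
The abstract does not specify how the visual prompt is computed or how it is combined with the learnable features. As a rough illustration only, the sketch below assumes the prompt is a per-patch cosine similarity between image patch embeddings and a phrase embedding (both taken as already computed by a CLIP-style encoder), scaled to [0, 1], and that the combination is an element-wise sum. The function names `visual_prompt` and `combine`, the toy vectors, and the additive combination are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def visual_prompt(patch_feats, text_feat):
    """Assumed prompt: per-patch similarity to the phrase, min-max scaled to [0, 1]."""
    sims = [cosine(p, text_feat) for p in patch_feats]
    lo, hi = min(sims), max(sims)
    if hi == lo:
        return [0.0] * len(sims)
    return [(s - lo) / (hi - lo) for s in sims]


def combine(prompt, learnable):
    """Assumed combination: element-wise sum of the fixed prompt and learnable offsets."""
    return [p + l for p, l in zip(prompt, learnable)]


# Toy 2-D embeddings standing in for CLIP patch and phrase features (hypothetical values).
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [1.0, 0.0]

prompt = visual_prompt(patches, text)
learnable = [0.0, 0.0, 0.0]  # would be trained during fine-tuning; zero-initialized here
combined = combine(prompt, learnable)
```

With zero-initialized learnable features, the combined signal starts out equal to the fixed prompt, so in this sketch any change during fine-tuning comes from the learnable part alone.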
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024
Conference Location: Seoul, Republic of Korea
