Exploring Effective Factors for Improving Visual In-Context Learning

Abstract:

In-context learning (ICL) is the ability to understand a new task from a few demonstrations (i.e., a prompt) and to make predictions on new inputs without tuning the model. While it has been widely studied in NLP, it is still a relatively new area of research in computer vision. To reveal the factors that influence the performance of visual in-context learning, this paper shows that prompt selection and prompt fusion are two major factors with a direct impact on inference performance. Prompt selection is the process of selecting the most suitable prompt for a query image. This is crucial because high-quality prompts help large-scale vision models comprehend new tasks rapidly and accurately. Prompt fusion combines the prompt and the query image to activate knowledge within large-scale vision models, and altering the prompt fusion method significantly affects performance on new tasks. Based on these findings, we propose a simple framework, prompt-SelF, to improve visual in-context learning. Specifically, we first use a pixel-level retrieval method to select a suitable prompt, then use different prompt fusion methods to activate diverse knowledge stored in the large-scale vision model, and finally ensemble the predictions obtained from the different prompt fusion methods to produce the final prediction. We conduct extensive experiments on single-object segmentation and detection tasks to demonstrate the effectiveness of prompt-SelF. Remarkably, prompt-SelF outperforms OSLSM-based meta-learning methods in 1-shot segmentation for the first time, which indicates the great potential of visual in-context learning. The source code and models will be available at https://github.com/syp2ysy/prompt-SelF.
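The three stages named in the abstract (select, fuse, ensemble) can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the function names, the use of mean per-location cosine similarity as the "pixel-level" score, the 2x2 canvas with one blank cell as the fusion format, grayscale 2-D images, and a `model` callable that returns a filled-in canvas of the same shape.

```python
import numpy as np

def select_prompt(query_feat, cand_feats):
    """Return the index of the candidate whose feature map best matches the query.

    query_feat: (H, W, C); cand_feats: (N, H, W, C).
    Assumption: similarity is the cosine similarity at each spatial location,
    averaged over all locations.
    """
    eps = 1e-8
    q = query_feat / (np.linalg.norm(query_feat, axis=-1, keepdims=True) + eps)
    c = cand_feats / (np.linalg.norm(cand_feats, axis=-1, keepdims=True) + eps)
    sims = (c * q[None]).sum(axis=-1).mean(axis=(1, 2))  # shape (N,)
    return int(np.argmax(sims))

def fuse(prompt_img, prompt_lbl, query, layout):
    """Place prompt image, prompt label, query, and a blank cell on a 2x2 canvas.

    layout names the cell contents in the order top-left, top-right,
    bottom-left, bottom-right. Returns the canvas and the (row, col) of the
    blank cell, which the model is expected to fill in.
    """
    h, w = query.shape
    cells = {"prompt_img": prompt_img, "prompt_lbl": prompt_lbl,
             "query": query, "blank": np.zeros((h, w))}
    canvas = np.zeros((2 * h, 2 * w))
    blank_pos = None
    for i, name in enumerate(layout):
        r, col = divmod(i, 2)
        canvas[r * h:(r + 1) * h, col * w:(col + 1) * w] = cells[name]
        if name == "blank":
            blank_pos = (r, col)
    return canvas, blank_pos

def predict_with_ensemble(model, prompt_img, prompt_lbl, query, layouts):
    """Run the model once per layout and average the predicted blank cells."""
    h, w = query.shape
    preds = []
    for layout in layouts:
        canvas, (r, col) = fuse(prompt_img, prompt_lbl, query, layout)
        out = model(canvas)  # hypothetical model: canvas in, filled canvas out
        preds.append(out[r * h:(r + 1) * h, col * w:(col + 1) * w])
    return np.mean(preds, axis=0)
```

Averaging is only one possible way to ensemble the per-layout predictions; the abstract does not specify the rule, so a vote or weighted combination would fit the same description.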

Published in: IEEE Transactions on Image Processing (Volume: 34)

Page(s): 2147 - 2160

Date of Publication: 31 March 2025

PubMed ID: 40168207

DOI: 10.1109/TIP.2025.3554410

I. Introduction

Benefiting from large models and large-scale datasets in NLP, researchers have realized that large models [1], [2], [3], [4], [5] have a crucial emergent ability: in-context learning. The purpose of in-context learning is to help the model comprehend new tasks and make predictions for new examples based on a provided prompt. Typically, the prompt is a concise, structured input that provides context for the task, such as a task description or an example input-label pair. Although in-context learning is well established in NLP [6], [7], [8], it has only just started to be explored in vision. Visual in-context learning is becoming increasingly important in computer vision, particularly with the rise of large-scale models. Although these models achieve impressive results on many tasks [9], [10], they often require huge amounts of data and computation to train, which makes them impractical for many real-world applications. Visual in-context learning therefore matters for developing more efficient and accurate computer vision systems that can operate in real-world settings. However, research in this area is still relatively limited, so we concentrate on visual in-context learning and carry out preliminary studies.

A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding by generative pre-training,” OpenAI, San Francisco, CA, USA, Tech. Rep., 2018.
