
How to Use Language Expert to Assist Inference for Visual Commonsense Reasoning



Abstract:

The Visual Commonsense Reasoning (VCR) task requires a Vision and Language Model (VLM) to capture cognition-level clues from visual-linguistic input and to give the right answers to questions together with their rationales. Although Pretrained Language Models (PLMs) have recently been used as powerful in-domain knowledge bases for various tasks such as image segmentation and visual question answering, their ability to generalize to unseen multi-modal data in an out-of-domain setting remains unexplored. In this paper, we explore how to use a PLM to assist a VLM on the challenging VCR task and propose a framework called Vision and Language Assisted with Expert Language Model (VLAELM). VLAELM employs a PLM with expert-level commonsense knowledge to assist reasoning, which is difficult for a VLM to learn from scarce multi-modal data alone. Experiments show that VLAELM achieves significant improvements over strong baselines. Moreover, we validate the credibility of the language expert as a knowledge base and assess the trade-off between generalization and specialization in PLMs.
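The abstract does not specify how the language expert's knowledge is injected into the VLM, so the following is only a minimal sketch of one plausible scheme under assumed details: late fusion of a VLM's per-answer scores with a text-only PLM's commonsense-plausibility scores over VCR's multiple-choice answers. Every name here (`vlm_scores`, `plm_scores`, the fusion weight `alpha`) is a hypothetical placeholder, not the paper's actual VLAELM implementation.

```python
# Hedged sketch: PLM-assisted answer selection for a VCR-style
# multiple-choice question. NOT the paper's VLAELM method; the scoring
# functions below are stand-ins for real model forward passes.

import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def vlm_scores(question: str, answers: List[str]) -> List[float]:
    # Placeholder: a real system would score each candidate answer with a
    # vision-language model conditioned on the image and the question.
    return [0.2, 1.5, 0.1, 0.4]


def plm_scores(question: str, answers: List[str]) -> List[float]:
    # Placeholder: a real system would use a pretrained language model's
    # likelihood of each answer as a text-only commonsense plausibility score.
    return [0.5, 1.1, 0.0, 0.9]


def fused_prediction(question: str, answers: List[str], alpha: float = 0.7) -> int:
    """Late fusion: weighted sum of VLM and PLM answer probabilities."""
    p_vlm = softmax(vlm_scores(question, answers))
    p_plm = softmax(plm_scores(question, answers))
    fused = [alpha * v + (1 - alpha) * p for v, p in zip(p_vlm, p_plm)]
    return max(range(len(answers)), key=fused.__getitem__)


if __name__ == "__main__":
    q = "Why is [person1] holding an umbrella?"
    a = [
        "It is raining.",
        "They are about to go outside into the rain.",
        "They are cooking dinner.",
        "They expect rain soon.",
    ]
    print("Predicted answer index:", fused_prediction(q, a))
```

With `alpha` near 1 the visual model dominates; lowering it lets the text-only expert override visually ambiguous cases. In a real system the fusion weight could equally be learned rather than fixed; this fixed-weight version is just the simplest illustration of the assist-at-inference idea.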
Date of Conference: 01-04 December 2023
Date Added to IEEE Xplore: 06 February 2024
Conference Location: Shanghai, China
