Exploring Triple Knowledge Cues for Zero-Shot Human-Object Interaction Detection


Abstract:

Current zero-shot human-object interaction (HOI) detection methods often follow a two-phase pipeline: a pre-trained detector first detects instances, and CLIP then performs interaction prediction. In the second phase, they either obtain pairwise representations by directly performing RoI-Align on CLIP features or design additional queries and decoders to fuse CLIP features. However, CLIP visual features often lack fine-grained information, which hinders capturing complex human-object interactions, and extra decoders increase computation costs. We therefore propose a triple knowledge cues exploration model that improves CLIP representations with various forms of knowledge guidance, without extra decoders. First, we incorporate position-distribution and semantic priors: we delineate a layout from the predicted boxes and inject semantics using the CLIP text embeddings. Next, we explore object priors by leveraging predefined class names and the text encoder to obtain saliency maps for humans and objects. Then, we design three types of holistic tokens to capture diverse attribute cues for the human, the object, and the interaction, respectively. These cues are finally integrated into a vanilla two-stage CLIP-based baseline. Experimental results on HICO-DET demonstrate the effectiveness of the proposed model.
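The object-prior step described above scores each visual location against a class-name text embedding to obtain a saliency map. A minimal sketch of that idea, using cosine similarity between patch features and a text embedding (random tensors stand in for actual CLIP features; all shapes and names here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def saliency_map(patch_feats, text_emb):
    """Cosine similarity between each visual patch feature and a
    class-name text embedding, reshaped into a 2-D saliency map."""
    h, w, d = patch_feats.shape
    p = patch_feats.reshape(-1, d)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)   # L2-normalize patches
    t = text_emb / np.linalg.norm(text_emb)            # L2-normalize text embedding
    sal = p @ t                                        # (h*w,) cosine scores
    return sal.reshape(h, w)

rng = np.random.default_rng(0)
feats = rng.standard_normal((7, 7, 512))   # mock CLIP patch-feature grid
emb = rng.standard_normal(512)             # mock text embedding, e.g. for "person"
m = saliency_map(feats, emb)
print(m.shape)  # (7, 7)
```

In practice one map would be computed per class name (human and each object category) and used to reweight the visual features; the sketch shows only the similarity-scoring step.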
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025
Conference Location: Hyderabad, India

