
MV-CLIP: Multi-View CLIP for Zero-shot 3D Shape Recognition


Abstract:

Large-scale pre-trained models have demonstrated impressive performance on vision and language tasks in open-world scenarios. Because comparable pre-trained models do not exist for 3D shapes, recent methods use language-image pre-training to realize zero-shot 3D shape recognition. However, owing to the modality gap, pre-trained language-image models produce low-confidence predictions when generalized to 3D shape recognition. This paper therefore aims to improve prediction confidence through view selection and hierarchical prompts. Building on the well-established CLIP model, we introduce view selection on the vision side, which minimizes entropy to identify the most informative views of a 3D shape. On the textual side, we propose hierarchical prompts that combine hand-crafted and GPT-generated prompts to refine predictions. The first layer proposes several classification candidates using traditional class-level descriptions, while the second layer refines the prediction using function-level descriptions or further distinctions between the candidates. Extensive experiments demonstrate the effectiveness of the proposed modules for zero-shot 3D shape recognition. Without any additional training, the proposed method achieves zero-shot 3D classification accuracies of 84.44%, 91.51%, and 66.17% on ModelNet40, ModelNet10, and ShapeNet Core55, respectively. We will make the code publicly available to facilitate reproducibility and further research in this area.
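
The abstract gives no implementation details, so the following is only a minimal sketch of how entropy-based view selection followed by a first-layer candidate shortlist might look. It is not the authors' code. The function names (`select_views`, `top_candidates`), the choice to average the probabilities of the kept views, and the toy logits are all assumptions made for illustration. In practice the per-view logits would come from CLIP image-text similarities between rendered views and class prompts.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def entropy(probs, axis=-1, eps=1e-12):
    """Shannon entropy (in nats) of each probability row."""
    return -(probs * np.log(probs + eps)).sum(axis=axis)


def select_views(view_logits, num_keep):
    """Keep the views whose class distributions have the lowest entropy.

    view_logits: array of shape (num_views, num_classes).
    Returns (indices of kept views, per-view probabilities).
    Hypothetical helper: the paper's exact selection rule may differ.
    """
    probs = softmax(view_logits, axis=-1)
    keep = np.argsort(entropy(probs))[:num_keep]
    return keep, probs


def top_candidates(view_logits, num_keep, num_candidates):
    """Fuse the kept views and shortlist the most likely classes.

    Assumption: fusion is a plain average of the kept views' probabilities.
    A second stage (not shown) would re-score only these candidates with
    finer-grained prompts.
    """
    keep, probs = select_views(view_logits, num_keep)
    fused = probs[keep].mean(axis=0)
    candidates = np.argsort(fused)[::-1][:num_candidates]
    return keep, candidates


# Toy logits for 4 views and 3 classes (made-up numbers, not real CLIP output).
view_logits = np.array([
    [5.0, 0.0, 0.0],  # confident in class 0
    [0.1, 0.0, 0.2],  # nearly uniform, uninformative
    [0.0, 4.0, 0.5],  # confident in class 1
    [3.0, 2.9, 0.0],  # ambiguous between classes 0 and 1
])

keep, candidates = top_candidates(view_logits, num_keep=2, num_candidates=2)
print("kept views:", keep.tolist())
print("candidate classes:", candidates.tolist())
```

In this toy example the two most confident views are kept and the nearly uniform view is discarded, so the shortlist is driven only by views with a clear preference.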

Published in: IEEE Transactions on Circuits and Systems for Video Technology (Early Access)

Page(s): 1 - 1

Date of Publication: 13 March 2025

DOI: 10.1109/TCSVT.2025.3551084

IEEE Keywords
- Three-dimensional displays
- Shape
- Solid modeling
- Visualization
- Semantics
- Image recognition
- Training
- Point cloud compression
- Computational modeling
- Accuracy