Visualizing and Understanding Patch Interactions in Vision Transformer


Abstract:

Vision transformer (ViT) has become a leading tool in various computer vision tasks, owing to its unique self-attention mechanism that learns visual representations explicitly through cross-patch information interactions. Despite this success, the literature seldom explores the explainability of ViT, and there is no clear picture of how attention over correlations across the full set of patches affects performance, nor of what further potential it holds. In this work, we propose a novel explainable visualization approach to analyze and interpret the crucial attention interactions among patches in ViT. Specifically, we first introduce a quantification indicator to measure the impact of patch interaction and verify it on attention window design and the removal of indiscriminative patches. We then exploit the effective responsive field of each patch in ViT and accordingly devise a window-free transformer (WinfT) architecture. Extensive experiments on ImageNet demonstrate that the proposed quantitative method facilitates ViT model learning, improving top-1 accuracy by up to 4.28%. More remarkably, results on downstream fine-grained recognition tasks further validate the generalization of our proposal.
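To make the idea of quantifying patch interactions concrete, the sketch below shows one way such an indicator could be computed from the attention maps of a single ViT layer. This is a minimal illustration, not the paper's exact formulation: the head-averaged, symmetrized attention score, the threshold tau, and the function names patch_interaction_scores and responsive_field are all assumptions introduced here for exposition.

```python
# Hypothetical sketch: quantifying patch-interaction strength from ViT attention maps.
# The indicator (head-averaged, symmetrized attention) and the thresholded "responsive
# field" below are illustrative assumptions, not the paper's exact method.
import torch


def patch_interaction_scores(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, num_tokens, num_tokens) attention weights from one ViT layer,
    where token 0 is the [CLS] token. Returns a (num_patches, num_patches) score matrix."""
    patch_attn = attn[:, 1:, 1:]            # drop the [CLS] token, keep patch-to-patch attention
    mean_attn = patch_attn.mean(dim=0)      # average over attention heads
    return 0.5 * (mean_attn + mean_attn.T)  # symmetrize: mutual interaction strength


def responsive_field(scores: torch.Tensor, tau: float = 0.02) -> torch.Tensor:
    """Binary mask marking, for each query patch, the patches whose interaction score exceeds tau."""
    return scores > tau


if __name__ == "__main__":
    heads, tokens = 12, 197                  # e.g. ViT-B/16 on 224x224: 14x14 patches + [CLS]
    attn = torch.softmax(torch.randn(heads, tokens, tokens), dim=-1)  # stand-in attention weights
    scores = patch_interaction_scores(attn)
    mask = responsive_field(scores)
    print(scores.shape, mask.float().mean().item())  # (196, 196), fraction of active patch pairs
```

Under this reading, the per-patch responsive field would then inform adaptive, window-free attention rather than a fixed local window, in the spirit of the WinfT design described in the abstract.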
Published in: IEEE Transactions on Neural Networks and Learning Systems ( Volume: 35, Issue: 10, October 2024)
Page(s): 13671 - 13680
Date of Publication: 24 May 2023

PubMed ID: 37224360

