I. Introduction
The Vision Transformer (ViT) [1] tokenizes images into fixed-size patches and applies Transformer layers, akin to those in language models, to model inter-token relationships for image classification. However, this approach often overlooks vital local details [2], [3] within each patch, notably textures [4], edges [5], and lines, so ViTs require larger training datasets to match CNN benchmarks [6]. In signal processing, techniques such as the discrete wavelet transform (DWT) can separate these features into distinct frequency bands, efficiently exposing local cues that patch tokenization obscures. Nevertheless, most ViT variants leave the patch-processing stage itself unimproved.
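To make the frequency-band separation concrete, the following minimal sketch applies a one-level 2D DWT to a single image patch. It assumes the PyWavelets package, and the 'haar' wavelet and 16x16 patch size are illustrative choices rather than the configuration used in this work; the decomposition yields a low-frequency approximation plus three high-frequency sub-bands where edges, lines, and textures are most prominent.

```python
# Minimal sketch: decompose one image patch into wavelet sub-bands.
# Assumes the PyWavelets package (pip install PyWavelets); the 'haar'
# wavelet and the 16x16 patch are illustrative, not this paper's setup.
import numpy as np
import pywt

patch = np.random.rand(16, 16)  # stand-in for one grayscale image patch

# One-level 2D DWT: LL is the low-frequency approximation; LH, HL, HH
# hold horizontal, vertical, and diagonal high-frequency details,
# where edges, lines, and fine textures are most visible.
LL, (LH, HL, HH) = pywt.dwt2(patch, 'haar')

print(LL.shape, LH.shape, HL.shape, HH.shape)  # each sub-band is (8, 8)
```

Because each sub-band is half the spatial resolution of the input, the transform isolates local structure at a lower cost than processing the full-resolution patch directly.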