
Frequency-Domain Refinement of Vision Transformers for Robust Medical Image Segmentation Under Degradation


Abstract:

Medical image segmentation is crucial for precise diagnosis, treatment planning, and disease monitoring in clinical settings. While convolutional neural networks (CNNs) have achieved remarkable success, they struggle to model long-range dependencies. Vision Transformers (ViTs) address this limitation by leveraging self-attention mechanisms to capture global contextual information. However, ViTs often fall short in describing local features, which is crucial for precise segmentation. To address this issue, we reformulate self-attention in the frequency domain to enhance both local and global feature representation. Our approach, the Enhanced Wave Vision Transformer (EW-ViT), incorporates wavelet decomposition within the self-attention block to adaptively refine feature representation in the low- and high-frequency components. We also introduce the Prompt-Guided High-Frequency Refiner (PGHFR) module to handle image degradation, which mainly affects high-frequency components. This module uses implicit prompts to encode degradation-specific information and adjust high-frequency representations accordingly. Additionally, we apply a contrastive learning strategy to maintain feature consistency and ensure robustness against noise, leading to state-of-the-art (SOTA) performance in medical image segmentation, especially under various degradation conditions. Source code is available on GitHub.
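To make the frequency split concrete: the abstract describes decomposing features into low- and high-frequency wavelet components that can be refined separately. The sketch below is not the paper's implementation; it is a minimal single-level 2D Haar transform (a common choice for such decompositions), showing how a feature map splits into one low-frequency band (LL) and three high-frequency detail bands (LH, HL, HH) and reconstructs exactly.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar decomposition of a feature map x of even shape (H, W).

    Returns a low-frequency approximation (ll) and three high-frequency
    detail bands (lh, hl, hh), each of shape (H/2, W/2).
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # horizontal detail
    hl = (a + b - c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse transform: perfectly reconstructs the original feature map."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x
```

Because the transform is invertible, a module can modify the high-frequency bands (e.g., under degradation) and reassemble the feature map without losing the low-frequency content.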
Date of Conference: 26 February 2025 - 06 March 2025
Date Added to IEEE Xplore: 08 April 2025
Conference Location: Tucson, AZ, USA


1. Introduction

Medical image segmentation plays a pivotal role in disease diagnosis and quantitative assessment throughout the clinical workflow. Convolutional neural networks (CNNs) have emerged as pioneering approaches in this domain. The seminal U-Net [23], with its series of convolutional and down-sampling layers designed to gather contextual information through a symmetrical hierarchical architecture, has demonstrated remarkable segmentation capabilities. Despite their widespread use, CNNs struggle to effectively model long-range dependencies due to their reliance on stacks of convolutional blocks to increase receptive fields.
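The receptive-field limitation noted above can be quantified with the standard recurrence r_l = r_{l-1} + (k_l - 1) * j_{l-1}, where j is the cumulative stride. The helper below (an illustrative utility, not from the paper) shows that stacking stride-1 3x3 convolutions grows the receptive field only linearly, which is why deep stacks are needed for global context.

```python
def receptive_field(kernel_sizes, strides=None):
    """Effective receptive field (in input pixels) of stacked conv layers.

    Uses the recurrence r_l = r_{l-1} + (k_l - 1) * j_{l-1},
    where j_{l-1} is the product of strides of all earlier layers.
    """
    if strides is None:
        strides = [1] * len(kernel_sizes)
    r, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        r += (k - 1) * jump
        jump *= s
    return r

# Ten stacked 3x3 convolutions at stride 1 cover only a 21x21 window,
# far short of global context on a typical medical image.
print(receptive_field([3] * 10))  # -> 21
```

Down-sampling layers (stride > 1), as in U-Net's encoder, multiply the per-layer growth and are the usual CNN workaround for this linear scaling.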
