1. Introduction
Medical image segmentation plays a pivotal role in disease diagnosis and quantitative assessment throughout the clinical workflow. Convolutional neural networks (CNNs) have emerged as pioneering approaches in this domain. The seminal U-Net [23], with its series of convolutional and down-sampling layers designed to gather contextual information through a symmetrical hierarchical architecture, has demonstrated remarkable segmentation capabilities. Despite their widespread use, CNNs struggle to effectively model long-range dependencies due to their reliance on stacks of convolutional blocks to increase receptive fields.