Abstract:
Most existing semantic segmentation methods rely on supervised learning with discriminative models. Although these methods are straightforward, they do not model the underlying data distribution. In this paper, we propose Diff-SFCT, a novel medical image segmentation framework based on a diffusion model. We formulate semantic segmentation as a generative problem over segmentation masks, replacing conventional pixel-wise discriminative learning with a latent prior learning process to produce more accurate segmentation results. Diff-SFCT employs a backbone network that combines a Convolutional Neural Network (CNN) with a Transformer, exploiting the local perception of the CNN and the global information modeling capability of the Transformer. In Diff-SFCT, we design a Semantic Encoder that extracts fine-grained semantic features from real images. We also propose a novel Spatial-Frequency Cross Transformer (SFCT), which models the interaction between the global features of the noisy diffusion mask and the real semantic features, reducing the domain gap between the two and enhancing the model’s representational capacity. Additionally, to preserve spatial and frequency information in the diffusion model, we design a Spatial-Frequency Attention Module (SFAM) as part of the Convolutional Block. This module improves the model’s spatial and frequency perception while incurring negligible computational overhead. Experimental results show that Diff-SFCT substantially outperforms other segmentation methods across various medical image segmentation datasets.
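To make the generative formulation concrete, the sketch below shows only the standard forward (noising) step of a denoising diffusion model applied to a binary segmentation mask. It is an illustration under assumptions, not the Diff-SFCT implementation: the abstract does not state the noise schedule, the number of steps, or the mask scaling, so the linear schedule, the 1000 steps, the [-1, 1] scaling, and all function names here are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common default, assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), usually written alpha-bar_t."""
    products, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        products.append(running)
    return products


def noise_mask(mask, t, alpha_bars, rng):
    """Draw a noisy mask x_t from a clean mask x_0 at step t.

    The binary mask is first rescaled from {0, 1} to {-1, 1}, then mixed
    with Gaussian noise: x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps.
    """
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [
        [signal * (2.0 * px - 1.0) + sigma * rng.gauss(0.0, 1.0) for px in row]
        for row in mask
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
mask = [[0, 1, 1], [0, 0, 1]]  # toy 2x3 segmentation mask
noisy_early = noise_mask(mask, 10, alpha_bars, rng)   # still close to the mask
noisy_late = noise_mask(mask, 999, alpha_bars, rng)   # close to pure noise
```

In a diffusion-based segmenter, a denoising network conditioned on the input image would be trained to reverse this corruption; that conditioning and the reverse process are where components such as the Semantic Encoder and SFCT described above would come in, and they are not sketched here.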
Date of Conference: 05-08 December 2023
Date Added to IEEE Xplore: 18 January 2024