MSGFormer: A DeepLabv3+ Like Semantically Masked and Pixel Contrast Transformer for MouseHole Segmentation


Comparison between a current transformer-based segmentation model (left) and ours (right). MSGFormer, by incorporating an additional semantic layer within the encoder backbone...

Abstract:

In semantic segmentation, the efficient representation of multi-scale context is of paramount importance. Inspired by the remarkable performance of Vision Transformers (ViT) in image classification, subsequent researchers have proposed several semantic segmentation ViTs, most of which have achieved impressive results. However, these models often struggle to utilize multi-scale context effectively, disregard intra-image semantic context, and neglect the global context of the training data, i.e., the semantic relationships among pixels across different images. In this paper, we introduce Sliding Window Dilated Attention and combine it with Spatial Pyramid Pooling (SPP) to form a novel decoder called Sliding Window Dilated Attention Spatial Pyramid Pooling (SwinASPP). By adjusting the sliding window dilation rates, this decoder can capture multi-scale contextual information at different granularities. Additionally, we propose the Semantic Attention Block, which integrates semantic attention operations into the encoder. By adopting our proposed supervised pixel-wise contrastive learning algorithm, we shift the training strategy for semantic segmentation from intra-image to inter-image. Our experiments demonstrate that these methods lead to performance improvements on the SanJiangYuan MouseHole dataset and Cityscapes.
Published in: IEEE Access ( Volume: 12)
Page(s): 33544 - 33554
Date of Publication: 01 March 2024
Electronic ISSN: 2169-3536
