I. INTRODUCTION
Semantic segmentation [1], [2] is a crucial scene understanding task in computer and robotic vision, aiming to generate pixel-wise category prediction of an image. Most of the state-of-the-art (SoTA) methods focus on exploring the potential of convolutional neural networks (CNNs) and learning strategies [3], [4]. However, a hurdle of training these models is the lack of large-scale and high-quality annotated datasets, imposing much burden for real applications, e.g., autonomous driving [5]. Consequently, growing attention has been paid to deep semi-supervised learning (SSL) for semantic segmentation [6] using the labeled data and additional unlabeled data.