Abstract:
Scene parsing is an important and challenging task in computer vision that assigns a semantic label to every pixel in a scene. Existing scene parsing methods use only the pixel-wise annotation to supervise the network; without exploiting regional relations, similar categories are easily misclassified in complex scenes. To tackle these problems, this paper proposes a Regional Relation Network (RRNet), which boosts scene parsing performance by mining regional relations from the pixel-wise annotation. Specifically, the pixel-wise annotation is divided into fixed regions so that intra- and inter-regional relations can be extracted as additional supervision for the network. We first design an intra-regional relation module that predicts the category distribution within each fixed region, which helps reduce misclassification inside regions. Second, an inter-regional relation module is proposed to learn the relationships among the regions of a scene image. Guided by the relation information extracted from the ground truth, the network learns more discriminative relation representations. To validate the proposed model, we conduct experiments on three typical datasets: NYU-depth-v2, PASCAL-Context, and ADE20k. The competitive results achieved on all three datasets demonstrate the effectiveness of our method.
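To make the regional supervision concrete, the sketch below shows one plausible way to derive the per-region category distributions described in the abstract: the pixel-wise ground truth is split into a fixed grid of regions, and each region's class histogram is normalized into a soft target for the intra-regional relation module. This is a minimal illustration, not the paper's implementation; the function name, the grid size, the ignore index, and the use of PyTorch pooling are all assumptions.

```python
import torch
import torch.nn.functional as F

def regional_category_distribution(labels, num_classes, grid=8, ignore_index=255):
    """Split a pixel-wise annotation map into a fixed grid of regions and
    return each region's category distribution as a soft supervision target.

    labels: LongTensor of shape (H, W) holding class indices.
    Returns: FloatTensor of shape (grid, grid, num_classes); each region's
             vector sums to 1 (or stays near zero if only ignored pixels).

    Hypothetical helper: the actual RRNet supervision may be computed differently.
    """
    # Mask out ignored pixels before one-hot encoding.
    valid = (labels != ignore_index).float()
    onehot = F.one_hot(labels.clamp(0, num_classes - 1), num_classes).float()
    onehot = onehot * valid.unsqueeze(-1)                      # (H, W, C)

    # Average the one-hot maps inside each grid cell; the mean of one-hot
    # vectors over a region is proportional to its class histogram.
    pooled = F.adaptive_avg_pool2d(
        onehot.permute(2, 0, 1).unsqueeze(0),                  # (1, C, H, W)
        output_size=(grid, grid),
    ).squeeze(0).permute(1, 2, 0)                              # (grid, grid, C)

    # Renormalize into a proper distribution per region.
    totals = pooled.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    return pooled / totals
```

In training, such region-level distributions could supervise an intra-regional prediction head with a cross-entropy or KL-divergence loss alongside the usual per-pixel loss; how RRNet combines these terms is specified in the paper itself, not here.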
Published in: 2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)
Date of Conference: 13-16 December 2022
Date Added to IEEE Xplore: 16 January 2023