Abstract:
Semantic segmentation is a crucial component of autonomous driving. However, segmentation performance on unstructured roads remains limited for the following reasons: 1) irregular shapes and varying sizes of road boundaries; 2) low-contrast or blurred boundaries between the road and the background; and 3) environmental factors such as changing light intensity and dust. To overcome these challenges, this study proposes SwinURNet, a transformer-convolutional neural network (CNN) architecture for real-time point cloud segmentation in unstructured scenarios. First, the point cloud is projected onto a range image via spherical projection. A lightweight ResNet34-based network then encodes abstract features. A nonsquare Swin transformer (NSST) is designed to decode information and capture high-resolution transverse features. A multidimensional information fusion module (MDIFM) is introduced to balance the semantic differences between feature maps in CNNs and attention maps in transformers. A multitask loss function (MTLF) comprising boundary, weighted cross-entropy, and Lovász–Softmax losses guides network training. Experimental data collected from autonomous mining trucks in the Baiyuneboite mining area are used to evaluate performance on unstructured roads. The proposed architecture is also applied to the public unstructured dataset RELLIS-3D and the large structured dataset SemanticKITTI. The experimental results show accuracies of 74.2%, 42.6%, and 61.6% on the three datasets, with inference speeds of 8–19 FPS, surpassing the compared methods.
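The spherical projection step mentioned in the abstract follows the range-image convention commonly used in LiDAR segmentation pipelines (e.g., RangeNet++-style projections). The sketch below is a minimal illustration of that projection, assuming a 64-beam sensor with a 3°/−25° vertical field of view and 2048-pixel azimuth resolution; the exact image size, channels, and sensor parameters used by SwinURNet are not stated in the abstract.

```python
import numpy as np

def spherical_projection(points, H=64, W=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 4) LiDAR point cloud (x, y, z, intensity) onto an H x W range image."""
    x, y, z, intensity = points[:, 0], points[:, 1], points[:, 2], points[:, 3]
    depth = np.linalg.norm(points[:, :3], axis=1)  # range of each point

    # Yaw (azimuth) and pitch (elevation) of each point in the sensor frame.
    yaw = np.arctan2(y, x)
    pitch = np.arcsin(z / np.clip(depth, 1e-8, None))

    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down

    # Normalize angles to pixel coordinates: u along azimuth, v along elevation.
    u = 0.5 * (1.0 - yaw / np.pi) * W
    v = (1.0 - (pitch - fov_down) / fov) * H
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    # Fill the range image; when several points fall in one pixel, keep the closest.
    order = np.argsort(depth)[::-1]
    image = np.zeros((5, H, W), dtype=np.float32)  # channels: range, x, y, z, intensity
    image[0, v[order], u[order]] = depth[order]
    image[1, v[order], u[order]] = x[order]
    image[2, v[order], u[order]] = y[order]
    image[3, v[order], u[order]] = z[order]
    image[4, v[order], u[order]] = intensity[order]
    return image
```

The resulting multichannel range image can then be fed to the ResNet34-based encoder like an ordinary 2-D image.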
Published in: IEEE Transactions on Instrumentation and Measurement (Volume: 73)
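The abstract names the three terms of the multitask loss (boundary, weighted cross-entropy, and Lovász–Softmax) but not how the boundary term is computed or how the terms are weighted. The sketch below is only an illustrative PyTorch combination under those gaps: the Lovász–Softmax part follows Berman et al.'s published formulation, the boundary term is approximated here as extra cross-entropy on pixels near class edges, and the weights w_ce, w_ls, and w_bd are hypothetical placeholders rather than the paper's values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def lovasz_grad(gt_sorted):
    """Gradient of the Lovász extension w.r.t. sorted errors (Berman et al., 2018)."""
    p = len(gt_sorted)
    gts = gt_sorted.sum()
    intersection = gts - gt_sorted.cumsum(0)
    union = gts + (1.0 - gt_sorted).cumsum(0)
    jaccard = 1.0 - intersection / union
    if p > 1:
        jaccard[1:p] = jaccard[1:p] - jaccard[0:-1]
    return jaccard

def lovasz_softmax(probs, labels, ignore_index=255):
    """Multiclass Lovász-Softmax loss on an (N, C) probability matrix."""
    C = probs.shape[1]
    valid = labels != ignore_index
    probs, labels = probs[valid], labels[valid]
    losses = []
    for c in range(C):
        fg = (labels == c).float()
        if fg.sum() == 0:
            continue
        errors = (fg - probs[:, c]).abs()
        errors_sorted, perm = torch.sort(errors, descending=True)
        losses.append(torch.dot(errors_sorted, lovasz_grad(fg[perm])))
    return torch.stack(losses).mean()

def boundary_mask(labels, kernel=3):
    """Rough class-boundary mask: pixels where dilation and erosion of the label map differ."""
    lab = labels.unsqueeze(1).float()
    dilated = F.max_pool2d(lab, kernel, stride=1, padding=kernel // 2)
    eroded = -F.max_pool2d(-lab, kernel, stride=1, padding=kernel // 2)
    return (dilated != eroded).squeeze(1).float()

class MultiTaskLoss(nn.Module):
    """Weighted cross-entropy + Lovász-Softmax + extra cross-entropy on boundary pixels."""
    def __init__(self, class_weights, w_ce=1.0, w_ls=1.0, w_bd=1.0):
        super().__init__()
        self.ce = nn.CrossEntropyLoss(weight=class_weights, ignore_index=255, reduction="none")
        self.w_ce, self.w_ls, self.w_bd = w_ce, w_ls, w_bd

    def forward(self, logits, labels):
        ce_map = self.ce(logits, labels)  # (B, H, W) per-pixel cross-entropy
        probs = logits.softmax(1).permute(0, 2, 3, 1).reshape(-1, logits.shape[1])
        ls = lovasz_softmax(probs, labels.reshape(-1))
        bd = (ce_map * boundary_mask(labels)).sum() / boundary_mask(labels).sum().clamp(min=1.0)
        return self.w_ce * ce_map.mean() + self.w_ls * ls + self.w_bd * bd
```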