Abstract:
Incorporating depth information into RGB images has proven effective for semantic segmentation. Multi-modal feature fusion, which integrates depth and RGB features, is a crucial component in determining segmentation accuracy. Most existing multi-modal feature fusion schemes enhance multi-modal features via channel-wise attention modules that leverage global context information. In this work, we propose a novel pyramid-context guided fusion (PCGF) module to fully exploit the complementary information from the depth and RGB features. The proposed PCGF utilizes both local and global contexts inside the attention module to provide effective guidance for fusing cross-modal features with inconsistent semantics. Moreover, we introduce a lightweight yet practical multi-level general fusion module that combines features at multiple levels of abstraction to enable high-resolution prediction. Using the proposed feature fusion modules, our Pyramid-Context Guided Network (PCGNet) can learn discriminative features by taking full advantage of multi-modal and multi-level information. Our comprehensive experiments demonstrate that the proposed PCGNet achieves state-of-the-art performance on two benchmark datasets, NYUDv2 and SUN-RGBD.
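The abstract does not give the internals of the PCGF module, so the following is only a minimal sketch of the general idea it describes: channel-wise attention driven by both global and local context, followed by fusion of the reweighted RGB and depth features. Everything specific here is an assumption made for illustration, not the authors' implementation: the pyramid levels (1x1 for global context, 2x2 for local context), the two-layer perceptron with a reduced hidden layer, the sigmoid gating, the element-wise summation, and all function and parameter names.

```python
import math
import random

def avg_pool_grid(channel, bins):
    """Average-pool one H x W channel into a bins x bins grid, returned flattened."""
    h, w = len(channel), len(channel[0])
    out = []
    for i in range(bins):
        r0, r1 = i * h // bins, (i + 1) * h // bins
        for j in range(bins):
            c0, c1 = j * w // bins, (j + 1) * w // bins
            cells = [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            out.append(sum(cells) / len(cells))
    return out

def pyramid_context(feat, levels=(1, 2)):
    """Concatenate pooled descriptors per channel: 1x1 is global, finer grids are local.
    The choice of levels is an assumption, not taken from the paper."""
    vec = []
    for channel in feat:
        for bins in levels:
            vec.extend(avg_pool_grid(channel, bins))
    return vec

def linear(x, weight):
    """Matrix-vector product; weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def attention_weights(context, w1, w2):
    """Hypothetical two-layer perceptron: ReLU hidden layer, sigmoid output per channel."""
    hidden = [max(0.0, v) for v in linear(context, w1)]
    return [1.0 / (1.0 + math.exp(-v)) for v in linear(hidden, w2)]

def scale(feat, weights):
    """Multiply every channel by its attention weight."""
    return [[[w * v for v in row] for row in ch] for ch, w in zip(feat, weights)]

def fuse(rgb, depth, params):
    """Reweight each modality using the joint pyramid context, then sum element-wise."""
    context = pyramid_context(rgb) + pyramid_context(depth)
    r = scale(rgb, attention_weights(context, *params["rgb"]))
    d = scale(depth, attention_weights(context, *params["depth"]))
    return [[[x + y for x, y in zip(rr, dr)] for rr, dr in zip(rc, dc)]
            for rc, dc in zip(r, d)]

def random_matrix(rows, cols, rng):
    """Random weights standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Toy inputs: 3 channels of 4 x 4 features per modality.
rng = random.Random(0)
C, H, W, HIDDEN = 3, 4, 4, 4
rgb = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
depth = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
ctx_len = 2 * C * (1 + 4)  # both modalities, 1x1 plus 2x2 descriptors per channel
params = {
    name: (random_matrix(HIDDEN, ctx_len, rng), random_matrix(C, HIDDEN, rng))
    for name in ("rgb", "depth")
}
fused = fuse(rgb, depth, params)
```

In this sketch the hidden layer is smaller than the context vector, which stands in for the dimensionality reduction typical of channel-attention blocks; a trained network would learn the weight matrices rather than sample them.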
Date of Conference: 18-22 July 2022
Date Added to IEEE Xplore: 23 August 2022
IEEE Keywords (Index Terms):
- Semantic Segmentation
- Feature Fusion
- RGB-D Semantic Segmentation
- Contextual Information
- Attention Module
- Depth Information
- Multimodal Information
- Depth Features
- Multimodal Features
- Global Context Information
- Channel-wise Attention
- RGB Features
- Dimensionality Reduction
- Hidden Layer
- Feature Maps
- Multilayer Perceptron
- Stochastic Gradient Descent
- Deep Convolutional Neural Network
- High-level Features
- Important Structural Information
- Low-level Features
- Attention Weights
- Embedding Vectors
- Channel Attention Module
- Element-wise Summation
- Multi-level Features
- Decoder Block
- Encoding Stage
- Channel Attention