Abstract:
Salient object detection (SOD) aims to identify the most prominent regions in images. However, the large model sizes, high computational costs, and slow inference speeds of existing RGB-D SOD models have hindered their deployment on real-world embedded devices. To address this issue, we propose a novel method named AirSOD, designed for lightweight RGB-D SOD. Specifically, we first design a hybrid feature extraction network that combines the first three stages of MobileNetV2 with our Parallel Attention-Shift convolution (PAS) module. The PAS module captures both long-range dependencies and local information to enhance representation learning, while significantly reducing the number of parameters and the computational complexity. Second, we propose a Multi-level and Multi-modal feature Fusion (MMF) module to facilitate feature fusion, and a Multi-path enhancement for Feature Refinement (MFR) decoder for feature integration. Compared with the cutting-edge model MobileSal, the proposed method reduces the model size by 63%, decreases the computational complexity by 43%, and improves the inference speed by 43%. We evaluate AirSOD on six widely used RGB-D SOD datasets, and extensive experimental results demonstrate that our method achieves satisfactory performance. The source code will be made available.
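The abstract does not describe the internals of PAS, so the sketch below is not the authors' module. It is a generic, hedged illustration of one standard reason lightweight backbones save parameters and computation: MobileNetV2, whose first three stages form part of the AirSOD backbone, builds its inverted-residual blocks around depthwise convolutions, and depthwise separable convolution also appears among the index terms. The function names and the 64-channel, 3×3, 56×56 layer shape are hypothetical choices for illustration; bias terms are ignored.

```python
# Generic illustration only: this is NOT the authors' PAS module.
# It compares a standard k x k convolution with a depthwise separable one
# (depthwise k x k followed by pointwise 1 x 1), ignoring bias terms.

def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution: every output channel sees every input channel."""
    return k * k * c_in * c_out


def dw_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def macs(params: int, h: int, w: int) -> int:
    """Multiply-accumulates for a stride-1, 'same'-padded layer without bias:
    each weight is used once per output pixel."""
    return params * h * w


if __name__ == "__main__":
    c_in, c_out, k, h, w = 64, 64, 3, 56, 56  # hypothetical layer shape
    std = standard_conv_params(c_in, c_out, k)
    sep = dw_separable_params(c_in, c_out, k)
    print(f"standard:  {std} params, {macs(std, h, w)} MACs")
    print(f"separable: {sep} params, {macs(sep, h, w)} MACs")
    # The ratio equals 1/c_out + 1/k^2 for any layer shape.
    print(f"ratio: {sep / std:.4f}  (1/c_out + 1/k^2 = {1 / c_out + 1 / k**2:.4f})")
```

For this hypothetical 64-to-64 channel 3×3 layer, the separable version needs 4,672 weights instead of 36,864, roughly a 7.9× reduction, and the multiply-accumulate count shrinks by the same factor because both scale with the output resolution.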
Published in: IEEE Transactions on Circuits and Systems for Video Technology (Volume: 34, Issue: 3, March 2024)
Index Terms:
- Salient Object
- Salient Object Detection
- RGB-D Salient Object Detection
- Computational Complexity
- Local Information
- Model Size
- Representation Learning
- Feature Fusion
- Feature Extraction Network
- Long-range Dependencies
- Multi-level Features
- Inference Speed
- Multimodal Features
- Long-range Information
- Convolutional Layers
- Contextual Information
- Feature Maps
- Feature Learning
- Semantic Information
- Final Prediction
- RGB Features
- Depth Features
- Feature Map Channels
- Fewer Parameters
- High-level Features
- Low Computational Complexity
- Lightweight Model
- Depthwise Separable Convolution
- Global Context Information
- Rich Semantic Information