Abstract:
Salient object detection (SOD) aims to identify the most prominent regions in images. However, the large model sizes, high computational costs, and slow inference speeds of existing RGB-D SOD models have hindered their deployment on real-world embedded devices. To address this issue, we propose AirSOD, a novel method dedicated to lightweight RGB-D SOD. Specifically, we first design a hybrid feature extraction network that combines the first three stages of MobileNetV2 with our Parallel Attention-Shift convolution (PAS) module. The PAS module captures both long-range dependencies and local information to enhance representation learning while significantly reducing the number of parameters and the computational complexity. Second, we propose a Multi-level and Multi-modal feature Fusion (MMF) module to facilitate feature fusion, and a Multi-path enhancement for Feature Refinement (MFR) decoder for feature integration. Compared with the cutting-edge model MobileSal, the proposed method reduces the model size by 63%, decreases the computational complexity by 43%, and improves the inference speed by 43%. We evaluate AirSOD on six widely used RGB-D SOD datasets, and extensive experimental results demonstrate that our method achieves satisfactory performance. The source code will be made available.
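The abstract does not specify the internals of the PAS module, but the shift-convolution family of techniques it draws on can be illustrated with a minimal sketch: a parameter-free spatial shift spreads local context across channel groups, and a 1x1 (pointwise) convolution then mixes channels, approximating a 3x3 convolution at a fraction of the parameter count. The function names and the four-way shift pattern below are illustrative assumptions, not the authors' design.

```python
import numpy as np

def spatial_shift(x):
    """Parameter-free spatial shift (illustrative, not the actual PAS module):
    split channels into four groups and shift each group one pixel in a
    different direction, leaving any remainder channels unshifted."""
    c, h, w = x.shape
    out = np.zeros_like(x)
    g = c // 4
    out[:g, :, :-1] = x[:g, :, 1:]            # group 1: shift left
    out[g:2*g, :, 1:] = x[g:2*g, :, :-1]      # group 2: shift right
    out[2*g:3*g, :-1, :] = x[2*g:3*g, 1:, :]  # group 3: shift up
    out[3*g:4*g, 1:, :] = x[3*g:4*g, :-1, :]  # group 4: shift down
    out[4*g:] = x[4*g:]                       # remainder: unchanged
    return out

def pointwise_conv(x, weight):
    """1x1 convolution (pure channel mixing). Combined with the shift above,
    this captures local spatial context with only out_c * in_c weights,
    versus out_c * in_c * 9 for a 3x3 convolution."""
    # weight: (out_c, in_c), x: (in_c, h, w) -> (out_c, h, w)
    return np.tensordot(weight, x, axes=([1], [0]))

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))   # (channels, height, width)
w = rng.standard_normal((8, 8))        # (out_channels, in_channels)
y = pointwise_conv(spatial_shift(x), w)
print(y.shape)  # → (8, 16, 16)
```

The shift itself costs no FLOPs or parameters, which is why shift-based operators are attractive for the embedded-deployment setting the abstract targets.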
Published in: IEEE Transactions on Circuits and Systems for Video Technology (Volume: 34, Issue: 3, March 2024)