
Cross-Modal and Cross-Level Attention Interaction Network for Salient Object Detection


Impact Statement:
RGB-D salient object detection (SOD) is a crucial preprocessing technique in various computer vision tasks such as image retrieval, image compression, and person search. However, improving detection accuracy has always been a challenging problem. In this article, we propose CAINet, which utilizes the Swin Transformer as its backbone. It consists of four components: the CIEM is responsible for fusing the complementary information of the two modalities; the HCRM aims to accurately locate salient regions in the early stage of prediction; the MAID supplements information from different levels during the prediction process; and the EEM efficiently enhances the edges of the saliency map. With these modules cooperating with one another, our CAINet achieves excellent performance. The encouraging results increase the possibility of applying RGB-D SOD in artificial intelligence fields such as autonomous driving and medical imaging.

Abstract:

Most existing RGB-D salient object detection methods utilize convolutional neural networks (CNNs) to extract features. However, CNNs fail to capture global information due to the inherent limitation of the sliding-window mechanism. On the other hand, with the emergence of depth cues, how to effectively incorporate cross-modal features has become an underlying challenge. In addition, in terms of cross-level feature fusion, most methods do not fully consider the complementarity between different layers and usually adopt simple fusion strategies, thereby leading to the loss of detailed information. To alleviate these issues, a cross-modal and cross-level attention interaction network (CAINet) is proposed. First, different from most existing methods, we adopt two-stream Swin Transformers to extract RGB and depth features. Second, a high-level context refinement module (HCRM) is designed to further extract refined features and provide accurate guidance in the early prediction stage. Third, we design a cross...
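The abstract describes the overall design but not the internals of the modules (CIEM, HCRM, MAID, EEM). The following is a minimal PyTorch sketch of the general two-stream idea only: twin backbones, per-level cross-modal fusion, and top-down cross-level aggregation. Everything here is a hypothetical stand-in, not the paper's implementation; CrossModalFusion and TwoStreamSOD are invented names, and strided conv stems substitute for the pretrained Swin Transformer stages.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalFusion(nn.Module):
    """Hypothetical stand-in for a cross-modal interaction module:
    each modality is re-weighted by channel attention computed from
    the other, then the two enhanced features are merged."""
    def __init__(self, channels: int):
        super().__init__()
        self.rgb_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.depth_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.merge = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb, depth):
        rgb_enh = rgb * self.depth_gate(depth)    # depth guides RGB
        depth_enh = depth * self.rgb_gate(rgb)    # RGB guides depth
        return self.merge(torch.cat([rgb_enh, depth_enh], dim=1))

class TwoStreamSOD(nn.Module):
    """Two-stream encoder with per-level cross-modal fusion and a simple
    top-down decoder. Strided conv stems stand in for Swin stages; the
    channel widths mirror the usual Swin-T progression."""
    def __init__(self, dims=(96, 192, 384, 768)):
        super().__init__()
        def stage(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, 2, 1), nn.GELU())
        chans = [3] + list(dims[:-1])
        self.rgb_stages = nn.ModuleList(stage(c, d) for c, d in zip(chans, dims))
        chans[0] = 1  # depth maps are single-channel
        self.depth_stages = nn.ModuleList(stage(c, d) for c, d in zip(chans, dims))
        self.fusions = nn.ModuleList(CrossModalFusion(d) for d in dims)
        self.laterals = nn.ModuleList(nn.Conv2d(d, dims[0], 1) for d in dims)
        self.head = nn.Conv2d(dims[0], 1, 1)  # saliency logits

    def forward(self, rgb, depth):
        fused, r, d = [], rgb, depth
        for rs, ds, fu, lat in zip(self.rgb_stages, self.depth_stages,
                                   self.fusions, self.laterals):
            r, d = rs(r), ds(d)
            fused.append(lat(fu(r, d)))  # fuse modalities at every level
        x = fused[-1]
        for f in reversed(fused[:-1]):   # cross-level top-down aggregation
            x = f + F.interpolate(x, size=f.shape[-2:], mode="bilinear",
                                  align_corners=False)
        return self.head(x)

rgb = torch.randn(1, 3, 224, 224)
depth = torch.randn(1, 1, 224, 224)
print(TwoStreamSOD()(rgb, depth).shape)  # torch.Size([1, 1, 112, 112])
```

A faithful reproduction would swap the conv stems for pretrained Swin Transformer stages and replace the simple channel gates and additive decoder with the paper's CIEM, HCRM, MAID, and EEM designs.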
Published in: IEEE Transactions on Artificial Intelligence ( Volume: 5, Issue: 6, June 2024)
Page(s): 2907 - 2920
Date of Publication: 20 November 2023
Electronic ISSN: 2691-4581
