Spatial Pyramid Attention for Deep Convolutional Neural Networks

Abstract:

Attention mechanisms have shown great success in computer vision. However, the commonly used global average pooling in some implementations aggregates a three-dimensional feature map to a one-dimensional attention map, leading to a significant loss of structural information in the attention learning. In this article, we present a novel Spatial Pyramid Attention Network (SPANet), which exploits the structural information and channel relationships for better feature representation. SPANet enhances a base network by adding Spatial Pyramid Attention (SPA) blocks laterally. By rethinking the self-attention mechanism design, we further present three topology structures of attention path connection for our SPANet. They can be flexibly applied to various CNN architectures. SPANet is conceptually simple but practically powerful. It uses both structural regularization and structural information to achieve better learning capability. We have comprehensively evaluated the performance of SPANet on four benchmark datasets for different visual tasks. The experimental results show that SPANet significantly improves the recognition accuracy without adding much computation overhead. Using SPANet, we achieve an improvement of 1.6% top-1 classification accuracy on the ImageNet 2012 benchmark based on ResNet50, and SPANet outperforms SENet and other attention methods. SPANet also improves object detection performance by a clear margin with negligible additional computation overhead. When applying SPANet to RetinaNet based on the ResNet50 backbone, we improve the performance of the baseline model by 2.3 mAP, and the enhanced model outperforms SENet and GCNet by 1.1 mAP and 1.7 mAP, respectively. The code of SPANet is publicly available at https://github.com/13952522076/SPANet_TMM

Published in: IEEE Transactions on Multimedia ( Volume: 23)

Page(s): 3048 - 3058

Date of Publication: 24 March 2021

DOI: 10.1109/TMM.2021.3068576

I. Introduction

In the last few years, we have witnessed a flourishing of convolutional neural networks (CNNs) in computer vision research and applications. To improve the performance of CNNs, recent works add more convolutional layers to the CNN architectures. For instance, networks have grown from the 8-layer AlexNet [1] to the 1000-layer ResNet [2], [3] in order to achieve higher accuracy for image recognition. However, more learnable layers introduce more parameters and prolong inference time.
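
The abstract's point about global average pooling can be illustrated with a small sketch. The code below is not the authors' implementation: the function names, the pyramid bin sizes (1, 2, 4), and the toy feature map are assumptions made for illustration, and the learned layers that would turn the descriptor into attention weights are omitted. It only shows that pooling each channel to a single value hides spatial layout, while pooling over a pyramid of grids keeps some of it.

```python
from typing import List, Sequence

# A feature map is indexed as [channel][row][col] (illustrative convention).
FeatureMap = List[List[List[float]]]


def adaptive_avg_pool(fmap: FeatureMap, bins: int) -> List[float]:
    """Average every channel over a bins x bins grid and flatten the result."""
    pooled = []
    for channel in fmap:
        h, w = len(channel), len(channel[0])
        for i in range(bins):
            r0 = (i * h) // bins                    # floor of the cell start
            r1 = ((i + 1) * h + bins - 1) // bins   # ceil of the cell end
            for j in range(bins):
                c0 = (j * w) // bins
                c1 = ((j + 1) * w + bins - 1) // bins
                cell = [channel[r][c]
                        for r in range(r0, r1) for c in range(c0, c1)]
                pooled.append(sum(cell) / len(cell))
    return pooled


def spatial_pyramid_descriptor(fmap: FeatureMap,
                               pyramid: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Concatenate pooled values from every pyramid level (assumed sizes)."""
    descriptor = []
    for bins in pyramid:
        descriptor.extend(adaptive_avg_pool(fmap, bins))
    return descriptor


# Two 4x4 channels with the same mean but different spatial structure.
left_heavy = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
top_heavy = [[1.0] * 4, [1.0] * 4, [0.0] * 4, [0.0] * 4]
fmap = [left_heavy, top_heavy]

# Global average pooling is the 1x1 level: both channels look identical.
print(adaptive_avg_pool(fmap, 1))             # [0.5, 0.5]
# The 2x2 level already tells the two channels apart.
print(adaptive_avg_pool(fmap, 2))
# Full pyramid: 2 channels * (1 + 4 + 16) values.
print(len(spatial_pyramid_descriptor(fmap)))  # 42
```

In an attention block, a descriptor like this would typically be passed through small learned layers and a sigmoid to produce per-channel weights; that part is left out here because the text above does not specify it.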
