I. Introduction
Recent years have seen increasing adoption of Convolutional Neural Networks (CNNs) in AI applications due to their superior accuracy on many computer vision tasks. Making CNNs deeper and wider is an effective way to improve task accuracy, but this development is hindered by the memory constraints of GPUs, since deeper and wider CNN models consume more memory during training. For example, ResNet152 [1] consumes around 18 GB of memory at a batch size of only 32, while the NVIDIA P100, a mainstream GPU in cloud platforms, has only 16 GB of memory. Because GPU memory capacity grows more slowly than the memory requirements of large CNNs, there is a strong need for memory management techniques that support training large CNN models on a single GPU.