Abstract:
Training deep learning models is time-consuming, so extensive research has been conducted on accelerating training through distributed processing. Data parallelism is one of the most widely used distributed training schemes, and various algorithms for data parallelism have been studied. However, because most studies assume a homogeneous computing environment, they do not consider clusters of graphics processing units (GPUs) with heterogeneous performance. Such heterogeneity leads to differences in computation time between GPU workers under synchronous data parallelism. Because of this per-iteration difference, the straggler problem, in which fast workers must wait for the slowest worker, slows down training. Therefore, in this paper, we propose a batch-orchestration algorithm (BOA) that reduces training time by improving hardware efficiency in a heterogeneous-performance GPU cluster. The proposed algorithm coordinates the local mini-batch sizes of all workers to reduce the per-iteration training time. We confirmed that the proposed algorithm improves performance by 23% over synchronous SGD with one backup worker when training ResNet-194 using 8 GPUs of three different types.
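To illustrate the core idea of coordinating local mini-batch sizes, the following is a minimal Python sketch. The abstract does not give BOA's exact formula, so this sketch simply assumes each worker's share of a fixed global batch is set proportional to its measured throughput; the function name, parameters, and example numbers are illustrative, not taken from the paper.

def orchestrate_batches(global_batch, throughputs):
    """Split `global_batch` across workers in proportion to `throughputs`
    (e.g., images/sec measured per GPU), so that each worker finishes a
    synchronous iteration in roughly the same wall-clock time."""
    total = sum(throughputs)
    sizes = [int(global_batch * t / total) for t in throughputs]
    # Assign any rounding remainder to the fastest worker.
    sizes[throughputs.index(max(throughputs))] += global_batch - sum(sizes)
    return sizes

# Hypothetical example: 8 GPUs of three different types (4 slow, 2 medium, 2 fast).
print(orchestrate_batches(256, [100, 100, 100, 100, 150, 150, 220, 220]))

Under such a proportional split, slower GPUs process fewer samples per iteration, which shortens the wait that fast workers would otherwise incur for the slowest worker.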
Date of Conference: 15-17 January 2018
Date Added to IEEE Xplore: 28 May 2018
Electronic ISSN: 2375-9356