Abstract:
Due to the scale and complexity of Distributed Deep Learning (DDL) systems, AI researchers and operations engineers face an enormous challenge in analyzing, diagnosing, and locating performance bottlenecks during the training stage. Existing performance models and frameworks offer little insight into the performance degradation that a straggler induces. In this paper, we introduce MD-Roofline, a training performance analysis model that extends the traditional roofline model with a communication dimension. The model considers layer-wise attributes at the application level and a series of achievable peak performance metrics at the hardware level. With the assistance of MD-Roofline, AI researchers and DDL operations engineers can locate the system bottleneck along three dimensions: intra-GPU computation capacity, intra-GPU memory access bandwidth, and inter-GPU communication bandwidth. We demonstrate that our performance analysis model provides great insight for bottleneck analysis when training 12 classic CNNs.
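To illustrate the idea behind a communication-extended roofline, the sketch below bounds a layer's attainable throughput by three hardware ceilings (compute, memory bandwidth, communication bandwidth) and reports which one dominates. This is only a minimal illustration of the general technique, not the paper's actual formulation; the function name and all peak values are hypothetical.

```python
def roofline_attainable(flops, mem_bytes, comm_bytes,
                        peak_compute, peak_mem_bw, peak_comm_bw):
    """Return (attainable FLOP/s, bottleneck dimension) for one layer.

    flops        -- floating-point operations performed by the layer
    mem_bytes    -- bytes moved between GPU memory and compute units
    comm_bytes   -- bytes exchanged over the inter-GPU interconnect
    peak_*       -- achievable hardware peaks (hypothetical values)
    """
    # Lower bound on execution time imposed by each hardware dimension.
    t_compute = flops / peak_compute
    t_memory = mem_bytes / peak_mem_bw
    t_comm = comm_bytes / peak_comm_bw
    # The slowest dimension sets the overall time, hence the bottleneck.
    t_total, bottleneck = max(
        (t_compute, "computation"),
        (t_memory, "memory access"),
        (t_comm, "communication"),
    )
    return flops / t_total, bottleneck
```

For example, a layer doing 1 TFLOP of work but moving 10 GB over a 100 GB/s interconnect would be reported as communication-bound, since the interconnect's time bound exceeds the compute and memory bounds.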
Date of Conference: 30 June 2022 - 03 July 2022
Date Added to IEEE Xplore: 19 October 2022
- IEEE Keywords
- Index Terms
- Deep Learning,
- Training Performance,
- Performance Analysis Model,
- Distributed Deep Learning,
- Performance Metrics,
- Training Stage,
- Peak Performance,
- Communication Bandwidth,
- Dimensions Of Communication,
- Bottleneck Analysis,
- Throughput,
- Convolutional Layers,
- Feature Maps,
- Parallel Data,
- Forward Pass,
- Floating-point Operations,
- Hardware Resources,
- Communication Capacity,
- Backward Pass,
- Bottleneck Layer,
- Memory Bandwidth,
- DNN Model,
- Ring Topology,
- Synchronization Mechanism