Abstract:
Transformers are a class of machine learning models that have attracted significant interest recently for a multitude of reasons. They can process multiple modalities efficiently and have excellent scalability. Despite these obvious advantages, training these large models is very time-consuming. Hence, there have been efforts to speed up the training process using efficient distributed implementations. Many different types of parallelism have been identified that can be employed standalone or in combination. However, naively combining different parallelization schemes can incur significant communication overheads, thereby potentially defeating the purpose of distributed training. Thus, it becomes vital to predict the right mapping of different parallelisms to the underlying system architecture. In this work, we propose AMPeD, an analytical model for performance in distributed training of transformers. It exposes all the transformer model parameters, potential parallelism choices (along with their mapping onto the system), the accelerator as well as system architecture specifications as tunable knobs, thereby enabling hardware-software co-design. With the help of 3 case studies, we show that the combinations of parallelisms predicted to be efficient by AMPeD conform with the results from the state-of-the-art literature. Using AMPeD, we also show that future distributed systems consisting of optical communication substrates can train large models up to 4× faster as compared to the current state-of-the-art systems without modifying the peak computational power of the accelerators. Finally, we validate AMPeD with in-house experiments on real systems and via published literature. The maximum observed error is limited to 12%. The model is available here: https://github.com/CSA-infra/AMPeD
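To make the idea of an analytical model with tunable knobs concrete, the sketch below shows a hypothetical step-time estimator that combines a compute term with a simple gradient all-reduce cost for given data/tensor/pipeline parallelism degrees. All names, parameters, and formulas here are illustrative assumptions by the editor, not the actual AMPeD interface; the real model is in the linked repository.

```python
# Minimal, hypothetical sketch of an analytical step-time estimate for
# distributed transformer training. Names and cost formulas are illustrative
# assumptions, not the actual AMPeD interface (see the linked repository).
from dataclasses import dataclass

@dataclass
class SystemSpec:
    peak_flops: float        # per-accelerator peak throughput (FLOP/s)
    link_bandwidth: float    # inter-accelerator bandwidth (bytes/s)

@dataclass
class ModelSpec:
    layers: int
    hidden: int
    seq_len: int
    batch_per_replica: int
    bytes_per_param: int = 2  # fp16 weights/gradients

def estimate_step_time(model: ModelSpec, sys: SystemSpec,
                       dp: int, tp: int, pp: int) -> float:
    """Roughly estimate one training step (seconds) for a given
    (data, tensor, pipeline) parallelism degree: compute time plus a
    ring all-reduce of gradients across the data-parallel group."""
    # ~12 * L * h^2 parameters and ~6 * params * tokens FLOPs per
    # forward+backward pass (common rules of thumb for transformers)
    params = 12 * model.layers * model.hidden ** 2
    tokens = model.batch_per_replica * model.seq_len
    flops = 6 * params * tokens
    compute_s = flops / (sys.peak_flops * tp * pp)

    # Gradients are sharded across tensor/pipeline ranks before the
    # data-parallel all-reduce
    grad_bytes = params * model.bytes_per_param / (tp * pp)
    comm_s = 0.0 if dp == 1 else 2 * (dp - 1) / dp * grad_bytes / sys.link_bandwidth
    return compute_s + comm_s

# Example: compare a few parallelism mappings on 64 hypothetical accelerators
model = ModelSpec(layers=48, hidden=6144, seq_len=2048, batch_per_replica=8)
system = SystemSpec(peak_flops=3.12e14, link_bandwidth=3e11)
for dp, tp, pp in [(64, 1, 1), (16, 4, 1), (8, 8, 1), (4, 8, 2)]:
    t = estimate_step_time(model, system, dp, tp, pp)
    print(f"dp={dp:2d} tp={tp} pp={pp} -> ~{t*1e3:.1f} ms/step")
```

Sweeping such knobs (parallelism degrees, accelerator and interconnect specifications) is the kind of co-design exploration the abstract describes, although AMPeD's actual cost model is considerably more detailed.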
Published in: 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
Date of Conference: 23-25 April 2023
Date Added to IEEE Xplore: 23 June 2023