How Useful is Communication Scheduling for Distributed Training? | IEEE Conference Publication | IEEE Xplore

Abstract:

Recently, there has been a resurgence of packet scheduling ideas in the form of application-layer communication scheduling for distributed training. Given recent reports of potentially huge improvements, it is critical to interpret these results properly and to understand how far such scheduling can go.

We take a first-principles approach to analyzing the role of communication scheduling in distributed training. We formulate a mathematical model of the computation and communication pattern and prove that the upper bound on the improvement from communication scheduling is 3× for widely used distributed training architectures.

More importantly, we establish a quantitative relationship between the benefit of communication scheduling and the computation-to-communication ratio. While the exact curve varies from model to model, we demonstrate that all models share the same concave shape. Surprisingly, and contrary to common belief, we find that across varying models and hardware configurations, communication scheduling offers only limited improvement beyond what overlapping already provides. Our results raise the question of whether it is necessary to overload parameter transmission with application-layer semantics. Additionally, we provide both theoretical analysis and empirical studies showing that most of the improvement can be obtained with well-understood network-layer methods, without requiring application-layer knowledge.
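To make the distinction between "overlapping" and "scheduling" concrete, the sketch below simulates one iteration (a backward pass followed by the next forward pass) under three strategies: no overlap, FIFO overlap (gradients sent in the order they become ready), and priority scheduling (the ready layer closest to the input is sent first). This is a hypothetical toy model, not the paper's formulation or proof: it assumes a single shared link that merges push and pull into one per-layer cost, whole-tensor non-preemptive transmission, and serialized compute. All function names and per-layer costs are made up for illustration.

```python
"""Hypothetical toy model of one training iteration under three communication
strategies. Not the paper's model: push and pull are merged into one cost per
layer on a single link, and tensors are sent whole (non-preemptive)."""


def backward_ready_times(b):
    """Backward runs from the last layer to the first. Returns the time each
    layer's gradient becomes ready, plus the end time of the backward pass."""
    ready = [0.0] * len(b)
    t = 0.0
    for i in reversed(range(len(b))):
        t += b[i]
        ready[i] = t
    return ready, t


def comm_finish_times(ready, c, priority):
    """Send one whole tensor at a time over a single shared link.
    FIFO: earliest-ready gradient first. Priority: lowest layer index first."""
    finish = [0.0] * len(c)
    pending = set(range(len(c)))
    t = 0.0  # time at which the link becomes free
    while pending:
        avail = [i for i in pending if ready[i] <= t]
        if not avail:
            t = min(ready[i] for i in pending)  # link idles until the next gradient
            continue
        if priority:
            i = min(avail)  # layer nearest the input goes first
        else:
            i = min(avail, key=lambda j: ready[j])  # first come, first served
        t += c[i]
        finish[i] = t
        pending.remove(i)
    return finish


def iteration_time(f, b, c, strategy):
    """strategy: 'sequential' (no overlap), 'fifo' (overlap only), or 'priority'."""
    if strategy == "sequential":
        return sum(b) + sum(c) + sum(f)
    ready, t = backward_ready_times(b)
    done = comm_finish_times(ready, c, priority=(strategy == "priority"))
    for i in range(len(f)):
        # Forward of layer i waits for the previous layer and its own parameters.
        t = max(t, done[i]) + f[i]
    return t


if __name__ == "__main__":
    # Made-up per-layer costs: forward f, backward b, communication c.
    f, b, c = [1, 1, 1], [1, 1, 1], [1, 1, 3]
    base = iteration_time(f, b, c, "sequential")
    for s in ("sequential", "fifo", "priority"):
        t = iteration_time(f, b, c, s)
        print(f"{s:>10}: {t:.1f}  speedup {base / t:.2f}x")
```

In this toy setting, overlapping alone shortens the iteration from 11 to 9 time units and priority scheduling trims it further to 8. Because compute is serialized here, no schedule can push the iteration below `sum(b) + sum(f)`; scheduling only decides how much of the communication gets hidden behind computation.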
Date of Conference: 29-31 October 2024
Date Added to IEEE Xplore: 02 December 2024
Conference Location: Moscow, Russian Federation