In this paper, we analyze the restrictions of traditional models that limit the accuracy of analytical prediction of the execution time of collective communication operations. In particular, we show that the constant and variable contributions of processors and network are not fully separated in these models. Fully separating contributions that differ in nature and arise from different sources leads to more intuitive and accurate models, but the parameters of such models cannot be estimated from point-to-point experiments alone. By contrast, all the traditional models are designed so that their parameters can be estimated from a set of point-to-point communication experiments. We demonstrate that the more intuitive models allow much more accurate analytical prediction of the execution time of collective communication operations on both homogeneous and heterogeneous clusters. We present one such point-to-point model in detail and show how it can be used to predict the execution time of scatter and gather. We describe a set of communication experiments sufficient for accurate estimation of its parameters, and we conclude with experimental results demonstrating that the model predicts the execution time of collective operations much more accurately than traditional models.