This paper investigates the problem of scheduling discretely divisible applications on highly heterogeneous distributed platforms whose computing nodes are modern desktop systems with limited memory. We propose an algorithm for hierarchical load balancing at both the inter-node and intra-node levels of the platform, which relies on realistic performance models of the computation and communication resources. We also present an iterative procedure, based on the proposed algorithm, for building accurate performance models at application run time. The approach was evaluated on a 2D FFT batch application executed on a distributed system with four CPU+GPU nodes. The experimental results show that the proposed approach outperforms the "optimal" implementation by at least a factor of 4 on GPU devices.
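To make the notion of balancing a discretely divisible load under memory limits concrete, the following is a minimal sketch and not the algorithm proposed in this paper. It assumes each device has a constant speed (work units per second) and a memory capacity expressed as a maximum number of units, and it ignores communication costs; the paper instead relies on performance models built at run time. The function name `balance` and its parameters are hypothetical.

```python
import heapq


def balance(total_units, speeds, capacities):
    """Assign whole work units to devices so the slowest device finishes early.

    Assumptions (not from the paper): constant device speeds greater than zero,
    memory limits given as unit counts, no communication cost.
    """
    if total_units > sum(capacities):
        raise ValueError("load does not fit in the combined device memory")

    loads = [0] * len(speeds)
    # Heap entries: (finish time if the device takes one more unit, device index).
    heap = [(1.0 / s, i) for i, s in enumerate(speeds) if capacities[i] > 0]
    heapq.heapify(heap)

    for _ in range(total_units):
        # Give the next unit to the device that would complete it soonest.
        _, i = heapq.heappop(heap)
        loads[i] += 1
        # A device that has reached its memory limit is not offered more work.
        if loads[i] < capacities[i]:
            heapq.heappush(heap, ((loads[i] + 1) / speeds[i], i))
    return loads
```

For example, `balance(8, [3.0, 1.0], [100, 100])` gives the faster device three times the load of the slower one, while lowering the first capacity to 4 forces the remaining units onto the slower device. The same routine could in principle be applied first across nodes and then across the devices inside each node, which is one simple reading of "hierarchical" balancing.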