Parallelization of big-data analytics services over a federation of heterogeneous clouds has been considered as a way to improve performance. However, contrary to common intuition, there is an inherent tradeoff between the level of parallelism and the performance of big-data analytics, principally because of the significant delay incurred in transferring big data over the network. This transfer delay can be comparable to, or even longer than, the time required to compute the data. To address this tradeoff, this paper determines: (a) how many and which computing nodes in the federated clouds should be used for parallel execution of big-data analytics; (b) how to opportunistically apportion big data to these computing nodes so that they complete in a synchronized manner with best-effort performance; and (c) the sequence in which the apportioned big-data chunks, which differ in size, are computed on each node, so that the transfer of a chunk overlaps as much as possible with the computation of the previous chunk on that node. To this end, the Maximally Overlapped Bin-packing driven Bursting (MOBB) algorithm is proposed, which improves performance by up to 60% over existing approaches.
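To illustrate point (c), the following is a minimal sketch of transfer/compute overlap on a single node. It is not the MOBB algorithm. It assumes a simplified, hypothetical model: the node's link transfers one chunk at a time at a fixed `bandwidth`, the node computes one chunk at a time at a fixed `compute_rate`, both times are proportional to chunk size, and a chunk can be computed only after its transfer has finished. Under these assumptions, the node behaves like a two-machine flow shop, so Johnson's rule gives an optimal order. Because every chunk has the same ratio of transfer time to compute time, that rule reduces to sorting by size. All function names and units are illustrative.

```python
def sequential_completion(chunks, bandwidth, compute_rate):
    """Completion time with no overlap: each chunk is transferred, then computed."""
    return sum(size / bandwidth + size / compute_rate for size in chunks)


def pipelined_completion(chunks, bandwidth, compute_rate):
    """Completion time when the next chunk's transfer overlaps the current computation.

    Hypothetical model: the link sends chunks back to back, and a chunk starts
    computing once its own transfer is done and the processor is free.
    """
    transfer_done = 0.0
    compute_done = 0.0
    for size in chunks:
        transfer_done += size / bandwidth               # link is never idle
        start = max(transfer_done, compute_done)        # wait for data and processor
        compute_done = start + size / compute_rate
    return compute_done


def order_chunks(chunks, bandwidth, compute_rate):
    """Johnson's rule, specialised to times proportional to chunk size."""
    if bandwidth >= compute_rate:
        # Transfer is never slower than compute: send the smallest chunk first
        # so that computation starts as early as possible.
        return sorted(chunks)
    # Transfer is the bottleneck: send the largest chunk first so that the
    # compute tail left after the final transfer is as short as possible.
    return sorted(chunks, reverse=True)


if __name__ == "__main__":
    chunks = [40, 10, 20]  # data units (illustrative values)
    bw, rate = 10, 5       # units per second: here computation is the bottleneck
    print(sequential_completion(chunks, bw, rate))
    print(pipelined_completion(chunks, bw, rate))
    print(pipelined_completion(order_chunks(chunks, bw, rate), bw, rate))
```

With these illustrative numbers, running the chunks without overlap takes 21 s. Overlapping transfer and computation in the given order [40, 10, 20] takes 18 s, and reordering the chunks to [10, 20, 40] takes 15 s. This shows that both overlap and sequencing affect a node's completion time.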