Software-Defined Data Shuffling for Big Data Jobs with Task Duplication | IEEE Conference Publication | IEEE Xplore

Software-Defined Data Shuffling for Big Data Jobs with Task Duplication


Abstract:

Big data jobs are usually executed on large-scale distributed computing platforms that automatically divide a job into multiple computation phases, each of which contains a number of independent tasks that can run in parallel. The data shuffling process between two consecutive phases often becomes the bottleneck of job execution. To improve its performance, "push" shuffling has been proposed, in which intermediate results are sent to the next phase as soon as they are generated. It avoids the local disk accesses of the traditional "pull" shuffling approach, and tasks in the next phase can start processing data without waiting for tasks in the preceding phase to finish. Task duplication is another approach that accelerates task execution by launching multiple task copies that compete to process the same data block. When "push" shuffling is combined with task duplication, big data jobs can be significantly accelerated, but at the cost of a large amount of redundant data transmission between the two phases. To address this challenge, we propose a software-defined data shuffling approach built around a controller and a janitor module that together control the data shuffling process. Each task has a janitor that communicates with the controller to request permission to send intermediate results to next-stage tasks. We further propose an online grouping algorithm to reduce the overhead of frequent communication with the controller. The performance of the proposed algorithm is evaluated by extensive simulations.
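
The abstract does not spell out the controller and janitor protocol, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes the controller grants a send permit to the first task copy that asks for a given (data block, destination) pair and refuses later duplicates. The class names `Controller` and `Janitor`, the methods `request_permit` and `push`, and all identifiers such as `block-7` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical controller: one send permit per (block, destination)."""
    # Maps (block_id, dest) to the task copy that was granted the permit.
    granted: dict = field(default_factory=dict)

    def request_permit(self, copy_id, block_id, dest):
        key = (block_id, dest)
        # Assumed policy: the first copy to ask wins; duplicates are refused.
        owner = self.granted.setdefault(key, copy_id)
        return owner == copy_id


@dataclass
class Janitor:
    """Hypothetical per-task janitor that asks the controller before pushing."""
    copy_id: str
    controller: Controller
    sent: list = field(default_factory=list)

    def push(self, block_id, dest, payload):
        # Only transmit if the controller admits this copy for this block.
        if self.controller.request_permit(self.copy_id, block_id, dest):
            self.sent.append((block_id, dest, payload))
            return True
        return False


controller = Controller()
original = Janitor("map-0/copy-a", controller)
duplicate = Janitor("map-0/copy-b", controller)

# Both copies produced the same intermediate result for the same block.
first = original.push("block-7", "reduce-1", b"partial result")
second = duplicate.push("block-7", "reduce-1", b"partial result")
```

In this sketch every push costs one round trip to the controller, which is the kind of overhead the online grouping algorithm mentioned in the abstract is meant to reduce; the grouping itself is not modeled here.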
Date of Conference: 16-19 August 2016
Date Added to IEEE Xplore: 26 September 2016
ISBN Information:
Electronic ISSN: 2332-5690
Conference Location: Philadelphia, PA, USA
