Network Traffic Optimization in Hadoop MapReduce through Pre-shuffling | IEEE Conference Publication | IEEE Xplore

Abstract:

MapReduce is a popular programming model for handling Big Data in a distributed computing environment. Hadoop is widely used for short jobs, which require a comparatively low response time. During the shuffle phase, the large volume of data generated by the map tasks must be read, shuffled, and sorted before it is transferred across the network to the reducers. This can generate heavy network traffic and consume substantial bandwidth. This paper therefore proposes a pre-shuffling scheme for shuffle-intensive applications to minimize network traffic, because network connectivity is likely to become a limited resource when many applications share the cluster. The scheme uses a push model and a two-stage pipeline for the shuffle phase of Hadoop MapReduce. Experiments were conducted on web server log files from NASA and rnsit.ac.in. The results show that the proposed pre-shuffling scheme substantially reduces network traffic and remains slightly faster than conventional Hadoop. The proposed push model with a two-stage pipeline reduces the run time by an average of 17.3% for the click count application.
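As a rough illustration of why shrinking map output before the shuffle reduces network traffic, the following Python sketch counts clicks per URL and aggregates the counts locally before partitioning them for the reducers. It is not the scheme proposed in the paper: the push model and the two-stage pipeline are not modeled, and the simplified "host url" log format, the function names, and the CRC32-based partitioner are all assumptions made for this example.

```python
import zlib
from collections import Counter, defaultdict


def map_clicks(log_lines):
    """Map step: emit one (url, 1) pair per well-formed log line.

    Assumes a hypothetical simplified log format: "<host> <url>".
    """
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 2:
            yield parts[1], 1


def partition_for(url, num_reducers):
    """Pick a reducer deterministically (CRC32 is an assumption, not Hadoop's partitioner)."""
    return zlib.crc32(url.encode("utf-8")) % num_reducers


def pre_aggregate(pairs, num_reducers):
    """Sum counts per URL on the map side and group them by target reducer.

    Only one record per distinct URL leaves the mapper, instead of one per click.
    """
    partitions = defaultdict(Counter)
    for url, count in pairs:
        partitions[partition_for(url, num_reducers)][url] += count
    return partitions


def reduce_clicks(counters):
    """Reduce step: merge the partial counts received from all mappers."""
    total = Counter()
    for counter in counters:
        total.update(counter)
    return total


if __name__ == "__main__":
    logs = ["a /index.html", "b /index.html", "c /about.html", "a /index.html"]
    pairs = list(map_clicks(logs))
    partitions = pre_aggregate(pairs, num_reducers=2)
    sent = sum(len(c) for c in partitions.values())
    print(f"records before: {len(pairs)}, records sent to reducers: {sent}")
    print(reduce_clicks(partitions.values()))
```

In this toy setting the number of records crossing the network depends on the number of distinct URLs per mapper rather than on the number of clicks, which is the general effect that any map-side reduction of shuffle data aims for.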
Date of Conference: 17-19 July 2019
Date Added to IEEE Xplore: 20 February 2020
Conference Location: Coimbatore, India
