4 Author(s)
Jiaran Zhang (School of Computer Science & Technology, Shandong University, Jinan, China); Xiaohui Yu; You Li; Liwei Lin

Cloud storage has become increasingly popular due to its convenience, cost-effectiveness and scalability. It provides the basis for a slate of file hosting services, which offer users the ability to synchronize their files between the servers and their devices. Naive file synchronization, however, requires the whole file to be transmitted to all other locations (servers, devices) whenever the file is updated in one location. This leads to a massive waste of bandwidth and significant delays in propagating the update. We propose a method called HadoopRsync, which performs incremental updates of files instead of transmitting them in their entirety. This method is based on the rsync utility, originally proposed for file synchronization between computers. The scenario under consideration differs significantly from that of rsync, however: in the cloud storage context, files are stored in a distributed fashion across multiple nodes in the cloud. We therefore propose a pair of algorithms called HadoopRsync Upload and HadoopRsync Download, which are responsible for synchronization from the user's devices to the cloud and synchronization in the opposite direction, respectively. These algorithms transmit only the differences between the new version of the file and the old version, rather than the whole file. Our solution is based on Hadoop, the open-source framework for distributed processing of very large data sets across clusters of computers. The algorithms use the MapReduce facility provided by Hadoop to take full advantage of its massive parallelization capability. In addition, we propose some optimization measures to reduce the I/O operations required for file updates. Extensive experiments are conducted to evaluate the proposed solution; they show that HadoopRsync significantly outperforms the baseline methods.
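
To make the idea of transmitting only the differences concrete, the following is a minimal single-machine sketch of the classic rsync-style delta computation that the abstract builds on. It is not the HadoopRsync Upload or Download algorithm, and it does not model Hadoop or MapReduce. The function names, the fixed block size, the 16-bit rolling checksum, and the use of MD5 as the strong hash are all assumptions made for illustration.

```python
import hashlib

M = 1 << 16  # modulus for the weak rolling checksum (illustrative choice)


def weak_checksum(block):
    """Weak checksum (a, b) of a block, in the style of rsync's rolling sum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide a window of length n one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def make_signature(old, block_size):
    """Holder of the old version: checksum every full block of the old file."""
    sig = {}
    for idx in range(len(old) // block_size):
        block = old[idx * block_size:(idx + 1) * block_size]
        strong = hashlib.md5(block).digest()
        sig.setdefault(weak_checksum(block), []).append((strong, idx))
    return sig


def make_delta(new, sig, block_size):
    """Holder of the new version: emit ("copy", block_index) or ("data", bytes)."""
    ops, literal = [], bytearray()
    pos, n, weak = 0, block_size, None
    while pos + n <= len(new):
        if weak is None:
            weak = weak_checksum(new[pos:pos + n])
        match = None
        if weak in sig:
            # Confirm a weak match with the strong hash to rule out collisions.
            strong = hashlib.md5(new[pos:pos + n]).digest()
            match = next((i for s, i in sig[weak] if s == strong), None)
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            pos += n
            weak = None  # recompute from scratch at the new position
        else:
            literal.append(new[pos])
            if pos + n < len(new):
                weak = roll(weak[0], weak[1], new[pos], new[pos + n], n)
            pos += 1
    literal.extend(new[pos:])  # trailing bytes shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(old, ops, block_size):
    """Holder of the old version: rebuild the new file from old blocks plus literals."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)
```

In this sketch, the side holding the old version would call `make_signature`, the side holding the new version would call `make_delta` against that signature, and only the resulting operations would need to be sent back for `apply_delta`. Unchanged blocks travel as small `("copy", index)` references, so the amount of literal data is roughly proportional to the size of the edit rather than the size of the file.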

Published in:

Cloud and Service Computing (CSC), 2011 International Conference on

Date of Conference:

12-14 Dec. 2011