By Topic

Cost-aware replication for dataflows

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Castillo, C. ; IBM T. J. Watson Res. Center, Hawthorne, NY, USA ; Tantawi, A.N. ; Arroyo, D. ; Steinder, M.

In this work we are concerned with the cost associated with replicating intermediate data for dataflows in Cloud environments. This cost is attributed to the extra resources required to create and maintain the additional replicas for a given data set. Existing data-analytic platforms such as Hadoop provide for fault-tolerance guarantee by relying on aggressive replication of intermediate data. We argue that the decision to replicate along with the number of replicas should be a function of the resource usage and utility of the data in order to minimize the cost of reliability. Furthermore, the utility of the data is determined by the structure of the dataflow and the reliability of the system. We propose a replication technique, which takes into account resource usage, system reliability and the characteristic of the dataflow to decide what data to replicate and when to replicate. The replication decision is obtained by solving a constrained integer programming problem given information about the dataflow up to a decision point. In addition, we built a working prototype, CARDIO of our technique which shows through experimental evaluation using a real testbed that finds an optimal solution.

Published in:

Network Operations and Management Symposium (NOMS), 2012 IEEE

Date of Conference:

16-20 April 2012