Data replication is used extensively in wide-area distributed systems to achieve low data-access latency. A large number of heuristics have been proposed to perform replica placement. Practical experience indicates that the choice of heuristic strongly affects the cost of the required infrastructure (e.g., storage capacity and network bandwidth), and that the best choice depends on system topology, workload, and performance goals. We describe a method to help system designers choose placement heuristics that meet their performance goals at the lowest possible infrastructure cost. Existing heuristics are classified according to a number of properties. The inherent cost (a lower bound) of each class of heuristics is obtained for a given system, workload, and set of performance goals. The system designer then compares the classes of heuristics on the basis of these lower bounds. Experimental results show that choosing a heuristic with the proposed methodology yields a cost up to 7 times lower than that of an "obvious" heuristic, such as caching.
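The comparison step can be illustrated with a minimal sketch. It assumes the lower-bound cost of each class has already been computed by some other means; the abstract does not say how. The names `HeuristicClass`, `cheapest_feasible_class`, the class labels other than caching, and all numeric values are hypothetical placeholders, not results from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class HeuristicClass:
    """A class of placement heuristics (hypothetical representation)."""
    name: str
    lower_bound_cost: float  # inherent cost for the given system, workload and goals
    meets_goal: bool         # whether the class can meet the performance goal at all


def cheapest_feasible_class(classes: Iterable[HeuristicClass]) -> Optional[HeuristicClass]:
    """Return the class with the smallest lower-bound cost among those that
    can meet the performance goal, or None if no class is feasible."""
    feasible = [c for c in classes if c.meets_goal]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.lower_bound_cost)


# Placeholder values for illustration only; they are not measurements.
candidates = [
    HeuristicClass("caching", lower_bound_cost=40.0, meets_goal=True),
    HeuristicClass("class-B", lower_bound_cost=25.0, meets_goal=True),
    HeuristicClass("class-C", lower_bound_cost=5.0, meets_goal=False),
]

best = cheapest_feasible_class(candidates)
print(best.name if best else "no feasible class")
```

A class that cannot meet the performance goal is excluded even when its bound is the smallest, since the method targets the lowest cost among heuristics that satisfy the goals.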