Planetary-scale cloud computing requires hugely scalable infrastructures for compute, storage, network, and services that support programming models such as MapReduce. Many design choices arise in the construction of cloud infrastructures. Examples include: scheduling policies for compute clusters, caching and replication policies for storage, and approaches to integrating bandwidth management with application requirements for quality of service (QoS). These design choices must be evaluated in terms of their impact on QoS considerations such as throughput, latency, and jitter, as well as the consumption of power, compute, storage, and network bandwidth. The scale of cloud infrastructures typically makes it impractical or ineffective to do these evaluations using test systems, and, for the most part, it is too costly and time-consuming to evaluate designs by building and deploying multiple implementations. This talk discusses ways in which Google uses quantitative models to evaluate design decisions for cloud infrastructures. In some cases, we incorporate quantitative models into production systems to improve the quality of online decision making. Since the effectiveness of these quantitative models depends on the type and accuracy of workload characteristics, the talk also addresses workload characterization. The talk addresses a number of research challenges with employing performance models in practice.
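Quantitative evaluation of such design choices often starts from simple analytic queueing models. As an illustrative sketch (this is a generic textbook example, not a model from the talk), the closed-form M/M/1 formulas show how a candidate design's service rate and offered workload translate into utilization, latency, and queue length:

```python
def mm1_metrics(arrival_rate: float, service_rate: float):
    """Mean steady-state metrics for an M/M/1 queue
    (Poisson arrivals, exponential service times, single server).

    arrival_rate: offered load, requests per second (lambda)
    service_rate: server capacity, requests per second (mu)
    """
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    rho = arrival_rate / service_rate              # utilization
    latency = 1.0 / (service_rate - arrival_rate)  # mean time in system, W = 1/(mu - lambda)
    queue_len = rho * rho / (1.0 - rho)            # mean number waiting, Lq = rho^2/(1 - rho)
    return rho, latency, queue_len


# Example: a server handling 80 req/s against a capacity of 100 req/s
rho, latency, queue_len = mm1_metrics(80.0, 100.0)
# rho = 0.8, mean latency = 0.05 s, mean queue length = 3.2
```

Even this minimal model captures the nonlinear effect the abstract alludes to: pushing utilization from 80% toward 100% drives latency and queue length toward infinity, which is why workload characterization (the arrival process and service demands) is central to making such models useful.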