High-volume Web sites often use clusters of servers to support their architectures. A load balancer in front of such a cluster directs requests to the various servers in a way that equalizes, as much as possible, the load placed on each. There are two basic approaches to scaling Web clusters: adding more servers of the same type (scaling out, or horizontally) or upgrading the capacity of the servers in the cluster (scaling up, or vertically). Although more detailed and complex models would be required to obtain more accurate results about such systems' behavior, simple queuing theory provides a reasonable level of abstraction for gaining insight into which scaling approach to employ in various scenarios. Typical issues in Web cluster design include whether to use a large number of inexpensive low-capacity servers or a small number of costly high-capacity servers to provide a given performance level; how many servers of a given type are required to provide a certain performance level at a given cost; and how many servers are needed to build a Web site with a given reliability. Using queuing theory, I examine the tradeoffs among average response time, capacity, cost, and reliability involved in designing Web server clusters.
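As a minimal sketch of the kind of analysis the article describes, the scale-up versus scale-out question can be framed with the classic M/M/1 response-time formula, T = 1/(μ − λ), where λ is the arrival rate and μ the service rate. The workload numbers below (80 requests/sec against one 100-requests/sec server versus four 25-requests/sec servers) are illustrative assumptions, not figures from the article:

```python
def mm1_response_time(lam, mu):
    """Average response time of an M/M/1 queue with arrival rate lam
    and service rate mu (both in requests/sec). Requires lam < mu."""
    if lam >= mu:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    return 1.0 / (mu - lam)

# Hypothetical workload: 80 requests/sec offered to the whole site.
lam = 80.0

# Scaling up: one server with an aggregate capacity of 100 requests/sec.
scale_up = mm1_response_time(lam, 100.0)

# Scaling out: four servers of 25 requests/sec each; an ideal load
# balancer splits arrivals evenly, so each server sees lam/4.
scale_out = mm1_response_time(lam / 4, 25.0)

print(f"scale up:  {scale_up:.3f} s")   # 1/(100 - 80) = 0.050 s
print(f"scale out: {scale_out:.3f} s")  # 1/(25 - 20)  = 0.200 s
```

Under this simple model, a single fast server of the same total capacity yields lower average response time than several slow ones, because each slow server queues work the others could have absorbed; the scale-out cluster, however, keeps serving (at reduced capacity) when individual servers fail, which is where the reliability tradeoff the article examines comes in.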