Load balancing in large distributed server systems is a complex optimization problem of critical importance in cloud systems and data centers. However, any full (i.e., optimal) solution incurs significant, often prohibitive, overhead due to the need to collect state-dependent information. We propose a novel scheme that incurs no communication overhead between the users and the servers upon job arrivals, thus removing any scheduling overhead from the critical path of job execution. Furthermore, our scheme is oblivious, that is, it does not use any state information. In our approach, each regular job request is assigned to a randomly chosen server, and replicas of the request are additionally sent to different servers; these replicas are served at low priority, so that they do not add any real burden on the servers. Through analysis and simulations we show that the expected system performance improves by up to a factor of 2 (even under high load conditions) when job lengths are exponentially distributed, and by more than a factor of 5 when job lengths follow heavy-tailed distributions. We implemented a load balancing system based on our approach and deployed it on the Amazon Elastic Compute Cloud (EC2). Realistic load tests on that system indicate that the actual performance is as predicted.
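The replication idea can be illustrated with a small simulation. The following is a minimal sketch under simplifying assumptions that are not taken from the paper: time is slotted, job lengths are geometric (a discrete stand-in for the exponential case), each job gets exactly one replica, replicas are preempted whenever a regular job is waiting, and the remaining copy of a job is discarded once any copy finishes. The function `simulate` and all of its parameters are hypothetical names chosen for illustration; this is neither the authors' analytical model nor their EC2 implementation.

```python
import random
from collections import deque


def simulate(replicate, num_servers=10, load=0.8, mean_len=5.0,
             arrival_slots=20000, seed=1):
    """Return the mean response time (in slots) over all generated jobs."""
    rng = random.Random(seed)
    high = [deque() for _ in range(num_servers)]  # regular copies, served first
    low = [deque() for _ in range(num_servers)]   # replicas, served only when no regular copy waits
    done = []       # done[j] is True once any copy of job j has finished
    arrival = []    # arrival[j] is the slot in which job j arrived
    total_response = 0
    pending = 0
    t = 0
    p_arrival = load / mean_len  # per-slot arrival probability per server
    p_finish = 1.0 / mean_len    # geometric job length with mean mean_len

    # Generate arrivals for arrival_slots slots, then drain the system.
    while t < arrival_slots or pending > 0:
        if t < arrival_slots:
            for _ in range(num_servers):
                if rng.random() >= p_arrival:
                    continue
                length = 1
                while rng.random() >= p_finish:
                    length += 1
                # Oblivious assignment: uniformly random, no state consulted.
                primary = rng.randrange(num_servers)
                # Drawn even when unused so both modes see identical arrivals.
                offset = rng.randrange(1, num_servers)
                job = len(done)
                done.append(False)
                arrival.append(t)
                pending += 1
                high[primary].append([job, length])
                if replicate:
                    # The replica always lands on a different server.
                    low[(primary + offset) % num_servers].append([job, length])

        for s in range(num_servers):
            # Lazily discard copies whose job already finished elsewhere.
            for queue in (high[s], low[s]):
                while queue and done[queue[0][0]]:
                    queue.popleft()
            queue = high[s] if high[s] else low[s]
            if not queue:
                continue
            copy = queue[0]
            copy[1] -= 1  # one unit of work per slot
            if copy[1] == 0:
                queue.popleft()
                done[copy[0]] = True
                pending -= 1
                total_response += t + 1 - arrival[copy[0]]
        t += 1

    return total_response / len(done)


if __name__ == "__main__":
    print("mean response without replicas:", simulate(False))
    print("mean response with replicas:   ", simulate(True))
```

Because both runs use the same seed and consume random numbers identically, they see the same arrivals, job lengths, and primary assignments, so any difference in mean response time is due to the replicas alone. The size of the gap in this toy model depends on the assumed parameters and should not be read as a reproduction of the factors reported above.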