Content caching and location are key enabling technologies for achieving the high throughput needed to sustain current Internet infrastructure, for both peer-to-peer and client-server applications. An important aspect of distributed caching techniques is the mapping of data and requests to caches so as to maximize system throughput while minimizing costs in the presence of network and cache failures. We describe a new cache protocol based on consistent hashing (CH) [D. Karger et al. (1997, 1999)]. Compared to consistent hashing, our protocol, called extended consistent hashing (ECH), handles flash access to objects significantly better and yields better worst-case response times and lower load variance. Due to the multiplicity of client views in a distributed hashing scheme, a single object (or its reference) may be cached at multiple locations. This is referred to as the spread of an object. Consistent hashing maps a request to a cache irrespective of the spread of the requested object. ECH, on the other hand, estimates the spread of an object and randomizes requests over the expected spread. In doing so, it amortizes requests over a larger number of caches. While the expected load on target caches in ECH remains the same as in consistent hashing (asymptotically optimal), load variance is significantly reduced. We present analytical results as well as simulations that demonstrate significant improvements for querying frequently accessed objects: up to 80% in worst-case response time and 30% in the variance of server/target cache loads. We also show excellent correlation between expected and observed results. What makes ECH particularly attractive is that it can be integrated into existing infrastructure based on consistent hashing with minimal software overhead.
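The contrast drawn above, between mapping every request for an object to a single cache and randomizing requests over the object's expected spread, can be illustrated with a minimal Python sketch. It is not the ECH protocol itself: the abstract does not specify how spread is estimated or which caches form the spread set, so the sketch takes the spread as a caller-supplied parameter and assumes, purely for illustration, that the candidate caches are the first few successors of the object on the hash ring. The names `Ring`, `lookup_ch`, and `lookup_spread` are hypothetical.

```python
import bisect
import hashlib
import random


def point(key: str) -> int:
    """Map a string to a 64-bit position on the hash ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class Ring:
    """A basic consistent-hashing ring with one point per cache."""

    def __init__(self, caches):
        self._points = sorted((point(c), c) for c in caches)
        self._keys = [p for p, _ in self._points]

    def successors(self, obj: str, count: int):
        """Return the first `count` caches clockwise from the object's point."""
        n = len(self._points)
        count = max(1, min(count, n))
        start = bisect.bisect_right(self._keys, point(obj)) % n
        return [self._points[(start + i) % n][1] for i in range(count)]

    def lookup_ch(self, obj: str) -> str:
        """Plain consistent hashing: always the first successor."""
        return self.successors(obj, 1)[0]

    def lookup_spread(self, obj: str, spread: int, rng=random) -> str:
        """Illustrative variant: pick uniformly among `spread` successors.

        ASSUMPTION: `spread` stands in for an estimate that the real
        protocol would compute; here it is simply passed in.
        """
        return rng.choice(self.successors(obj, spread))
```

With a hot object, `lookup_ch` sends every request to the same cache, whereas `lookup_spread(obj, 4)` divides the same request stream among four caches, which is the intuition behind the reduced load variance claimed for frequently accessed objects.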