Web caching and Zipf-like distributions: evidence and implications
Breslau, L.
Pei Cao
Li Fan
Phillips, G.
Shenker, S.
Xerox Palo Alto Research Center, CA
Abstract
This paper addresses two unresolved issues about Web caching. The
first issue is whether Web requests from a fixed user community are
distributed according to Zipf's (1929) law. The second issue relates to
a number of studies on the characteristics of Web proxy traces, which
have shown that the hit-ratios and temporal locality of the traces
exhibit certain asymptotic properties that are uniform across the
different sets of the traces. In particular, the question is whether
these properties are inherent to Web accesses or whether they are simply
an artifact of the traces. An answer to these unresolved issues will
facilitate both Web cache resource planning and cache hierarchy design.
We show that the answers to the two questions are related. We first
investigate the page request distribution seen by Web proxy caches using
traces from a variety of sources. We find that the distribution does not
follow Zipf's law precisely, but instead follows a Zipf-like
distribution with the exponent varying from trace to trace. Furthermore,
we find only (i) a weak correlation between the access frequency of a
Web page and its size and (ii) a weak correlation between a page's
access frequency and its rate of change. We then consider a simple model
where the Web accesses are independent and the reference probability of
the documents follows a Zipf-like distribution. We find that the model
yields asymptotic behaviour that is consistent with the experimental
observations, suggesting that the various observed properties of
hit-ratios and temporal locality are indeed inherent to Web accesses
observed by proxies. Finally, we revisit Web cache replacement
algorithms and show that the algorithm that is suggested by this simple
model performs best on real trace data. The results indicate that while
page requests do indeed reveal short-term correlations and other
structures, a simple model for an independent request stream following a
Zipf-like distribution is sufficient to capture certain asymptotic
properties observed at Web proxies.
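
The simple model described in the abstract (independent requests whose
reference probabilities follow a Zipf-like distribution) can be
illustrated with a minimal Python sketch. It assumes that the document
ranked i is requested with probability proportional to 1/i^alpha, and
that the cache always holds the cache_size most popular documents. The
function names, the example exponent, and the idealized cache are
illustrative assumptions, not the authors' code or their replacement
algorithm.

```python
import random
from itertools import accumulate


def zipf_like_probs(n_docs, alpha):
    """Request probabilities for ranks 1..n_docs, proportional to 1/rank**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ideal_hit_ratio(probs, cache_size):
    """Hit ratio when the cache holds the cache_size most popular documents.

    Assumption: requests are independent, so the hit ratio is simply the
    total probability mass of the cached documents.
    """
    return sum(probs[:cache_size])


def simulated_hit_ratio(probs, cache_size, n_requests, seed=0):
    """Draw independent requests and count hits against the same idealized cache."""
    rng = random.Random(seed)
    cum_weights = list(accumulate(probs))
    requests = rng.choices(range(len(probs)), cum_weights=cum_weights, k=n_requests)
    # Index 0 is the most popular document, so indices below cache_size are cached.
    hits = sum(1 for doc in requests if doc < cache_size)
    return hits / n_requests


if __name__ == "__main__":
    # alpha = 0.8 is an arbitrary example value, not a figure from the paper.
    probs = zipf_like_probs(n_docs=10_000, alpha=0.8)
    for size in (10, 100, 1_000):
        print(size, ideal_hit_ratio(probs, size),
              simulated_hit_ratio(probs, size, n_requests=20_000))
```

Running the script prints, for each cache size, the analytic hit ratio
next to a simulated one, which shows how the hit ratio grows with cache
size under this model.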