Clustering Web content for efficient replication
Yan Chen; Lili Qiu; Weiyu Chen; Luan Nguyen; Katz, R.H.
California Univ., Berkeley, CA, USA

This paper appears in: Network Protocols, 2002. Proceedings. 10th IEEE International Conference on
Publication Date: 12-15 Nov. 2002
On page(s): 165-174
ISSN: 1092-1648
ISBN: 0-7695-1856-7
INSPEC Accession Number: 7664447
Current Version Published: 2003-02-25

Abstract
Recently, there has been an increasing deployment of content distribution networks (CDNs) that offer hosting services to Web content providers. We first compare uncooperative pulling of Web content, used by commercial CDNs, with cooperative pushing. The latter can achieve user-perceived performance comparable to the former with only 4-5% of the replication and update traffic. Therefore, we explore how to push content to CDN nodes efficiently. Using trace-driven simulation, we show that replicating content in units of URLs can yield a 60-70% reduction in client latency compared to replicating in units of Web sites. However, such fine-grained replication is very expensive. We propose to replicate content in units of clusters, each containing objects that are likely to be requested by clients that are topologically close. We describe three clustering techniques, and use various topologies and several large Web server traces to evaluate their performance. Cluster-based replication achieves a 40-60% improvement over per-Web-site replication. By adjusting the number of clusters, we can smoothly trade off management and computation cost for better client performance. We also explore incremental clusterings that adaptively add new documents to the existing content clusters. We examine both offline and online incremental clusterings. The offline clusterings yield performance close to that of complete re-clustering at much lower overhead. The online incremental clustering and replication reduce retrieval cost by a factor of 4.6-8 compared to no replication and random replication, so it is especially useful for improving document availability during flash crowds.
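
The idea of grouping objects that are requested by topologically close clients can be illustrated with a small sketch. This is not one of the three clustering techniques the paper describes, which the abstract does not specify. It is a generic greedy grouping under stated assumptions: each URL is represented by a hypothetical vector of request counts per client group, and two URLs belong together when the cosine similarity of their vectors reaches an arbitrary threshold. The names `cosine`, `cluster_urls`, `access`, and the sample URLs and counts are all invented for illustration.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length request-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_urls(access, threshold=0.8):
    """Greedy grouping (an assumption, not the paper's algorithm):
    place each URL in the first cluster whose seed vector is similar
    enough, otherwise start a new cluster seeded by that URL."""
    clusters = []  # list of (seed_vector, member_urls)
    for url in sorted(access):
        vec = access[url]
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(url)
                break
        else:
            clusters.append((vec, [url]))
    return [members for _, members in clusters]


# Hypothetical request counts from three client groups (A, B, C).
access = {
    "/index.html": [90, 5, 5],
    "/news/a.html": [80, 10, 0],
    "/video/x.mpg": [2, 70, 60],
    "/video/y.mpg": [0, 50, 55],
}

clusters = cluster_urls(access)
for members in clusters:
    print(members)
```

With this sample data the two HTML pages, requested mostly by group A, fall into one cluster, and the two video files, requested mostly by groups B and C, fall into another. Each cluster could then be replicated as a unit near the client groups that request it. Raising or lowering `threshold` changes the number of clusters, which mirrors the trade-off between management cost and client performance mentioned above.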
