The effectiveness of real-time lossless compression on Web data is limited by the relatively small size of its basic unit of operation, the Web object. However, we observe that while strings may repeat infrequently within a single object, they can repeat very frequently when considered across multiple objects. Furthermore, the strings with high occurrence frequencies are quite static, and the set of such strings is quite small. In this paper, we propose a complementary compression mechanism that works on top of the existing HTTP-supported algorithms, called content-aware global static compression (CAGSC), to further improve Web data compression. Its basic idea is to use a static CAGSC table to compress strings that occur frequently across objects. Our experimental results show that CAGSC obtains an average improvement of 15% over the standard HTTP-supported algorithms.
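The idea of a shared static table can be illustrated with a minimal sketch. This is not the CAGSC algorithm itself, whose table format and encoding are not described here. It is an analogous technique that uses the preset-dictionary feature of zlib (DEFLATE), under the assumption that both endpoints hold the same table in advance. The table contents, the sample object, and the helper names `compress_object` and `decompress_object` are hypothetical and chosen only for illustration.

```python
import zlib

# Hypothetical static table: strings assumed to recur across many Web
# objects. A real table would be built from measured string frequencies.
STATIC_TABLE = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8">'
    b'<title></title></head><body><div class="</div></body></html>'
)


def compress_object(obj: bytes, table: bytes = b"") -> bytes:
    """Compress one Web object, optionally priming zlib with a shared table."""
    if table:
        comp = zlib.compressobj(level=9, zdict=table)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(obj) + comp.flush()


def decompress_object(data: bytes, table: bytes = b"") -> bytes:
    """Invert compress_object; the receiver must hold the same table."""
    if table:
        decomp = zlib.decompressobj(zdict=table)
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(data) + decomp.flush()


# A small object with little internal repetition, but much overlap
# with the shared table.
sample = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8">'
    b'<title>Hi</title></head><body><div class="x">ok</div></body></html>'
)

plain = compress_object(sample)
primed = compress_object(sample, STATIC_TABLE)
print(len(sample), len(plain), len(primed))
```

Because the table is never transmitted with the object, its cost is paid once, while each small object can refer back to strings it does not itself repeat. The sketch only shows the mechanism; it says nothing about the gains reported for CAGSC.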