Prefetching is an important technique for reducing the average Web access latency. Existing prefetching methods are based mostly on URL graphs: they use the graphical nature of HTTP links to determine the possible paths through a hypertext system. Although URL graph-based approaches are effective at prefetching frequently accessed documents, few of them can prefetch URLs that are rarely visited. This paper presents a keyword-based semantic prefetching approach to overcome this limitation. It predicts future requests based on the semantic preferences reflected in previously retrieved Web documents. We apply this technique to Internet news services and implement a client-side personalized prefetching system, NewsAgent. The system exploits semantic preferences by analyzing keywords in the URL anchor text of previously accessed documents in different news categories, and it employs a neural network model over the keyword set to predict future requests. The system is self-learning and adapts well to changes in the client's surfing interests. To keep prefetching conservative, NewsAgent does not exploit keyword synonymy. However, it alleviates the impact of keyword polysemy by taking server-provided categorical information into account in its decisions and, hence, captures more semantic knowledge than term-document literal matching methods. Experimental results from daily browsing of the ABC News, CNN, and MSNBC news sites over a period of three months show a hit ratio of up to 60 percent due to prefetching.
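The abstract does not specify NewsAgent's network architecture, so the following is only a minimal sketch of the general idea under stated assumptions: a single-layer perceptron with one weight per (news category, anchor-text keyword) pair, trained on whether past links were clicked, then used to decide which links on a newly fetched page to prefetch. The names `CategoryKeywordPredictor` and `tokenize`, the learning rate, the threshold, and the toy browsing history are all hypothetical and are not taken from the paper. Keywords are matched literally (no synonym expansion), and each category keeps its own weight table, which is one simple way to let category information separate different senses of the same keyword.

```python
import re
from collections import defaultdict


def tokenize(anchor_text):
    """Split anchor text into lowercase keywords (literal matching, no synonyms)."""
    return set(re.findall(r"[a-z0-9]+", anchor_text.lower()))


class CategoryKeywordPredictor:
    """Hypothetical single-layer perceptron with one weight per (category, keyword).

    A separate weight table per server-provided news category lets the same
    keyword carry different weights in different categories.
    """

    def __init__(self, learning_rate=0.1, threshold=0.5):
        self.learning_rate = learning_rate
        self.threshold = threshold
        self.weights = defaultdict(dict)  # category -> {keyword: weight}

    def score(self, category, anchor_text):
        # Weighted sum over the keywords present in the anchor text.
        w = self.weights[category]
        return sum(w.get(k, 0.0) for k in tokenize(anchor_text))

    def predict(self, category, anchor_text):
        # Step activation: 1 means "likely to be requested, so prefetch it".
        return 1 if self.score(category, anchor_text) > self.threshold else 0

    def train(self, category, anchor_text, clicked):
        # Perceptron rule: weights change only when the prediction was wrong.
        error = int(clicked) - self.predict(category, anchor_text)
        if error:
            w = self.weights[category]
            for k in tokenize(anchor_text):
                w[k] = w.get(k, 0.0) + self.learning_rate * error

    def links_to_prefetch(self, category, anchors):
        # Keep only the links whose anchor text the model predicts will be requested.
        return [a for a in anchors if self.predict(category, a)]


# Made-up browsing history for illustration: (category, anchor text, clicked?)
history = [
    ("sports", "Lakers win NBA finals", True),
    ("sports", "Golf open delayed by rain", False),
    ("sports", "NBA trade rumors heat up", True),
    ("business", "Apple earnings beat estimates", True),
]

model = CategoryKeywordPredictor(learning_rate=0.3, threshold=0.5)
for _ in range(5):  # a few passes over the history
    for category, text, clicked in history:
        model.train(category, text, clicked)

print(model.links_to_prefetch("sports", ["NBA draft picks announced", "Rain delays golf"]))
# prints ['NBA draft picks announced']
print(model.predict("business", "NBA draft picks announced"))
# prints 0, because "nba" has no weight in the business category
```

Because the model learns from anchor-text keywords rather than from specific URLs, a link that has never been visited before can still be prefetched when its keywords match the user's learned preferences, which is the limitation of URL graph-based methods that the abstract targets. A sigmoid unit or a multilayer network would be equally plausible choices; the perceptron is used here only because it is the smallest model that shows the idea.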