For large, distributed commercial mirror sites, this paper presents a hybrid information filtering algorithm based on distributed web log mining. Using multi-agent technology, the algorithm preprocesses the web logs of the mirror sites, replacing manual page ratings with user browsing preferences, and then constructs and standardizes a user access matrix. On this basis, the paper proposes using web page similarities to predict ratings for pages that a user has not yet rated, thereby increasing the number of pages jointly rated by different users. This method can effectively address the sparsity of user ratings in collaborative filtering. Finally, a hybrid filtering model is proposed to overcome the drawbacks of the content-based and collaborative filtering models. The experimental results show that the algorithm is applicable to distributed web server cluster architectures, avoids the inaccuracy and complexity of manual page ratings, effectively addresses the shortcomings of traditional filtering models, and greatly improves recommendation quality.
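The idea of filling unrated pages from page similarities can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how page similarity is computed, so cosine similarity between columns of the access matrix is assumed here, and the names `page_similarities`, `fill_missing`, and the sample `access` matrix are hypothetical. A value of 0 is taken to mean "not visited".

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def page_similarities(matrix):
    """Pairwise page similarity, assumed here to be cosine over access-matrix columns."""
    n_pages = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_pages)]
    return [[cosine(columns[i], columns[j]) for j in range(n_pages)]
            for i in range(n_pages)]


def fill_missing(matrix):
    """Predict each unvisited entry as a similarity-weighted mean of the user's known entries."""
    sim = page_similarities(matrix)
    filled = []
    for row in matrix:
        new_row = list(row)
        for j, value in enumerate(row):
            if value != 0:
                continue  # keep observed preferences unchanged
            known = [k for k in range(len(row)) if row[k] != 0]
            weight_sum = sum(sim[j][k] for k in known)
            if weight_sum > 0:
                new_row[j] = sum(sim[j][k] * row[k] for k in known) / weight_sum
        filled.append(new_row)
    return filled


# Hypothetical standardized user access matrix: rows are users, columns are pages.
access = [
    [0.9, 0.0, 0.4],
    [0.8, 0.7, 0.0],
    [0.0, 0.6, 0.5],
]
filled = fill_missing(access)
```

After filling, every pair of users in this toy matrix shares all three pages, which is the effect the abstract describes: more jointly rated pages for the collaborative filtering step to compare.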