The structure of a Web site usually reflects the implicit logical relationships among its Web pages, and it is widely used in Web mining and Web information retrieval. However, it is difficult for a machine to extract the structure of a Web site automatically amid the many kinds of noise hyperlinks. This paper proposes an algorithm that extracts the structure of a Web site automatically based on hyperlink analysis. The algorithm identifies and filters noise hyperlinks by the patterns of the Web pages that these hyperlinks connect, rather than by the patterns of the hyperlinks themselves. It promises better performance than previous approaches. Preliminary results show that the proposed algorithm substantially improves both precision and recall.
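As an illustration of the evaluation measures named above, the following minimal sketch computes precision and recall for an extracted site structure. It assumes, purely for illustration, that a structure is represented as a set of (parent URL, child URL) links and that a hand-labeled reference structure is available; the function name `precision_recall` and the sample URLs are hypothetical and do not come from the paper.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) of extracted links against reference links.

    Both arguments are iterables of (parent_url, child_url) pairs. This
    edge-set representation is an assumption made for this sketch.
    """
    extracted, reference = set(extracted), set(reference)
    correct = len(extracted & reference)  # links found in both sets
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical data: a reference structure and an extraction with one wrong link.
reference = {("/", "/products"), ("/", "/about"), ("/products", "/products/a")}
extracted = {("/", "/products"), ("/", "/about"), ("/about", "/products/a")}

precision, recall = precision_recall(extracted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this representation, a noise hyperlink that survives filtering lowers precision, while a structural hyperlink that is wrongly filtered lowers recall.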