DSphere: A Source-Centric Approach to Crawling, Indexing and Searching the World Wide Web
Bamba, B.; Ling Liu; Caverlee, J.; Padliya, V.; Srivatsa, M.; Bansal, T.; Palekar, M.; Patrao, J.; Suiyang Li; Singh, A.
Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on
15-20 April 2007, Page(s): 1515-1516
Digital Object Identifier 10.1109/ICDE.2007.369060
Summary: We describe DSphere, a decentralized system for crawling, indexing, searching and ranking documents on the World Wide Web. Unlike most existing search technologies, which depend heavily on a page-centric view of the Web, we advocate a source-centric view and propose a decentralized architecture for crawling, indexing and searching the Web in a distributed, source-specific fashion. We develop a fully decentralized crawler in which each peer is responsible for crawling a specific set of documents, referred to as a source collection. Link analysis techniques are used to rank documents. Traditional link analysis techniques suffer from problems such as slow refresh rates and vulnerability to Web spam. We propose a source-based link analysis approach, which computes fast and accurate ranking scores for all crawled documents.
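The abstract does not spell out the source-based link analysis, so the following is only a minimal sketch of the general idea, not DSphere's published method. It assumes a "source" is simply the host name of a URL, collapses page-level links into weighted source-level edges, and runs a generic PageRank-style power iteration over that smaller graph. The names `source_of`, `build_source_graph` and `source_rank`, the damping factor of 0.85 and the iteration count are all illustrative assumptions.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def source_of(url):
    """Map a page URL to its source (assumption: the lower-cased host name)."""
    return urlsplit(url).netloc.lower()


def build_source_graph(page_links):
    """Collapse page-level links (src_url, dst_url) into weighted source edges."""
    edges = defaultdict(lambda: defaultdict(int))
    sources = set()
    for src_url, dst_url in page_links:
        s, d = source_of(src_url), source_of(dst_url)
        sources.update((s, d))
        if s != d:  # links inside one source do not count as endorsements
            edges[s][d] += 1
    return sources, edges


def source_rank(sources, edges, damping=0.85, iterations=50):
    """Generic PageRank-style power iteration over the source graph."""
    n = len(sources)
    rank = {s: 1.0 / n for s in sources}
    for _ in range(iterations):
        nxt = {s: (1.0 - damping) / n for s in sources}
        for s in sources:
            out = edges.get(s, {})
            total = sum(out.values())
            if total == 0:
                # Dangling source: spread its score uniformly over all sources.
                share = damping * rank[s] / n
                for t in sources:
                    nxt[t] += share
            else:
                # Split the score in proportion to the number of page links.
                for d, weight in out.items():
                    nxt[d] += damping * rank[s] * weight / total
        rank = nxt
    return rank
```

Because the source graph has far fewer nodes than the page graph, each iteration is cheaper, which is one plausible reading of the "fast" claim above. How DSphere actually defines sources, weights edges or distributes the computation across peers is not stated in this record.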