URL Assignment Algorithm of Crawler in Distributed System Based on Hash | IEEE Conference Publication | IEEE Xplore

Abstract:

Web crawlers are a key component of Internet services that provide search and indexing support for the entire Web, for corporate intranets, and for large portal sites. More recently, crawlers have also been used as tools to conduct focused Web searches and to gather data about the characteristics of the WWW. In this paper, we study the gathering model of crawlers in a distributed environment. We describe the function of each module and establish rules that crawlers must follow to maintain load balance and system robustness while they search the Web simultaneously. We then design and implement a new hash-based URL assignment algorithm for partitioning the domain to crawl and, more generally, discuss the complete decentralization of every task.
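The abstract does not give the details of the assignment algorithm, so the following is only a generic sketch of hash-based URL partitioning, not the authors' method. It assumes that URLs are assigned by hashing the host name, so that all pages of one site go to the same crawler; the names `assign_crawler` and `partition`, and the choice of SHA-1, are hypothetical and chosen for illustration.

```python
import hashlib
from urllib.parse import urlsplit


def assign_crawler(url: str, num_crawlers: int) -> int:
    """Return the index of the crawler responsible for this URL.

    Assumption: the host name (not the full URL) is hashed, so every
    page of a given site is handled by the same crawler.
    """
    host = (urlsplit(url).hostname or "").lower()
    # A stable hash is used instead of Python's built-in hash(), which is
    # randomized per process and would disagree between crawler nodes.
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_crawlers


def partition(urls, num_crawlers):
    """Split a list of URLs into one bucket per crawler."""
    buckets = [[] for _ in range(num_crawlers)]
    for url in urls:
        buckets[assign_crawler(url, num_crawlers)].append(url)
    return buckets
```

Because every node can compute the same function locally, no central coordinator is needed to decide which crawler owns a newly discovered URL. A plain modulo scheme like this one reassigns most hosts when the number of crawlers changes, which is one reason a real system may prefer a more elaborate hash scheme.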
Date of Conference: 06-08 April 2008
Date Added to IEEE Xplore: 20 May 2008
Conference Location: Sanya, China
