By Topic

Defense response of search engine websites to non cooperating crawlers

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Rishabh Dev Chandna ; Indian Inst. of Technol. (BHU), Varanasi, India ; Prabhakar Chaubey ; Suresh Chandra Gupta

Robots.txt non cooperating web crawlers are unwanted by any website as they can create serious negative impact in terms of denial of service, privacy and cost. Defense mechanisms such as automated content access protocol, captcha, web crawler trap, real time bot detection etc. have been proposed to protect websites from unwanted crawler access. Although, the extent of these mechanisms being practically applied against such crawlers is not known clearly. In this paper we present an investigation carried out to get insights about defense mechanisms used by websites against robots.txt non cooperating web crawlers. This investigation is limited only to search engine class of websites. MBot, a self-developed non cooperating web crawler is the primary tool used for investigation. On investigation we find that search engine websites do have defense mechanisms to prevent non cooperating crawler access on them. Although, absence of any kind of defense phenomena to prevent MBot's access is also observed on some of the investigated websites. Robustness in observed defense mechanisms to basic network and application parameters like proxy, port number, user agent, IP address etc. is also observed.

Published in:

Information and Communication Technologies (WICT), 2012 World Congress on

Date of Conference:

Oct. 30 2012-Nov. 2 2012