BotSeer: An automated information system for analyzing Web robots

Yang Sun, Isaac G. Councill, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Scopus citations

Abstract

Robots.txt files are vital to the web since they are supposed to regulate what search engines can and cannot crawl. We present BotSeer, a Web-based information system and search tool that provides resources and services for researching Web robots and trends in Robot Exclusion Protocol deployment and adherence. BotSeer currently indexes and analyzes 2.2 million robots.txt files obtained from 13.2 million websites, as well as a large Web server log of real-world robot behavior and related analyses. BotSeer provides three major services including robots.txt searching, robot bias analysis, and robot-generated log analysis. BotSeer serves as a resource for studying the regulation and behavior of Web robots as well as a tool to inform the creation of effective robots.txt files and crawler implementations.
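The Robot Exclusion Protocol rules and the robot bias that BotSeer analyzes can be illustrated with a minimal sketch using Python's standard-library parser. The robots.txt content, robot names, and URL below are hypothetical examples for illustration, not drawn from the BotSeer corpus.

```python
# Minimal sketch: parsing Robot Exclusion Protocol rules with Python's
# standard-library robotparser. The rules here are hypothetical; they show
# a "biased" robots.txt that favors one named crawler over all others.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/

User-agent: GoodBot
Disallow:
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# GoodBot matches its own record (empty Disallow = everything allowed),
# while any other robot falls under the wildcard record excluding /private/.
print(rp.can_fetch("GoodBot", "http://example.com/private/page.html"))   # True
print(rp.can_fetch("OtherBot", "http://example.com/private/page.html"))  # False
```

A well-behaved crawler would consult `can_fetch` before requesting each URL; detecting site rules that single out specific robots, as in this example, is the kind of bias BotSeer's analysis surfaces.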

Original language: English (US)
Title of host publication: Proceedings - 8th International Conference on Web Engineering, ICWE 2008
Pages: 108-114
Number of pages: 7
State: Published - 2008
Event: 8th International Conference on Web Engineering, ICWE 2008 - Yorktown Heights, NY, United States
Duration: Jul 14 2008 - Jul 18 2008

Publication series

Name: Proceedings - 8th International Conference on Web Engineering, ICWE 2008

Other

Other: 8th International Conference on Web Engineering, ICWE 2008
Country/Territory: United States
City: Yorktown Heights, NY
Period: 7/14/08 - 7/18/08

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Software
