Graph based crawler seed selection

Shuyi Zheng, Pavel Dmitriev, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Scopus citations

Abstract

This paper identifies and explores the problem of seed selection in a web-scale crawler. We argue that seed selection is a nontrivial and very important problem. Selecting proper seeds can increase the number of pages a crawler will discover, and can result in a collection with more "good" and fewer "bad" pages. Based on an analysis of the graph structure of the web, we propose several seed selection algorithms. The effectiveness of these algorithms is demonstrated by our experimental results on real web data.

Original language: English (US)
Title of host publication: WWW'09 - Proceedings of the 18th International World Wide Web Conference
Pages: 1089-1090
Number of pages: 2
State: Published - 2009
Event: 18th International World Wide Web Conference, WWW 2009 - Madrid, Spain
Duration: Apr 20 2009 to Apr 24 2009

Publication series

Name: WWW'09 - Proceedings of the 18th International World Wide Web Conference

Other

Other: 18th International World Wide Web Conference, WWW 2009
Country/Territory: Spain
City: Madrid
Period: 4/20/09 to 4/24/09

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
