TY - GEN
T1 - Graph based crawler seed selection
AU - Zheng, Shuyi
AU - Dmitriev, Pavel
AU - Giles, C. Lee
PY - 2009
Y1 - 2009
N2 - This paper identifies and explores the problem of seed selection in a web-scale crawler. We argue that seed selection is a nontrivial and very important problem. Selecting proper seeds can increase the number of pages a crawler will discover, and can result in a collection with more "good" and fewer "bad" pages. Based on an analysis of the graph structure of the web, we propose several seed selection algorithms. The effectiveness of these algorithms is demonstrated by our experimental results on real web data.
AB - This paper identifies and explores the problem of seed selection in a web-scale crawler. We argue that seed selection is a nontrivial and very important problem. Selecting proper seeds can increase the number of pages a crawler will discover, and can result in a collection with more "good" and fewer "bad" pages. Based on an analysis of the graph structure of the web, we propose several seed selection algorithms. The effectiveness of these algorithms is demonstrated by our experimental results on real web data.
UR - http://www.scopus.com/inward/record.url?scp=79960127814&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79960127814&partnerID=8YFLogxK
U2 - 10.1145/1526709.1526870
DO - 10.1145/1526709.1526870
M3 - Conference contribution
AN - SCOPUS:79960127814
SN - 9781605584874
T3 - WWW'09 - Proceedings of the 18th International World Wide Web Conference
SP - 1089
EP - 1090
BT - WWW'09 - Proceedings of the 18th International World Wide Web Conference
T2 - 18th International World Wide Web Conference, WWW 2009
Y2 - 20 April 2009 through 24 April 2009
ER -