TY - GEN
T1 - Estimating the web robot population
AU - Sun, Yang
AU - Giles, C. Lee
PY - 2010
Y1 - 2010
N2 - In this research, capture-recapture (CR) models are used to estimate the population of web robots based on web server access logs from different websites. Each robot is treated as an individual randomly surfing the web, and each website is treated as a trap that records robot visits. We use a maximum likelihood estimator to fit the observation data. Results show that there are 3,860 identifiable robot User-Agent strings and 780,760 IP addresses being used by web robots around the world. We also examine the origin of the named robots by their IP addresses. The results suggest that over 50% of web robot IP addresses are from the United States and China.
AB - In this research, capture-recapture (CR) models are used to estimate the population of web robots based on web server access logs from different websites. Each robot is treated as an individual randomly surfing the web, and each website is treated as a trap that records robot visits. We use a maximum likelihood estimator to fit the observation data. Results show that there are 3,860 identifiable robot User-Agent strings and 780,760 IP addresses being used by web robots around the world. We also examine the origin of the named robots by their IP addresses. The results suggest that over 50% of web robot IP addresses are from the United States and China.
UR - http://www.scopus.com/inward/record.url?scp=77954597480&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77954597480&partnerID=8YFLogxK
U2 - 10.1145/1772690.1772868
DO - 10.1145/1772690.1772868
M3 - Conference contribution
AN - SCOPUS:77954597480
SN - 9781605587998
T3 - Proceedings of the 19th International Conference on World Wide Web, WWW '10
SP - 1189
EP - 1190
BT - Proceedings of the 19th International Conference on World Wide Web, WWW '10
T2 - 19th International World Wide Web Conference, WWW2010
Y2 - 26 April 2010 through 30 April 2010
ER -