TY - GEN
T1 - Exploration and mining of web repositories
AU - Zhang, Nan
AU - Das, Gautam
N1 - Copyright:
Copyright 2014 Elsevier B.V., All rights reserved.
PY - 2014
Y1 - 2014
N2 - With the proliferation of very large data repositories hidden behind web interfaces, e.g., keyword search, form-like search and hierarchical/graph-based browsing interfaces for Amazon.com, eBay.com, etc., efficient ways of searching, exploring and/or mining such web data are of increasing importance. There are two key challenges facing these tasks: how to properly understand web interfaces, and how to bypass the interface restrictions. In this tutorial, we start with a general overview of web search and data mining, including various exciting applications enabled by the effective search, exploration, and mining of web repositories. Then, we focus on the fundamental developments in the field, including web interface understanding, crawling, sampling, and data analytics over web repositories with various types of interfaces. We also discuss the potential changes required for query processing, data mining and machine learning algorithms to be applied to web data. Our goal is two-fold: one is to promote the awareness of existing web data search/exploration/mining techniques among all web researchers who are interested in leveraging web data, and the other is to encourage researchers, especially those who have not previously worked in web search and mining, to initiate their own research in these exciting areas.
AB - With the proliferation of very large data repositories hidden behind web interfaces, e.g., keyword search, form-like search and hierarchical/graph-based browsing interfaces for Amazon.com, eBay.com, etc., efficient ways of searching, exploring and/or mining such web data are of increasing importance. There are two key challenges facing these tasks: how to properly understand web interfaces, and how to bypass the interface restrictions. In this tutorial, we start with a general overview of web search and data mining, including various exciting applications enabled by the effective search, exploration, and mining of web repositories. Then, we focus on the fundamental developments in the field, including web interface understanding, crawling, sampling, and data analytics over web repositories with various types of interfaces. We also discuss the potential changes required for query processing, data mining and machine learning algorithms to be applied to web data. Our goal is two-fold: one is to promote the awareness of existing web data search/exploration/mining techniques among all web researchers who are interested in leveraging web data, and the other is to encourage researchers, especially those who have not previously worked in web search and mining, to initiate their own research in these exciting areas.
UR - http://www.scopus.com/inward/record.url?scp=84906888349&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906888349&partnerID=8YFLogxK
U2 - 10.1145/2556195.2556197
DO - 10.1145/2556195.2556197
M3 - Conference contribution
AN - SCOPUS:84906888349
SN - 9781450323512
T3 - WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining
SP - 675
BT - WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining
PB - Association for Computing Machinery
T2 - 7th ACM International Conference on Web Search and Data Mining, WSDM 2014
Y2 - 24 February 2014 through 28 February 2014
ER -