Exploration and mining of web repositories

Nan Zhang, Gautam Das

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

With the proliferation of very large data repositories hidden behind web interfaces, e.g., keyword search, form-like search and hierarchical/graph-based browsing interfaces for Amazon.com, eBay.com, etc., efficient ways of searching, exploring and/or mining such web data are of increasing importance. There are two key challenges facing these tasks: how to properly understand web interfaces, and how to bypass the interface restrictions. In this tutorial, we start with a general overview of web search and data mining, including various exciting applications enabled by the effective search, exploration, and mining of web repositories. Then, we focus on the fundamental developments in the field, including web interface understanding, crawling, sampling, and data analytics over web repositories with various types of interfaces. We also discuss the potential changes required for query processing, data mining and machine learning algorithms to be applied to web data. Our goal is two-fold: to promote awareness of existing web data search/exploration/mining techniques among all web researchers who are interested in leveraging web data, and to encourage researchers, especially those who have not previously worked in web search and mining, to initiate their own research in these exciting areas.

Original language: English (US)
Title of host publication: WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining
Publisher: Association for Computing Machinery
Pages: 675
Number of pages: 1
ISBN (Print): 9781450323512
State: Published - 2014
Event: 7th ACM International Conference on Web Search and Data Mining, WSDM 2014 - New York, NY, United States
Duration: Feb 24 2014 to Feb 28 2014

Publication series

Name: WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining

Other

Other: 7th ACM International Conference on Web Search and Data Mining, WSDM 2014
Country/Territory: United States
City: New York, NY
Period: 2/24/14 to 2/28/14

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
