Exploration of deep web repositories

Nan Zhang, Gautam Das

Research output: Contribution to journal (Article, peer-reviewed)

5 Scopus citations

Abstract

With the proliferation of online repositories (e.g., databases or document corpora) hidden behind proprietary web interfaces, such as keyword- or form-based search and hierarchical or graph-based browsing interfaces, efficient ways of exploring the contents of these hidden repositories are of increasing importance. There are two key challenges: properly understanding the interfaces, and efficiently exploring very large repositories through them (e.g., crawling, sampling, and analytical processing). In this tutorial, we focus on the fundamental developments in the field, including web interface understanding, crawling, sampling, and data analytics over web repositories that have various types of interfaces and contain structured or unstructured data. Our goal is to encourage the audience to initiate their own research in these exciting areas.
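To make the sampling challenge concrete, the sketch below simulates a form-based interface that answers conjunctive queries over Boolean attributes but reveals at most k matching tuples, and draws samples by random drill-down: keep adding randomly chosen attribute values to the query until it stops "overflowing", then pick a tuple from the returned result. This is a minimal illustration under stated assumptions, not the tutorial's or the authors' method. The class `TopKInterface`, the function `drill_down_sample`, the Boolean schema, and the "first k rows" ranking are all hypothetical. Note that plain drill-down yields samples that are generally not uniform; published samplers add acceptance/rejection or weighting steps, which this sketch omits.

```python
import random
from typing import Dict, List, Tuple

Row = Tuple[int, ...]


class TopKInterface:
    """Hypothetical simulation of a hidden database behind a search form.

    It answers conjunctive equality queries over Boolean attributes but,
    like many web forms, reveals at most k matching rows.
    """

    def __init__(self, rows: List[Row], k: int) -> None:
        self._rows = rows
        self.k = k
        self.queries_issued = 0  # the query count is the cost we care about

    def query(self, conditions: Dict[int, int]) -> Tuple[str, List[Row]]:
        self.queries_issued += 1
        matches = [r for r in self._rows
                   if all(r[a] == v for a, v in conditions.items())]
        if not matches:
            return "underflow", []
        if len(matches) > self.k:
            # Overflow: only k rows are revealed (here simply the first k;
            # a real site would apply its own proprietary ranking).
            return "overflow", matches[: self.k]
        return "valid", matches


def drill_down_sample(iface: TopKInterface, num_attrs: int,
                      rng: random.Random, max_restarts: int = 1000) -> Row:
    """Return one row via random drill-down (biased; no correction step)."""
    for _ in range(max_restarts):
        conditions: Dict[int, int] = {}
        order = list(range(num_attrs))
        rng.shuffle(order)  # random attribute order for this walk
        for attr in order:
            conditions[attr] = rng.randint(0, 1)
            status, result = iface.query(conditions)
            if status == "underflow":
                break  # dead end: restart from an empty query
            if status == "valid":
                return rng.choice(result)  # whole result is visible
            # "overflow": narrow the query with the next attribute
        # Reaching here means an underflow or a fully specified query that
        # still overflows (more than k identical rows); try a new walk.
    raise RuntimeError("no sample found within max_restarts walks")


# Usage: 200 hidden rows over 8 Boolean attributes, top-5 interface.
rng = random.Random(0)
rows = [tuple(rng.randint(0, 1) for _ in range(8)) for _ in range(200)]
iface = TopKInterface(rows, k=5)
samples = [drill_down_sample(iface, 8, rng) for _ in range(20)]
print(len(samples), "samples using", iface.queries_issued, "queries")
```

The design choice worth noticing is that cost is measured in interface queries, not rows scanned, since a real site typically limits how many queries a client may issue.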

Original language: English (US)
Pages (from-to): 1506-1507
Number of pages: 2
Journal: Proceedings of the VLDB Endowment
Volume: 4
Issue number: 12
State: Published - Aug 2011

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • General Computer Science
