Leveraging COUNT information in sampling hidden databases

Arjun Dasgupta, Nan Zhang, Gautam Das

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

39 Scopus citations

Abstract

A large number of online databases are hidden behind form-like interfaces which allow users to execute search queries by specifying selection conditions in the interface. Most of these interfaces return restricted answers (e.g., only top-k of the selected tuples), while many of them also accompany each answer with the COUNT of the selected tuples. In this paper, we propose techniques which leverage the COUNT information to efficiently acquire unbiased samples of the hidden database. We also discuss variants for interfaces which do not provide COUNT information. We conduct extensive experiments to illustrate the efficiency and accuracy of our techniques.

Original language: English (US)
Title of host publication: Proceedings - 25th IEEE International Conference on Data Engineering, ICDE 2009
Pages: 329-340
Number of pages: 12
State: Published - 2009
Event: 25th IEEE International Conference on Data Engineering, ICDE 2009 - Shanghai, China
Duration: Mar 29 2009 to Apr 2 2009

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Other

Other: 25th IEEE International Conference on Data Engineering, ICDE 2009
Country/Territory: China
City: Shanghai
Period: 3/29/09 to 4/2/09

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Information Systems
