Rank discovery from web databases

Saravanan Thirumuruganathan, Nan Zhang, Gautam Das

Research output: Contribution to journalConference articlepeer-review

3 Scopus citations

Abstract

Many web databases are only accessible through a proprietary search interface which allows users to form a query by entering the desired values for a few attributes. After receiving a query, the system returns the top-k matching tuples according to a pre-determined ranking function. Since the rank of a tuple largely determines the attention it receives from website users, ranking information for any tuple - not just the top-ranked ones - is often of significant interest to third parties such as sellers, customers, market researchers and investors. In this paper, we define a novel problem of rank discovery over hidden web databases. We introduce a taxonomy of ranking functions, and show that different types of ranking functions require fundamentally different approaches for rank discovery. Our technical contributions include principled and efficient randomized algorithms for estimating the rank of a given tuple, as well as negative results which demonstrate the inefficiency of any deterministic algorithm. We show extensive experimental results over real-world databases, including an online experiment at Amazon.com, which illustrates the effectiveness of our proposed techniques.

Original languageEnglish (US)
Pages (from-to)1582-1593
Number of pages12
JournalProceedings of the VLDB Endowment
Volume6
Issue number13
DOIs
StatePublished - Aug 2013
Event39th International Conference on Very Large Data Bases, VLDB 2012 - Trento, Italy
Duration: Aug 26 2013Aug 30 2013

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • General Computer Science

Fingerprint

Dive into the research topics of 'Rank discovery from web databases'. Together they form a unique fingerprint.

Cite this