TY - GEN
T1 - Unbiased estimation of size and other aggregates over hidden web databases
AU - Dasgupta, Arjun
AU - Jin, Xin
AU - Jewell, Bradley
AU - Zhang, Nan
AU - Das, Gautam
PY - 2010
Y1 - 2010
AB - Many websites provide restrictive form-like interfaces which allow users to execute search queries on the underlying hidden databases. In this paper, we consider the problem of estimating the size of a hidden database through its web interface. We propose novel techniques which use a small number of queries to produce unbiased estimates with small variance. These techniques can also be used for approximate query processing over hidden databases. We present theoretical analysis and extensive experiments to illustrate the effectiveness of our approach.
UR - http://www.scopus.com/inward/record.url?scp=77954730150&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77954730150&partnerID=8YFLogxK
U2 - 10.1145/1807167.1807259
DO - 10.1145/1807167.1807259
M3 - Conference contribution
AN - SCOPUS:77954730150
SN - 9781450300322
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 855
EP - 866
BT - Proceedings of the 2010 International Conference on Management of Data, SIGMOD '10
T2 - 2010 International Conference on Management of Data, SIGMOD '10
Y2 - 6 June 2010 through 11 June 2010
ER -