TY - GEN
T1 - HDSampler
T2 - International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09
AU - Maiti, Anirban
AU - Dasgupta, Arjun
AU - Zhang, Nan
AU - Das, Gautam
PY - 2009
Y1 - 2009
AB - A large number of online databases are hidden behind the web. Users of these systems can form queries through web forms to retrieve a small sample of the database. Sampling such hidden databases is widely desired for understanding the nature and quality of the data stored in them. We have developed HDSampler, which to the best of our knowledge is the first practical system for sampling structured hidden web databases. It enables efficient sampling of the databases and accurate answering of aggregate queries, providing analysts with valuable information for data analytics and helping to power a multitude of third-party applications such as web mashups and meta-search engines. For the purpose of this demo, we present an instance of HDSampler on Google Base, a content-rich hidden web database maintained by Google. By using HDSampler, the demo reveals a snapshot of the marginal distribution of various attributes of Google Base in a matter of minutes.
UR - http://www.scopus.com/inward/record.url?scp=70849103063&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70849103063&partnerID=8YFLogxK
U2 - 10.1145/1559845.1560001
DO - 10.1145/1559845.1560001
M3 - Conference contribution
AN - SCOPUS:70849103063
SN - 9781605585543
T3 - SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems
SP - 1131
EP - 1133
BT - SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems
Y2 - 29 June 2009 through 2 July 2009
ER -