TY - GEN
T1 - Attribute domain discovery for hidden web databases
AU - Jin, Xin
AU - Zhang, Nan
AU - Das, Gautam
PY - 2011
Y1 - 2011
AB - Many web databases are hidden behind restrictive form-like interfaces which may or may not provide domain information for an attribute. When attribute domains are not available, domain discovery becomes a critical challenge facing the application of a broad range of existing techniques on third-party analytical and mash-up applications over hidden databases. In this paper, we consider the problem of domain discovery over a hidden database through its web interface. We prove that for any database schema, an achievability guarantee on domain discovery can be made based solely upon the interface design. We also develop novel techniques which provide effective guarantees on the comprehensiveness of domain discovery. We present theoretical analysis and extensive experiments to illustrate the effectiveness of our approach.
UR - http://www.scopus.com/inward/record.url?scp=79959982304&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959982304&partnerID=8YFLogxK
U2 - 10.1145/1989323.1989381
DO - 10.1145/1989323.1989381
M3 - Conference contribution
AN - SCOPUS:79959982304
SN - 9781450306614
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 553
EP - 564
BT - Proceedings of SIGMOD 2011 and PODS 2011
PB - Association for Computing Machinery
T2 - 2011 ACM SIGMOD and 30th PODS 2011 Conference
Y2 - 12 June 2011 through 16 June 2011
ER -