TY - GEN
T1 - Aggregate suppression for enterprise search engines
AU - Zhang, Mingyang
AU - Zhang, Nan
AU - Das, Gautam
PY - 2012
Y1 - 2012
AB - Many enterprise websites provide search engines to facilitate customer access to their underlying documents or data. With the web interface of such a search engine, a customer can specify one or a few keywords that he/she is interested in; and the search engine returns a list of documents/tuples matching the user-specified keywords, sorted by an often-proprietary scoring function. It was traditionally believed that, because of its highly-restrictive interface (i.e., keyword search only, no SQL-style queries), such a search engine serves its purpose of answering individual keyword-search queries without disclosing big-picture aggregates over the data which, as we shall show in the paper, may incur significant privacy concerns to the enterprise. Nonetheless, recent work on sampling and aggregate estimation over a search engine's corpus through its keyword-search interface transcends this traditional belief. In this paper, we consider a novel problem of suppressing sensitive aggregates for enterprise search engines while maintaining the quality of answers provided to individual keyword-search queries. We demonstrate the effectiveness and efficiency of our novel techniques through theoretical analysis and extensive experimental studies.
UR - http://www.scopus.com/inward/record.url?scp=84862671579&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84862671579&partnerID=8YFLogxK
U2 - 10.1145/2213836.2213890
DO - 10.1145/2213836.2213890
M3 - Conference contribution
AN - SCOPUS:84862671579
SN - 9781450312479
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 469
EP - 480
BT - SIGMOD '12 - Proceedings of the International Conference on Management of Data
T2 - 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD '12
Y2 - 21 May 2012 through 24 May 2012
ER -