Aggregate suppression for enterprise search engines

Mingyang Zhang, Nan Zhang, Gautam Das

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Many enterprise websites provide search engines to facilitate customer access to their underlying documents or data. With the web interface of such a search engine, a customer can specify one or a few keywords that he/she is interested in; and the search engine returns a list of documents/tuples matching the user-specified keywords, sorted by an often-proprietary scoring function. It was traditionally believed that, because of its highly-restrictive interface (i.e., keyword search only, no SQL-style queries), such a search engine serves its purpose of answering individual keyword-search queries without disclosing big-picture aggregates over the data which, as we shall show in the paper, may incur significant privacy concerns to the enterprise. Nonetheless, recent work on sampling and aggregate estimation over a search engine's corpus through its keyword-search interface transcends this traditional belief. In this paper, we consider a novel problem of suppressing sensitive aggregates for enterprise search engines while maintaining the quality of answers provided to individual keyword-search queries. We demonstrate the effectiveness and efficiency of our novel techniques through theoretical analysis and extensive experimental studies.

Original language: English (US)
Title of host publication: SIGMOD '12 - Proceedings of the International Conference on Management of Data
Pages: 469-480
Number of pages: 12
State: Published - 2012
Event: 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD '12 - Scottsdale, AZ, United States
Duration: May 21 2012 to May 24 2012

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Other

Other: 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD '12
Country/Territory: United States
City: Scottsdale, AZ
Period: 5/21/12 to 5/24/12

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
