TY - GEN
T1 - Maximum likelihood postprocessing for differential privacy under consistency constraints
AU - Lee, Jaewoo
AU - Wang, Yue
AU - Kifer, Daniel
N1 - Publisher Copyright: Copyright 2015 ACM.
PY - 2015/8/10
Y1 - 2015/8/10
N2 - When analyzing data that has been perturbed for privacy reasons, one is often concerned about its usefulness. Recent research on differential privacy has shown that the accuracy of many data queries can be improved by post-processing the perturbed data to ensure consistency constraints that are known to hold for the original data. Most prior work converted this post-processing step into a least squares minimization problem with customized efficient solutions. While improving accuracy, this approach ignored the noise distribution in the perturbed data. In this paper, to further improve accuracy, we formulate this post-processing step as a constrained maximum likelihood estimation problem, which is equivalent to constrained L1 minimization. Instead of relying on slow linear program solvers, we present a faster generic recipe (based on ADMM) that is suitable for a wide variety of applications including differentially private contingency tables, histograms, and the matrix mechanism (linear queries). An added benefit of our formulation is that it can often take direct advantage of algorithmic tricks used by the prior work on least-squares post-processing. An extensive set of experiments on various datasets demonstrates that this approach significantly improves accuracy over prior work.
AB - When analyzing data that has been perturbed for privacy reasons, one is often concerned about its usefulness. Recent research on differential privacy has shown that the accuracy of many data queries can be improved by post-processing the perturbed data to ensure consistency constraints that are known to hold for the original data. Most prior work converted this post-processing step into a least squares minimization problem with customized efficient solutions. While improving accuracy, this approach ignored the noise distribution in the perturbed data. In this paper, to further improve accuracy, we formulate this post-processing step as a constrained maximum likelihood estimation problem, which is equivalent to constrained L1 minimization. Instead of relying on slow linear program solvers, we present a faster generic recipe (based on ADMM) that is suitable for a wide variety of applications including differentially private contingency tables, histograms, and the matrix mechanism (linear queries). An added benefit of our formulation is that it can often take direct advantage of algorithmic tricks used by the prior work on least-squares post-processing. An extensive set of experiments on various datasets demonstrates that this approach significantly improves accuracy over prior work.
UR - http://www.scopus.com/inward/record.url?scp=84954123107&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84954123107&partnerID=8YFLogxK
U2 - 10.1145/2783258.2783366
DO - 10.1145/2783258.2783366
M3 - Conference contribution
AN - SCOPUS:84954123107
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 635
EP - 644
BT - KDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015
Y2 - 10 August 2015 through 13 August 2015
ER -