Differentially private model selection with penalized and constrained likelihood

Jing Lei, Anne-Sophie Charest, Aleksandra Slavkovic, Adam Smith, Stephen Fienberg

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

In statistical disclosure control, the goal of data analysis is twofold: the information released must provide accurate and useful statistics about the underlying population of interest, while minimizing the potential for an individual record to be identified. In recent years, the notion of differential privacy has received much attention in theoretical computer science, machine learning and statistics. It provides a rigorous and strong notion of protection for individuals’ sensitive information. A fundamental question is how to incorporate differential privacy into traditional statistical inference procedures. We study model selection in multivariate linear regression under the constraint of differential privacy. We show that model selection procedures based on penalized least squares or likelihood can be made differentially private by a combination of regularization and randomization, and we propose two algorithms to do so. We show that our private procedures are consistent under essentially the same conditions as the corresponding non-private procedures. We also find that, under differential privacy, the procedure becomes more sensitive to the tuning parameters. We illustrate and evaluate our method using simulation studies and two real data examples.
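
The abstract does not spell out the two proposed algorithms, so the following is only a minimal sketch of the randomization ingredient, using the standard exponential mechanism as a stand-in rather than the authors' procedure. It assumes each candidate model already has a penalized score (lower is better) and that a bound on the score's sensitivity to a single record is known; the function name, the scores, and the sensitivity value are all hypothetical.

```python
import math
import random


def exponential_mechanism(scores, sensitivity, epsilon, rng=random):
    """Pick a model index with probability proportional to
    exp(-epsilon * score / (2 * sensitivity)).

    scores      : penalized criterion per candidate model, lower is better.
    sensitivity : assumed bound on how much any one score can change when
                  a single record of the data set is replaced.
    epsilon     : privacy budget of the selection step.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")

    # Subtracting the best score leaves the distribution unchanged
    # and keeps the exponentials numerically stable.
    best = min(scores)
    weights = [
        math.exp(-epsilon * (s - best) / (2.0 * sensitivity)) for s in scores
    ]
    total = sum(weights)
    probs = [w / total for w in weights]

    index = rng.choices(range(len(scores)), weights=probs, k=1)[0]
    return index, probs


# Hypothetical penalized scores for three candidate models.
scores = [12.0, 10.0, 15.0]
index, probs = exponential_mechanism(
    scores, sensitivity=1.0, epsilon=1.0, rng=random.Random(0)
)
print("selected model:", index)
print("selection probabilities:", probs)
```

A smaller epsilon flattens the selection probabilities, so the best-scoring model is chosen less reliably; a larger epsilon concentrates them on the best score at the cost of weaker privacy.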

Original language: English (US)
Pages (from-to): 609-633
Number of pages: 25
Journal: Journal of the Royal Statistical Society. Series A: Statistics in Society
Volume: 181
Issue number: 3
State: Published - Jun 2018

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Social Sciences (miscellaneous)
  • Economics and Econometrics
  • Statistics, Probability and Uncertainty
