TY - GEN
T1 - Robust Traceability from Trace Amounts
AU - Dwork, Cynthia
AU - Smith, Adam
AU - Steinke, Thomas
AU - Ullman, Jonathan
AU - Vadhan, Salil
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/12/11
Y1 - 2015/12/11
N2 - The privacy risks inherent in the release of a large number of summary statistics were illustrated by Homer et al. (PLoS Genetics, 2008), who considered the case of 1-way marginals of SNP allele frequencies obtained in a genome-wide association study: Given a large number of minor allele frequencies from a case group of individuals diagnosed with a particular disease, together with the genomic data of a single target individual and statistics from a sizable reference dataset independently drawn from the same population, an attacker can determine with high confidence whether or not the target is in the case group. In this work we describe and analyze a simple attack that succeeds even if the summary statistics are significantly distorted, whether due to measurement error or noise intentionally introduced to protect privacy. Our attack only requires that the vector of distorted summary statistics is close to the vector of true marginals in L1 norm. Moreover, the reference pool required by previous attacks can be replaced by a single sample drawn from the underlying population. The new attack, which is not specific to genomics and which handles Gaussian as well as Bernouilli data, significantly generalizes recent lower bounds on the noise needed to ensure differential privacy (Bun, Ullman, and Vadhan, STOC 2014, Steinke and Ullman, 2015), obviating the need for the attacker to control the exact distribution of the data.
AB - The privacy risks inherent in the release of a large number of summary statistics were illustrated by Homer et al. (PLoS Genetics, 2008), who considered the case of 1-way marginals of SNP allele frequencies obtained in a genome-wide association study: Given a large number of minor allele frequencies from a case group of individuals diagnosed with a particular disease, together with the genomic data of a single target individual and statistics from a sizable reference dataset independently drawn from the same population, an attacker can determine with high confidence whether or not the target is in the case group. In this work we describe and analyze a simple attack that succeeds even if the summary statistics are significantly distorted, whether due to measurement error or noise intentionally introduced to protect privacy. Our attack only requires that the vector of distorted summary statistics is close to the vector of true marginals in L1 norm. Moreover, the reference pool required by previous attacks can be replaced by a single sample drawn from the underlying population. The new attack, which is not specific to genomics and which handles Gaussian as well as Bernouilli data, significantly generalizes recent lower bounds on the noise needed to ensure differential privacy (Bun, Ullman, and Vadhan, STOC 2014, Steinke and Ullman, 2015), obviating the need for the attacker to control the exact distribution of the data.
UR - http://www.scopus.com/inward/record.url?scp=84960325629&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84960325629&partnerID=8YFLogxK
U2 - 10.1109/FOCS.2015.46
DO - 10.1109/FOCS.2015.46
M3 - Conference contribution
AN - SCOPUS:84960325629
T3 - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
SP - 60
EP - 669
BT - Proceedings - 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015
PB - IEEE Computer Society
T2 - 56th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2015
Y2 - 17 October 2015 through 20 October 2015
ER -