TY - GEN
T1 - The power of linear reconstruction attacks
AU - Kasiviswanathan, Shiva Prasad
AU - Rudelson, Mark
AU - Smith, Adam
N1 - Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2013
Y1 - 2013
N2 - We consider the power of "linear reconstruction attacks" in statistical data privacy, showing that they can be applied to a much wider range of settings than previously understood. Linear attacks have been studied before [3, 6, 11, 1, 14] but have so far been applied only in settings with releases that are "obviously" linear. Consider a database curator who manages a database of sensitive information but wants to release statistics about how a sensitive attribute (say, disease) in the database relates to some nonsensitive attributes (e.g., postal code, age, gender, etc). This setting is widely considered in the literature, partly since it arises with medical data. Specifically, we show one can mount linear reconstruction attacks based on any release that gives: 1. the fraction of records that satisfy a given nondegenerate boolean function. Such releases include contingency tables (previously studied by Kasiviswanathan et al. [11]) as well as more complex outputs like the error rate of classifiers such as decision trees; 2. any one of a large class of M-estimators (that is, the output of empirical risk minimization algorithms), including the standard estimators for linear and logistic regression. We make two contributions: first, we show how these types of releases can be transformed into a linear format, making them amenable to existing polynomial-time reconstruction algorithms. This is already perhaps surprising, since many of the above releases (like M-estimators) are obtained by solving highly nonlinear formulations. Second, we show how to analyze the resulting attacks under various distributional assumptions on the data. Specifically, we consider a setting in which the same statistic (either 1 or 2 above) is released about how the sensitive attribute relates to all subsets of size k (out of a total of d) nonsensitive boolean attributes.
AB - We consider the power of "linear reconstruction attacks" in statistical data privacy, showing that they can be applied to a much wider range of settings than previously understood. Linear attacks have been studied before [3, 6, 11, 1, 14] but have so far been applied only in settings with releases that are "obviously" linear. Consider a database curator who manages a database of sensitive information but wants to release statistics about how a sensitive attribute (say, disease) in the database relates to some nonsensitive attributes (e.g., postal code, age, gender, etc). This setting is widely considered in the literature, partly since it arises with medical data. Specifically, we show one can mount linear reconstruction attacks based on any release that gives: 1. the fraction of records that satisfy a given nondegenerate boolean function. Such releases include contingency tables (previously studied by Kasiviswanathan et al. [11]) as well as more complex outputs like the error rate of classifiers such as decision trees; 2. any one of a large class of M-estimators (that is, the output of empirical risk minimization algorithms), including the standard estimators for linear and logistic regression. We make two contributions: first, we show how these types of releases can be transformed into a linear format, making them amenable to existing polynomial-time reconstruction algorithms. This is already perhaps surprising, since many of the above releases (like M-estimators) are obtained by solving highly nonlinear formulations. Second, we show how to analyze the resulting attacks under various distributional assumptions on the data. Specifically, we consider a setting in which the same statistic (either 1 or 2 above) is released about how the sensitive attribute relates to all subsets of size k (out of a total of d) nonsensitive boolean attributes.
UR - http://www.scopus.com/inward/record.url?scp=84876061488&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84876061488&partnerID=8YFLogxK
U2 - 10.1137/1.9781611973105.102
DO - 10.1137/1.9781611973105.102
M3 - Conference contribution
AN - SCOPUS:84876061488
SN - 9781611972511
T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
SP - 1415
EP - 1433
BT - Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013
PB - Association for Computing Machinery
T2 - 24th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013
Y2 - 6 January 2013 through 8 January 2013
ER -