TY - JOUR
T1 - Synthetic two-way contingency tables that preserve conditional frequencies
AU - Slavković, Aleksandra B.
AU - Lee, Juyoun
N1 - Funding Information:
The research reported here was supported in part by NSF Grant SES-0532407 to the Department of Statistics, Pennsylvania State University. Both authors were supported at some point by this grant.
PY - 2010/5
Y1 - 2010/5
N2 - In the area of statistical disclosure limitation, releasing synthetic data sets has become a popular method for limiting the risks of disclosure of sensitive information and at the same time maintaining analytic utility of data. However, less work has been done on how to create synthetic contingency tables that preserve some summary statistics of the original table. Studies in this area have primarily focused on generating replacement tables that preserve the margins of the original table since the latter support statistical inferences for a large set of parametric tests and models. Yet, not all synthetic tables that preserve a set of margins yield consistent results. In this paper, we propose alternative synthetic table releases. We describe how to generate complete two-way contingency tables that have the same set of observed conditional frequencies by using tools from computational algebra. We study both the disclosure risk and the data utility associated with such synthetic tabular data releases, and compare them to the traditionally released synthetic tables.
AB - In the area of statistical disclosure limitation, releasing synthetic data sets has become a popular method for limiting the risks of disclosure of sensitive information and at the same time maintaining analytic utility of data. However, less work has been done on how to create synthetic contingency tables that preserve some summary statistics of the original table. Studies in this area have primarily focused on generating replacement tables that preserve the margins of the original table since the latter support statistical inferences for a large set of parametric tests and models. Yet, not all synthetic tables that preserve a set of margins yield consistent results. In this paper, we propose alternative synthetic table releases. We describe how to generate complete two-way contingency tables that have the same set of observed conditional frequencies by using tools from computational algebra. We study both the disclosure risk and the data utility associated with such synthetic tabular data releases, and compare them to the traditionally released synthetic tables.
UR - http://www.scopus.com/inward/record.url?scp=77950918549&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77950918549&partnerID=8YFLogxK
U2 - 10.1016/j.stamet.2009.11.002
DO - 10.1016/j.stamet.2009.11.002
M3 - Article
AN - SCOPUS:77950918549
SN - 1572-3127
VL - 7
SP - 225
EP - 239
JO - Statistical Methodology
JF - Statistical Methodology
IS - 3
ER -