Synthetic two-way contingency tables that preserve conditional frequencies

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

In the area of statistical disclosure limitation, releasing synthetic data sets has become a popular method for limiting the risk of disclosing sensitive information while maintaining the analytic utility of the data. However, less work has been done on how to create synthetic contingency tables that preserve some summary statistics of the original table. Studies in this area have primarily focused on generating replacement tables that preserve the margins of the original table, since margins support statistical inference for a large set of parametric tests and models. Yet not all synthetic tables that preserve a set of margins yield consistent results. In this paper, we propose alternative synthetic table releases. We describe how to generate complete two-way contingency tables that have the same set of observed conditional frequencies by using tools from computational algebra. We study both the disclosure risk and the data utility associated with such synthetic tabular data releases, and compare them to those of traditionally released synthetic tables.
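To make the idea of tables that share conditional frequencies concrete, the following is a minimal sketch under stated assumptions, not the method of the paper. It replaces the computational-algebra tools with plain brute-force enumeration, fixes the grand total, and conditions on rows only. The function names `row_conditionals` and `tables_with_same_row_conditionals` are hypothetical, and every row is assumed to have a positive total.

```python
from fractions import Fraction
from functools import reduce
from itertools import product
from math import gcd


def row_conditionals(table):
    """Return P(column | row) for each row as exact fractions."""
    return [[Fraction(cell, sum(row)) for cell in row] for row in table]


def tables_with_same_row_conditionals(table):
    """Brute-force every positive-row integer table with the same grand total
    and the same row-conditional frequencies as `table`.

    Assumption: every row of `table` has a positive total.
    """
    total = sum(sum(row) for row in table)
    # Dividing a row by the gcd of its cells gives the smallest integer row
    # with the same conditionals; every valid row is a positive multiple of it.
    bases = []
    for row in table:
        g = reduce(gcd, row)
        bases.append([cell // g for cell in row])
    base_sums = [sum(base) for base in bases]
    # Each multiplier is at least 1 and cannot push its row past the total.
    ranges = [range(1, total // s + 1) for s in base_sums]
    found = []
    for multipliers in product(*ranges):
        if sum(m * s for m, s in zip(multipliers, base_sums)) == total:
            found.append(
                [[m * cell for cell in base] for m, base in zip(multipliers, bases)]
            )
    return found


original = [[2, 4], [4, 4]]
for candidate in tables_with_same_row_conditionals(original):
    # Every candidate reproduces the observed conditionals exactly.
    assert row_conditionals(candidate) == row_conditionals(original)
    print(candidate)
```

In this toy case the enumeration returns the original table and one alternative, `[[4, 8], [1, 1]]`. The alternative has the same row conditionals but different row and column totals, which illustrates how preserving conditional frequencies differs from preserving margins.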

Original language: English (US)
Pages (from-to): 225-239
Number of pages: 15
Journal: Statistical Methodology
Volume: 7
Issue number: 3
DOIs
State: Published - May 2010

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
