In the area of statistical disclosure limitation, releasing synthetic data sets has become a popular method for limiting the risk of disclosing sensitive information while maintaining the analytic utility of the data. However, less work has been done on how to create synthetic contingency tables that preserve some summary statistics of the original table. Studies in this area have primarily focused on generating replacement tables that preserve the margins of the original table, since margins support statistical inference for a large set of parametric tests and models. Yet not all synthetic tables that preserve a set of margins yield consistent results. In this paper, we propose alternative synthetic table releases. Using tools from computational algebra, we describe how to generate complete two-way contingency tables that have the same set of observed conditional frequencies as the original table. We study both the disclosure risk and the data utility associated with such synthetic tabular data releases, and compare them to those of traditionally released synthetic tables.
All Science Journal Classification (ASJC) codes
- Statistics and Probability