On estimating the swapping rate for categorical data

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Scopus citations

Abstract

When analyzing data, it is important to account for all sources of noise. Public use datasets, such as those provided by the Census Bureau, often undergo additional perturbations designed to protect confidentiality. This source of noise is generally ignored in data analysis because crucial parameters and details about its implementation are withheld. In this paper, we consider the problem of inferring such parameters from the data. Specifically, we target data swapping, a perturbation technique commonly used by the U.S. Census Bureau and which, barring practical breakthroughs in disclosure control, will be used in the foreseeable future. The vanilla version of data swapping selects pairs of records and exchanges some of their attribute values. The number of swapped records is kept secret even though it is needed for data analysis and investigations into the confidentiality protection of individual records. We propose algorithms for estimating the number of swapped records in categorical data, even when the true data distribution is unknown.

Original languageEnglish (US)
Title of host publicationKDD 2015 - Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages557-566
Number of pages10
ISBN (Electronic)9781450336642
DOIs
StatePublished - Aug 10 2015
Event21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015 - Sydney, Australia
Duration: Aug 10 2015Aug 13 2015

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume2015-August

Other

Other21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2015
Country/TerritoryAustralia
CitySydney
Period8/10/158/13/15

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Fingerprint

Dive into the research topics of 'On estimating the swapping rate for categorical data'. Together they form a unique fingerprint.

Cite this