Synthetic data via quantile regression for heavy-tailed and heteroskedastic data

Michelle Pistner, Aleksandra Slavković, Lars Vilhuber

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Privacy protection of confidential data is a fundamental problem faced by many government organizations and research centers. It is further complicated when data have complex structures or variables with highly skewed distributions. The statistical community addresses general privacy concerns by introducing different techniques that aim to decrease disclosure risk in released data while retaining their statistical properties. However, methods for complex data structures have received insufficient attention. We propose producing synthetic data via quantile regression to address privacy protection of heavy-tailed and heteroskedastic data. We address some shortcomings of the previously proposed use of quantile regression as a synthesis method and extend the work into cases where data have heavy tails or heteroskedastic errors. Using a simulation study and two applications, we show that there are settings where quantile regression performs as well as or better than other commonly used synthesis methods on the basis of maintaining good data utility while simultaneously decreasing disclosure risk.

Original language: English (US)
Title of host publication: Privacy in Statistical Databases - UNESCO Chair in Data Privacy, International Conference, PSD 2018, Proceedings
Editors: Francisco Montes, Josep Domingo-Ferrer
Publisher: Springer Verlag
Pages: 92-108
Number of pages: 17
ISBN (Print): 9783319997704
State: Published - 2018
Event: International Conference on Privacy in Statistical Databases, PSD 2018 - Valencia, Spain
Duration: Sep 26 2018 to Sep 28 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11126 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: International Conference on Privacy in Statistical Databases, PSD 2018
Country/Territory: Spain
City: Valencia
Period: 9/26/18 to 9/28/18

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
