TY - GEN
T1 - Synthetic data via quantile regression for heavy-tailed and heteroskedastic data
AU - Pistner, Michelle
AU - Slavković, Aleksandra
AU - Vilhuber, Lars
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2018.
PY - 2018
Y1 - 2018
AB - Privacy protection of confidential data is a fundamental problem faced by many government organizations and research centers. It is further complicated when data have complex structures or variables with highly skewed distributions. The statistical community addresses general privacy concerns by introducing different techniques that aim to decrease disclosure risk in released data while retaining their statistical properties. However, methods for complex data structures have received insufficient attention. We propose producing synthetic data via quantile regression to address privacy protection of heavy-tailed and heteroskedastic data. We address some shortcomings of the previously proposed use of quantile regression as a synthesis method and extend the work into cases where data have heavy tails or heteroskedastic errors. Using a simulation study and two applications, we show that there are settings where quantile regression performs as well as or better than other commonly used synthesis methods on the basis of maintaining good data utility while simultaneously decreasing disclosure risk.
UR - http://www.scopus.com/inward/record.url?scp=85053860435&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85053860435&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-99771-1_7
DO - 10.1007/978-3-319-99771-1_7
M3 - Conference contribution
AN - SCOPUS:85053860435
SN - 9783319997704
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 92
EP - 108
BT - Privacy in Statistical Databases - UNESCO Chair in Data Privacy, International Conference, PSD 2018, Proceedings
A2 - Montes, Francisco
A2 - Domingo-Ferrer, Josep
PB - Springer Verlag
T2 - International Conference on Privacy in Statistical Databases, PSD 2018
Y2 - 26 September 2018 through 28 September 2018
ER -