TY - GEN
T1 - Privacy: Theory meets practice on the map
T2 - 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
AU - Machanavajjhala, Ashwin
AU - Kifer, Daniel
AU - Abowd, John
AU - Gehrke, Johannes
AU - Vilhuber, Lars
PY - 2008
Y1 - 2008
AB - In this paper, we propose the first formal privacy analysis of a data anonymization process known as synthetic data generation, a technique becoming popular in the statistics community. The target application for this work is a mapping program that shows the commuting patterns of the population of the United States. The source data for this application were collected by the U.S. Census Bureau, but due to privacy constraints, they cannot be used directly by the mapping program. Instead, we generate synthetic data that statistically mimic the original data while providing privacy guarantees. We use these synthetic data as a surrogate for the original data. We find that while some existing definitions of privacy are inapplicable to our target application, others are too conservative and render the synthetic data useless, since they guard against privacy breaches that are very unlikely. Moreover, the data in our target application are sparse, and none of the existing solutions is tailored to anonymizing sparse data. In this paper, we propose solutions to address the above issues.
UR - http://www.scopus.com/inward/record.url?scp=52649169678&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=52649169678&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2008.4497436
DO - 10.1109/ICDE.2008.4497436
M3 - Conference contribution
AN - SCOPUS:52649169678
SN - 9781424418374
T3 - Proceedings - International Conference on Data Engineering
SP - 277
EP - 286
BT - Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Y2 - 7 April 2008 through 12 April 2008
ER -