TY - JOUR
T1 - Output privacy in data mining
AU - Wang, Ting
AU - Liu, Ling
N1 - Copyright:
Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2011/3
Y1 - 2011/3
AB - Privacy has been identified as a vital requirement in designing and implementing data mining systems. In general, privacy preservation demands protecting both input and output privacy: the former refers to sanitizing the raw data itself before mining is performed, while the latter refers to protecting the mining output (models or patterns) against malicious inference attacks. This article presents a systematic study of the problem of protecting output privacy in data mining, particularly stream mining: (i) we highlight the importance of this problem by showing that even sufficient protection of input privacy does not guarantee that of output privacy; (ii) we present a general inference and disclosure model that exploits the intrawindow and interwindow privacy breaches in stream mining output; (iii) we propose a lightweight countermeasure that effectively eliminates these breaches without explicitly detecting them, while minimizing the loss of output accuracy; (iv) we further optimize the basic scheme by taking into account two types of semantic constraints, aiming to maximally preserve utility-related semantics while maintaining a hard privacy guarantee; (v) finally, we conduct extensive experimental evaluation over both synthetic and real data to validate the efficacy of our approach.
UR - http://www.scopus.com/inward/record.url?scp=79953253848&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79953253848&partnerID=8YFLogxK
U2 - 10.1145/1929934.1929935
DO - 10.1145/1929934.1929935
M3 - Article
AN - SCOPUS:79953253848
SN - 0362-5915
VL - 36
JO - ACM Transactions on Database Systems
JF - ACM Transactions on Database Systems
IS - 1
M1 - 1
ER -