Output privacy in data mining

Ting Wang, Ling Liu

Research output: Contribution to journal › Article › peer-review

15 Scopus citations


Privacy has been identified as a vital requirement in designing and implementing data mining systems. In general, privacy preservation demands protecting both input and output privacy: the former refers to sanitizing the raw data itself before performing mining, while the latter refers to protecting the mining output (models or patterns) against malicious inference attacks. This article presents a systematic study of the problem of protecting output privacy in data mining, and particularly in stream mining: (i) we highlight the importance of this problem by showing that even sufficient protection of input privacy does not guarantee output privacy; (ii) we present a general inferencing and disclosure model that exploits the intrawindow and interwindow privacy breaches in stream mining output; (iii) we propose a lightweight countermeasure that effectively eliminates these breaches without explicitly detecting them, while minimizing the loss of output accuracy; (iv) we further optimize the basic scheme by taking into account two types of semantic constraints, aiming at maximally preserving utility-related semantics while maintaining a hard privacy guarantee; (v) finally, we conduct extensive experimental evaluation over both synthetic and real data to validate the efficacy of our approach.

Original language: English (US)
Article number: 1
Journal: ACM Transactions on Database Systems
Issue number: 1
State: Published - Mar 2011

All Science Journal Classification (ASJC) codes

  • Information Systems
