Calibrate: Frequency Estimation and Heavy Hitter Identification with Local Differential Privacy via Incorporating Prior Knowledge

Jinyuan Jia, Neil Zhenqiang Gong

Research output: Chapter in Book/Report/Conference proceedingConference contribution

41 Scopus citations

Abstract

Estimating frequencies of certain items among a population is a basic step in data analytics, which enables more advanced data analytics (e.g., heavy hitter identification, frequent pattern mining), client software optimization, and detecting unwanted or malicious hijacking of user settings in browsers. Frequency estimation and heavy hitter identification with local differential privacy (LDP) protect user privacy as well as the data collector. Existing LDP algorithms cannot leverage 1) prior knowledge about the noise in the estimated item frequencies and 2) prior knowledge about the true item frequencies. As a result, they achieve suboptimal performance in practice. In this work, we aim to design LDP algorithms that can leverage such prior knowledge. Specifically, we design Calibrate to incorporate the prior knowledge via statistical inference. Calibrate can be appended to an existing LDP algorithm to reduce its estimation errors. We model the prior knowledge about the noise and the true item frequencies as two probability distributions, respectively. Given the two probability distributions and an estimated frequency of an item produced by an existing LDP algorithm, our Calibrate computes the conditional probability distribution of the item's frequency and uses the mean of the conditional probability distribution as the calibrated frequency for the item. It is challenging to estimate the two probability distributions due to data sparsity. We address the challenge via integrating techniques from statistics and machine learning. Our empirical results on two real-world datasets show that Calibrate significantly outperforms state-of-the-art LDP algorithms for frequency estimation and heavy hitter identification.

Original languageEnglish (US)
Title of host publicationINFOCOM 2019 - IEEE Conference on Computer Communications
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages2008-2016
Number of pages9
ISBN (Electronic)9781728105154
DOIs
StatePublished - Apr 2019
Event2019 IEEE Conference on Computer Communications, INFOCOM 2019 - Paris, France
Duration: Apr 29 2019May 2 2019

Publication series

NameProceedings - IEEE INFOCOM
Volume2019-April
ISSN (Print)0743-166X

Conference

Conference2019 IEEE Conference on Computer Communications, INFOCOM 2019
Country/TerritoryFrance
CityParis
Period4/29/195/2/19

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Calibrate: Frequency Estimation and Heavy Hitter Identification with Local Differential Privacy via Incorporating Prior Knowledge'. Together they form a unique fingerprint.

Cite this