A Scalable Mixture Model Based Defense Against Data Poisoning Attacks on Classifiers

Xi Li, David J. Miller, Zhen Xiang, George Kesidis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Classifiers, e.g., those based on Naive Bayes, a support vector machine, or even a neural network, are highly susceptible to data-poisoning attacks. The attack objective is to degrade classification accuracy by covertly embedding malicious (labeled) samples into the training set. Such attacks can be mounted by an insider, through an outsourcing process (for data acquisition or training), or conceivably during active learning. In some cases, a very small amount of poisoning can result in a dramatic reduction in classification accuracy. Data-poisoning attacks are successful mainly because the injected malicious samples significantly skew the data distribution of the corrupted class. Such attack samples are generally data outliers and in principle separable from the clean samples. We propose a generalized, scalable, and dynamic data-driven defense system that: 1) uses a mixture model both to fit the (potentially multi-modal) data well and to make it possible to isolate attack samples in a small subset of the mixture components; 2) performs hypothesis testing to decide both which components and which samples within those components are poisoned, with the identified poisoned samples purged from the training set. Our approach addresses the attack scenario where adversarial samples are an unknown subset embedded in the initial training set, and can be used to perform data sanitization as a precursor to the training of any type of classifier. Promising results from experiments on the TREC05 spam corpus and the Amazon reviews polarity dataset demonstrate the effectiveness of our defense strategy.
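
The two-step idea in the abstract (fit a mixture model, then decide which components and samples to purge) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes one-dimensional features from a single class, fits the mixture with plain EM, and replaces the paper's hypothesis test, whose details the abstract does not give, with a simple stand-in rule (a component is suspicious if its mixing weight is below `min_weight`). The names `fit_gmm_1d`, `sanitize`, `min_weight`, and `min_resp` are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, k=2, iters=50, var_floor=1e-6):
    """Fit a k-component (k >= 2) 1-D Gaussian mixture with plain EM.

    Returns the mixing weights and the per-sample responsibilities
    computed in the final E-step.
    """
    n = len(xs)
    lo, hi = min(xs), max(xs)
    # Deterministic initialisation: means evenly spaced over the data range.
    means = [lo + j * (hi - lo) / (k - 1) for j in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, var_floor)] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in xs:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # empty component: leave its parameters unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)
    return weights, resp


def sanitize(xs, k=2, min_weight=0.2, min_resp=0.5):
    """Drop samples that mostly belong to low-weight mixture components.

    ASSUMPTION: the weight threshold is a stand-in for the hypothesis
    test described in the abstract, not the paper's actual criterion.
    """
    weights, resp = fit_gmm_1d(xs, k=k)
    suspicious = [j for j in range(k) if weights[j] < min_weight]
    flagged = [i for i, r in enumerate(resp)
               if sum(r[j] for j in suspicious) > min_resp]
    flagged_set = set(flagged)
    kept = [x for i, x in enumerate(xs) if i not in flagged_set]
    return kept, flagged


# Toy data: 41 clean samples spread over [-2, 2] plus 4 injected outliers near 8.
clean = [0.1 * i - 2.0 for i in range(41)]
poison = [7.9, 8.0, 8.1, 8.2]
kept, flagged = sanitize(clean + poison)
print("flagged indices:", flagged)
```

In this toy setting the injected outliers sit far from the clean samples, so they end up in their own small component and are removed before any classifier is trained. Real feature vectors are high-dimensional, and the choice of the number of components and of the decision rule matters much more than it does here.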

Original language: English (US)
Title of host publication: Dynamic Data Driven Application Systems - Third International Conference, DDDAS 2020, Proceedings
Editors: Frederica Darema, Erik Blasch, Sai Ravela, Alex Aved
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 262-273
Number of pages: 12
ISBN (Print): 9783030617240
DOIs
State: Published - 2020
Event: 3rd International Conference on Dynamic Data Driven Application Systems, DDDAS 2020 - Boston, United States
Duration: Oct 2 2020 → Oct 4 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12312 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd International Conference on Dynamic Data Driven Application Systems, DDDAS 2020
Country/Territory: United States
City: Boston
Period: 10/2/20 → 10/4/20

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
