CADE: Detecting and explaining concept drift samples for security applications

Limin Yang, Wenbo Guo, Qingying Hao, Arridhana Ciptadi, Ali Ahmadzadeh, Xinyu Xing, Gang Wang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

101 Scopus citations

Abstract

Concept drift poses a critical challenge to deploy machine learning models to solve practical security problems. Due to the dynamic behavior changes of attackers (and/or the benign counterparts), the testing data distribution is often shifting from the original training data over time, causing major failures to the deployed model. To combat concept drift, we present a novel system CADE aiming to 1) detect drifting samples that deviate from existing classes, and 2) provide explanations to reason the detected drift. Unlike traditional approaches (that require a large number of new labels to determine concept drift statistically), we aim to identify individual drifting samples as they arrive. Recognizing the challenges introduced by the high-dimensional outlier space, we propose to map the data samples into a low-dimensional space and automatically learn a distance function to measure the dissimilarity between samples. Using contrastive learning, we can take full advantage of existing labels in the training dataset to learn how to compare and contrast pairs of samples. To reason the meaning of the detected drift, we develop a distance-based explanation method. We show that explaining “distance” is much more effective than traditional methods that focus on explaining a “decision boundary” in this problem context. We evaluate CADE with two case studies: Android malware classification and network intrusion detection. We further work with a security company to test CADE on its malware database. Our results show that CADE can effectively detect drifting samples and provide semantically meaningful explanations.

Original languageEnglish (US)
Title of host publicationProceedings of the 30th USENIX Security Symposium
PublisherUSENIX Association
Pages2327-2344
Number of pages18
ISBN (Electronic)9781939133243
StatePublished - 2021
Event30th USENIX Security Symposium, USENIX Security 2021 - Virtual, Online
Duration: Aug 11 2021Aug 13 2021

Publication series

NameProceedings of the 30th USENIX Security Symposium

Conference

Conference30th USENIX Security Symposium, USENIX Security 2021
CityVirtual, Online
Period8/11/218/13/21

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Safety, Risk, Reliability and Quality

Fingerprint

Dive into the research topics of 'CADE: Detecting and explaining concept drift samples for security applications'. Together they form a unique fingerprint.

Cite this