Collaborative Research: SaTC: CORE: Small: Towards Label Enrichment and Refinement to Harden Learning-based Security Defenses

Project: Research project

Project Details


This project aims to harden machine-learning-based security defenses by improving their ability to handle dynamic changes. From data breaches to ransomware infections, increasingly sophisticated attacks pose a serious threat to Internet-enabled systems and their users. While machine learning has shown great promise for building the next generation of defenses, these systems are vulnerable to dynamic changes in the data (concept drift) caused by attacker evolution and by behavior changes of benign players. Traditionally, detecting and mitigating the impact of concept drift requires significant effort to label new data, which is challenging to scale up. In this project, the team of researchers will design novel schemes that improve the adaptability and resilience of learning-based defenses while requiring minimal labeling capacity. The core idea is to use self-supervised learning models, which utilize unlabeled data and obtain supervision from the data itself. If successful, the project will provide much-needed tools to measure, detect, and mitigate concept drift for security applications, including malware analysis, network intrusion detection, and bot detection.

The team of researchers will first focus on measuring concept drift over longitudinal data. With a focus on real-world malware samples, the team will develop measurement tools to extract and characterize different types of concept drift and to understand their patterns. In the next stage, the team will develop reactive methods to detect drifting samples via contrastive learning (a form of self-supervision), along with methods to select drifting samples to facilitate efficient labeling. Finally, the team will move from reactive defense to proactive approaches. The plan is to use adversarial generative models (another form of self-supervision) to synthesize richer data and labels that mimic future attacker mutations, which will be used to harden the defenses at the training stage. The proposed techniques are expected to reduce data labeling costs for learning-based defenses and improve their long-term sustainability in protecting users, organizations, and critical infrastructure. The team will also leverage this project to recruit and mentor underrepresented students, develop new course materials, and perform technology transfer.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Effective start/end date: 11/15/21 to 9/30/24


  • National Science Foundation: $247,903.00

