TY - JOUR
T1 - Adaptive Cyber Defense against Multi-Stage Attacks Using Learning-Based POMDP
AU - Hu, Zhisheng
AU - Zhu, Minghui
AU - Liu, Peng
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/11
Y1 - 2020/11
AB - Growing multi-stage attacks in computer networks impose significant security risks and necessitate the development of effective defense schemes that are able to autonomously respond to intrusions during vulnerability windows. However, the defender faces several real-world challenges, e.g., unknown likelihoods and unknown impacts of successful exploits. In this article, we leverage reinforcement learning to develop an innovative adaptive cyber defense to maximize the cost-effectiveness subject to the aforementioned challenges. In particular, we use Bayesian attack graphs to model the interactions between the attacker and networks. Then we formulate the defense problem of interest as a partially observable Markov decision process problem where the defender maintains belief states to estimate system states, leverages Thompson sampling to estimate transition probabilities, and utilizes reinforcement learning to choose optimal defense actions using measured utility values. The algorithm performance is verified via numerical simulations based on real-world attacks.
UR - http://www.scopus.com/inward/record.url?scp=85097491449&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097491449&partnerID=8YFLogxK
U2 - 10.1145/3418897
DO - 10.1145/3418897
M3 - Article
AN - SCOPUS:85097491449
SN - 2471-2566
VL - 24
JO - ACM Transactions on Privacy and Security
JF - ACM Transactions on Privacy and Security
IS - 1
M1 - 6
ER -