TY - JOUR
T1 - Tackling imbalanced data in cybersecurity with transfer learning
T2 - a case with ROP payload detection
AU - Wang, Haizhou
AU - Singhal, Anoop
AU - Liu, Peng
N1 - Publisher Copyright: © 2023, The Author(s).
PY - 2023/12
Y1 - 2023/12
AB - In recent years, deep learning has rapidly gained popularity in the cybersecurity application domain because, compared with traditional machine learning methods, it usually requires less human effort, produces better results, and generalizes better. However, imbalanced data is very common in cybersecurity and can substantially degrade the performance of deep learning models. This paper introduces a transfer-learning-based method to tackle the imbalanced data issue in cybersecurity, using return-oriented programming (ROP) payload detection as a case study. We achieved an average false positive rate of 0.0290, an average F1 score of 0.9705, and an average detection rate of 0.9521 on 3 different target domain programs using 2 different source domain programs, with 0 benign training samples in the target domain. Compared with the baseline, the performance improvement is a trade-off between false positive rate and detection rate: with our approach, the total number of false positives is reduced by 23.16%, while the number of detected malicious samples decreases by 0.68%.
UR - http://www.scopus.com/inward/record.url?scp=85145566770&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85145566770&partnerID=8YFLogxK
U2 - 10.1186/s42400-022-00135-8
DO - 10.1186/s42400-022-00135-8
M3 - Article
C2 - 36620350
AN - SCOPUS:85145566770
SN - 2096-4862
VL - 6
JO - Cybersecurity
JF - Cybersecurity
IS - 1
M1 - 2
ER -