TY - GEN
T1 - BERT-Cuckoo15
T2 - 58th Hawaii International Conference on System Sciences, HICSS 2025
AU - Rabadi, Dima
AU - Loo, Jia Y.
AU - Teo, Sin G.
N1 - Publisher Copyright: © 2025 IEEE Computer Society. All rights reserved.
PY - 2025
Y1 - 2025
AB - Malware detection presents significant challenges because features must be selected from diverse data sources, such as system calls and registry keys, and this selection affects model accuracy. Existing techniques often rely on a single feature type to reduce the number of features or require extensive feature engineering, potentially failing to capture intricate relationships between various features. Moreover, these methods usually assume that features are independent, which does not hold for complex malware behavior. Despite the success of these methods, their reliance on handcrafted features and inability to fully leverage contextual information limit their effectiveness against sophisticated malware. To address these constraints, we introduce BERT-Cuckoo15, a malware detection model that leverages Bidirectional Encoder Representations from Transformers (BERT) to analyze relationships between diverse features derived from the dynamic analysis of samples in the Cuckoo sandbox. The model processes and encodes these features into chunks, allowing for the aggregation of contextual information across different system activities. Our evaluation, conducted on a comprehensive and balanced dataset of 36,770 samples across nine malware types, demonstrates the efficacy of our approach. BERT-Cuckoo15 achieves an accuracy of 97.61%, showcasing its ability to capture complex feature interdependencies and improve malware detection accuracy.
UR - https://www.scopus.com/pages/publications/105005143306
M3 - Conference contribution
AN - SCOPUS:105005143306
T3 - Proceedings of the Annual Hawaii International Conference on System Sciences
SP - 393
EP - 402
BT - Proceedings of the 58th Hawaii International Conference on System Sciences, HICSS 2025
A2 - Bui, Tung X.
PB - IEEE Computer Society
Y2 - 7 January 2025 through 10 January 2025
ER -