TY - JOUR
T1 - ABSLearn
T2 - a GNN-based framework for aliasing and buffer-size information retrieval
AU - Liang, Ke
AU - Tan, Jim
AU - Zeng, Dongrui
AU - Huang, Yongzhe
AU - Huang, Xiaolei
AU - Tan, Gang
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.
PY - 2023/8
Y1 - 2023/8
N2 - Inferring aliasing and buffer-size information is important to understanding a C program's memory layout, which is critical to program analysis and security-related tasks. However, traditional static and dynamic program analysis methods suffer from certain limitations: static alias analysis methods suffer from precision loss and have poor scalability. Meanwhile, although dynamic analysis can achieve high precision, there is no soundness guarantee, and an online analysis may cause non-negligible runtime overhead. Besides, the current methods can only capture aliasing information. As for the buffer-size relational information, which is the specific variable storing the size of the buffer pointed by the pointers, it is tough to analyze by traditional methods. Moreover, we observe that most methods are designed for specific information. To address these limitations, we present ABSLearn, a deep learning framework that is capable of retrieving both aliasing and buffer-size information from C programs. The core idea of ABSLearn is to formulate the information retrieval as a link prediction problem, where a Graph Neural Network (GNN) model is applied to solve the problem. We developed the first related dataset that contains 285 C program samples to train ABSLearn. Then, the trained model is applied to infer the information on three practical benchmarks: Gzip-1.2.4, Make-3.80, and Tar-1.15.1. The results show that ABSLearn achieves comparable performance and excellent runtime performance. As the first attempt at applying GNN to infer aliasing and buffer-size information, ABSLearn can potentially benefit future program analysis frameworks.
AB - Inferring aliasing and buffer-size information is important to understanding a C program's memory layout, which is critical to program analysis and security-related tasks. However, traditional static and dynamic program analysis methods suffer from certain limitations: static alias analysis methods suffer from precision loss and have poor scalability. Meanwhile, although dynamic analysis can achieve high precision, there is no soundness guarantee, and an online analysis may cause non-negligible runtime overhead. Besides, the current methods can only capture aliasing information. As for the buffer-size relational information, which is the specific variable storing the size of the buffer pointed by the pointers, it is tough to analyze by traditional methods. Moreover, we observe that most methods are designed for specific information. To address these limitations, we present ABSLearn, a deep learning framework that is capable of retrieving both aliasing and buffer-size information from C programs. The core idea of ABSLearn is to formulate the information retrieval as a link prediction problem, where a Graph Neural Network (GNN) model is applied to solve the problem. We developed the first related dataset that contains 285 C program samples to train ABSLearn. Then, the trained model is applied to infer the information on three practical benchmarks: Gzip-1.2.4, Make-3.80, and Tar-1.15.1. The results show that ABSLearn achieves comparable performance and excellent runtime performance. As the first attempt at applying GNN to infer aliasing and buffer-size information, ABSLearn can potentially benefit future program analysis frameworks.
UR - http://www.scopus.com/inward/record.url?scp=85148379027&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85148379027&partnerID=8YFLogxK
U2 - 10.1007/s10044-023-01142-2
DO - 10.1007/s10044-023-01142-2
M3 - Article
AN - SCOPUS:85148379027
SN - 1433-7541
VL - 26
SP - 1171
EP - 1189
JO - Pattern Analysis and Applications
JF - Pattern Analysis and Applications
IS - 3
ER -