TY - GEN
T1 - Hierarchical Attention Network for Interpretable and Fine-Grained Vulnerability Detection
AU - Gu, Mianxue
AU - Feng, Hantao
AU - Sun, Hongyu
AU - Liu, Peng
AU - Yue, Qiuling
AU - Hu, Jinglu
AU - Cao, Chunjie
AU - Zhang, Yuqing
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - With the rapid development of software technology, the number of vulnerabilities is proliferating, which makes vulnerability detection an important topic in security research. Existing works focus only on predicting whether a given program is vulnerable but offer little interpretability. To overcome these deficits, we are the first to apply a hierarchical attention network to vulnerability detection for interpretable and fine-grained vulnerability discovery. Specifically, our model consists of two attention layers, at the line level and the token level of the code, to locate which lines or tokens are important for discovering vulnerabilities. Furthermore, to accurately extract features from source code, we process the code based on its abstract syntax tree and embed the syntax tokens into vectors. We evaluate the performance of our model on two widely used benchmark datasets from SARD, CWE-119 (Buffer Error) and CWE-399 (Resource Management Error). Experiments show that our model achieves F1 scores of 86.1% (CWE-119) and 90.0% (CWE-399), significantly better than state-of-the-art models. In particular, our model can directly mark the importance of different lines and tokens, which provides useful information for further vulnerability exploitation and repair.
AB - With the rapid development of software technology, the number of vulnerabilities is proliferating, which makes vulnerability detection an important topic in security research. Existing works focus only on predicting whether a given program is vulnerable but offer little interpretability. To overcome these deficits, we are the first to apply a hierarchical attention network to vulnerability detection for interpretable and fine-grained vulnerability discovery. Specifically, our model consists of two attention layers, at the line level and the token level of the code, to locate which lines or tokens are important for discovering vulnerabilities. Furthermore, to accurately extract features from source code, we process the code based on its abstract syntax tree and embed the syntax tokens into vectors. We evaluate the performance of our model on two widely used benchmark datasets from SARD, CWE-119 (Buffer Error) and CWE-399 (Resource Management Error). Experiments show that our model achieves F1 scores of 86.1% (CWE-119) and 90.0% (CWE-399), significantly better than state-of-the-art models. In particular, our model can directly mark the importance of different lines and tokens, which provides useful information for further vulnerability exploitation and repair.
UR - http://www.scopus.com/inward/record.url?scp=85133922122&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85133922122&partnerID=8YFLogxK
U2 - 10.1109/INFOCOMWKSHPS54753.2022.9798297
DO - 10.1109/INFOCOMWKSHPS54753.2022.9798297
M3 - Conference contribution
AN - SCOPUS:85133922122
T3 - INFOCOM WKSHPS 2022 - IEEE Conference on Computer Communications Workshops
BT - INFOCOM WKSHPS 2022 - IEEE Conference on Computer Communications Workshops
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS 2022
Y2 - 2 May 2022 through 5 May 2022
ER -