TY - GEN
T1 - Evaluating Large Language Models for Real-World Vulnerability Repair in C/C++ Code
AU - Zhang, Lan
AU - Zou, Qingtian
AU - Singhal, Anoop
AU - Sun, Xiaoyan
AU - Liu, Peng
N1 - Publisher Copyright:
© 2024 Owner/Author.
PY - 2024/6/21
Y1 - 2024/6/21
N2 - The advent of Large Language Models (LLMs) has enabled advances in automated code generation, translation, and summarization. Despite their promise, the use of LLMs for repairing real-world code vulnerabilities remains underexplored. In this study, we address this gap by evaluating the capability of advanced LLMs, such as ChatGPT-4 and Claude, in fixing memory corruption vulnerabilities in real-world C/C++ code. We meticulously curated 223 real-world C/C++ code snippets encompassing a spectrum of memory corruption vulnerabilities, ranging from straightforward memory leaks to intricate buffer errors. Our findings demonstrate the proficiency of LLMs in rectifying simple memory errors such as leaks, where fixes are confined to localized code segments. However, their effectiveness diminishes when addressing complicated vulnerabilities that necessitate reasoning about cross-cutting concerns and deeper program semantics. Furthermore, we explore techniques for augmenting LLM performance by incorporating additional knowledge. Our results shed light on both the strengths and limitations of LLMs in automated program repair on genuine code, underscoring the need for advancements in reasoning abilities for handling complex code repair tasks.
AB - The advent of Large Language Models (LLMs) has enabled advances in automated code generation, translation, and summarization. Despite their promise, the use of LLMs for repairing real-world code vulnerabilities remains underexplored. In this study, we address this gap by evaluating the capability of advanced LLMs, such as ChatGPT-4 and Claude, in fixing memory corruption vulnerabilities in real-world C/C++ code. We meticulously curated 223 real-world C/C++ code snippets encompassing a spectrum of memory corruption vulnerabilities, ranging from straightforward memory leaks to intricate buffer errors. Our findings demonstrate the proficiency of LLMs in rectifying simple memory errors such as leaks, where fixes are confined to localized code segments. However, their effectiveness diminishes when addressing complicated vulnerabilities that necessitate reasoning about cross-cutting concerns and deeper program semantics. Furthermore, we explore techniques for augmenting LLM performance by incorporating additional knowledge. Our results shed light on both the strengths and limitations of LLMs in automated program repair on genuine code, underscoring the need for advancements in reasoning abilities for handling complex code repair tasks.
UR - http://www.scopus.com/inward/record.url?scp=85197506324&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85197506324&partnerID=8YFLogxK
U2 - 10.1145/3643651.3659892
DO - 10.1145/3643651.3659892
M3 - Conference contribution
AN - SCOPUS:85197506324
T3 - IWSPA 2024 - Proceedings of the 10th ACM International Workshop on Security and Privacy Analytics
SP - 49
EP - 58
BT - IWSPA 2024 - Proceedings of the 10th ACM International Workshop on Security and Privacy Analytics
PB - Association for Computing Machinery, Inc
T2 - 10th ACM International Workshop on Security and Privacy Analytics, IWSPA 2024
Y2 - 21 June 2024
ER -