TY - GEN
T1 - In-memory fuzzing for binary code similarity analysis
AU - Wang, Shuai
AU - Wu, Dinghao
N1 - Funding Information:
We thank the ASE anonymous reviewers and Tiffany Bao for their valuable feedback. This research was supported in part by the National Science Foundation (NSF) under grant CNS-1652790, and the Office of Naval Research (ONR) under grants N00014-16-1-2265, N00014-16-1-2912, and N00014-17-1-2894.
Publisher Copyright:
© 2017 IEEE.
PY - 2017/11/20
Y1 - 2017/11/20
AB - Detecting similar functions in binary executables serves as a foundation for many binary code analysis and reuse tasks. To date, recognizing similar components in binary code remains a challenge. Existing research employs either static or dynamic approaches to capture program syntax or semantics-level features for comparison. However, previous work has multiple design limitations, which result in relatively high cost, low accuracy, and poor scalability, and thus severely impede practical use. In this paper, we present a novel method that leverages in-memory fuzzing for binary code similarity analysis. Our prototype tool IMF-SIM applies in-memory fuzzing to analyze every function and collect traces of different kinds of program behaviors. The similarity score of two behavior traces is computed according to their longest common subsequence. To compare two functions, a feature vector is generated, whose elements are the similarity scores of the behavior trace-level comparisons. We train a machine learning model on labeled feature vectors; later, given a feature vector obtained by comparing two functions, the trained model outputs a final score representing the similarity of the two functions. We evaluate IMF-SIM on binaries produced with different compilers, optimizations, and commonly used obfuscation methods, in total over one thousand binary executables. Our evaluation shows that IMF-SIM notably outperforms existing tools with higher accuracy and broader application scopes.
UR - http://www.scopus.com/inward/record.url?scp=85041433639&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041433639&partnerID=8YFLogxK
U2 - 10.1109/ASE.2017.8115645
DO - 10.1109/ASE.2017.8115645
M3 - Conference contribution
AN - SCOPUS:85041433639
T3 - ASE 2017 - Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering
SP - 319
EP - 330
BT - ASE 2017 - Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering
A2 - Nguyen, Tien N.
A2 - Rosu, Grigore
A2 - Di Penta, Massimiliano
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017
Y2 - 30 October 2017 through 3 November 2017
ER -