In-memory fuzzing for binary code similarity analysis

Shuai Wang, Dinghao Wu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

40 Scopus citations

Abstract

Detecting similar functions in binary executables serves as a foundation for many binary code analysis and reuse tasks. By far, recognizing similar components in binary code remains a challenge. Existing research employs either static or dynamic approaches to capture program syntax or semantics-level features for comparison. However, there exist multiple design limitations in previous work, which result in relatively high cost, low accuracy and scalability, and thus severely impede their practical use. In this paper, we present a novel method that leverages in-memory fuzzing for binary code similarity analysis. Our prototype tool IMF-SIM applies in-memory fuzzing to launch analysis towards every function and collect traces of different kinds of program behaviors. The similarity score of two behavior traces is computed according to their longest common subsequence. To compare two functions, a feature vector is generated, whose elements are the similarity scores of the behavior trace-level comparisons. We train a machine learning model through labeled feature vectors; later, for a given feature vector by comparing two functions, the trained model gives a final score, representing the similarity score of the two functions. We evaluate IMF-SIM against binaries compiled by different compilers, optimizations, and commonly-used obfuscation methods, in total over one thousand binary executables. Our evaluation shows that IMF-SIM notably outperforms existing tools with higher accuracy and broader application scopes.
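The trace-comparison step described in the abstract can be illustrated with a minimal sketch. It is not IMF-SIM's actual implementation. It assumes each behavior trace is a plain sequence of hashable values, normalizes the longest-common-subsequence length by the longer trace (the abstract does not state the normalization), and uses hypothetical behavior-kind names. The trained model that maps a feature vector to a final score is omitted.

```python
from typing import Dict, List, Sequence


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def trace_similarity(a: Sequence, b: Sequence) -> float:
    """Similarity in [0, 1]. Dividing by the longer trace is an assumption of this sketch."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def feature_vector(f1: Dict[str, Sequence], f2: Dict[str, Sequence],
                   kinds: List[str]) -> List[float]:
    """One similarity score per behavior kind; a missing trace counts as empty."""
    return [trace_similarity(f1.get(k, []), f2.get(k, [])) for k in kinds]


# Hypothetical behavior kinds and trace values, for illustration only.
KINDS = ["mem_reads", "mem_writes", "lib_calls"]
func_a = {"mem_reads": [1, 2, 3, 4], "mem_writes": [7, 8], "lib_calls": ["malloc", "free"]}
func_b = {"mem_reads": [1, 3, 4], "mem_writes": [7, 9], "lib_calls": ["malloc", "free"]}
vector = feature_vector(func_a, func_b, KINDS)
```

In the pipeline the abstract outlines, vectors like `vector` would be labeled and used to train a model, which later scores the vector for an unseen pair of functions.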

Original language: English (US)
Title of host publication: ASE 2017 - Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering
Editors: Tien N. Nguyen, Grigore Rosu, Massimiliano Di Penta
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 319-330
Number of pages: 12
ISBN (Electronic): 9781538626849
DOIs
State: Published - Nov 20 2017
Event: 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017 - Urbana-Champaign, United States
Duration: Oct 30 2017 to Nov 3 2017

Publication series

Name: ASE 2017 - Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering

Other

Other: 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017
Country/Territory: United States
City: Urbana-Champaign
Period: 10/30/17 to 11/3/17

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Optimization
