Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection

Lannan Luo, Jiang Ming, Dinghao Wu, Peng Liu, Sencun Zhu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

178 Scopus citations

Abstract

Existing code similarity comparison methods, whether source or binary code based, are mostly not resilient to obfuscations. In the case of software plagiarism, emerging obfuscation techniques have made automated detection increasingly difficult. In this paper, we propose a binary-oriented, obfuscation-resilient method based on a new concept, longest common subsequence of semantically equivalent basic blocks, which combines rigorous program semantics with longest common subsequence based fuzzy matching. We model the semantics of a basic block by a set of symbolic formulas representing the input-output relations of the block. This way, the semantics equivalence (and similarity) of two blocks can be checked by a theorem prover. We then model the semantics similarity of two paths using the longest common subsequence with basic blocks as elements. This novel combination has resulted in strong resiliency to code obfuscation. We have developed a prototype and our experimental results show that our method is effective and practical when applied to real-world software.
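The core idea of the abstract, a longest common subsequence (LCS) whose elements are basic blocks compared by semantic equivalence rather than by syntax, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' prototype: basic blocks are modeled as plain Python functions from one integer to one integer, the theorem-prover check is replaced by a much weaker comparison on a handful of sample inputs (`equivalent_on_samples`), and the score is normalized by the length of the reference path. The names `lcs_length` and `path_similarity` are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical stand-in: a basic block is modeled as an int -> int function.
Block = Callable[[int], int]


def equivalent_on_samples(block_a: Block, block_b: Block,
                          samples: Sequence[int] = range(-8, 9)) -> bool:
    """Weak stand-in for a theorem-prover check: compare outputs on samples only."""
    return all(block_a(x) == block_b(x) for x in samples)


def lcs_length(path_a: Sequence[Block], path_b: Sequence[Block],
               equivalent: Callable[[Block, Block], bool]) -> int:
    """Length of the longest common subsequence of two block paths,
    where two blocks match if the supplied predicate says they are equivalent."""
    rows, cols = len(path_a), len(path_b)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if equivalent(path_a[i - 1], path_b[j - 1]):
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows][cols]


def path_similarity(reference: Sequence[Block], suspect: Sequence[Block],
                    equivalent: Callable[[Block, Block], bool] = equivalent_on_samples
                    ) -> float:
    """Fraction of the reference path matched in the suspect path
    (normalizing by the reference length is an assumption of this sketch)."""
    if not reference:
        return 0.0
    return lcs_length(reference, suspect, equivalent) / len(reference)


# Reference path: three blocks.
reference = [lambda x: x * 2, lambda x: x + 1, lambda x: x * x]

# Suspect path: the same computations written differently, plus one inserted junk block.
suspect = [lambda x: x + x, lambda x: x - 3, lambda x: x - (-1), lambda x: x ** 2]

print(path_similarity(reference, suspect))  # 1.0
```

The inserted junk block and the syntactic rewrites do not lower the score, because matching is done on input-output behavior and the LCS tolerates extra elements. Sample-based comparison can report false equivalences, which is why a real implementation along the lines of the abstract would check equivalence of symbolic formulas with a theorem prover instead.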

Original language: English (US)
Title of host publication: 22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering, FSE 2014 - Proceedings
Publisher: Association for Computing Machinery
Pages: 389-400
Number of pages: 12
ISBN (Electronic): 9781450330565
State: Published - Nov 16 2014
Event: 22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering, FSE 2014 - Hong Kong, China
Duration: Nov 16 2014 - Nov 21 2014

Publication series

Name: Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering
Volume: 16-21-November-2014

Other

Other: 22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering, FSE 2014
Country/Territory: China
City: Hong Kong
Period: 11/16/14 - 11/21/14

All Science Journal Classification (ASJC) codes

  • Software
