TY - GEN
T1 - Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection
AU - Luo, Lannan
AU - Ming, Jiang
AU - Wu, Dinghao
AU - Liu, Peng
AU - Zhu, Sencun
N1 - Publisher Copyright:
Copyright 2014 ACM.
PY - 2014/11/16
Y1 - 2014/11/16
N2 - Existing code similarity comparison methods, whether source or binary code based, are mostly not resilient to obfuscations. In the case of software plagiarism, emerging obfuscation techniques have made automated detection increasingly difficult. In this paper, we propose a binary-oriented, obfuscationresilient method based on a new concept, longest common subsequence of semantically equivalent basic blocks, which combines rigorous program semantics with longest common subsequence based fuzzy matching. We model the semantics of a basic block by a set of symbolic formulas representing the input-output relations of the block. This way, the semantics equivalence (and similarity) of two blocks can be checked by a theorem prover. We then model the semantics similarity of two paths using the longest common subsequence with basic blocks as elements. This novel combination has resulted in strong resiliency to code obfuscation. We have developed a prototype and our experimental results show that our method is effective and practical when applied to real-world software.
AB - Existing code similarity comparison methods, whether source or binary code based, are mostly not resilient to obfuscations. In the case of software plagiarism, emerging obfuscation techniques have made automated detection increasingly difficult. In this paper, we propose a binary-oriented, obfuscationresilient method based on a new concept, longest common subsequence of semantically equivalent basic blocks, which combines rigorous program semantics with longest common subsequence based fuzzy matching. We model the semantics of a basic block by a set of symbolic formulas representing the input-output relations of the block. This way, the semantics equivalence (and similarity) of two blocks can be checked by a theorem prover. We then model the semantics similarity of two paths using the longest common subsequence with basic blocks as elements. This novel combination has resulted in strong resiliency to code obfuscation. We have developed a prototype and our experimental results show that our method is effective and practical when applied to real-world software.
UR - http://www.scopus.com/inward/record.url?scp=84986919852&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84986919852&partnerID=8YFLogxK
U2 - 10.1145/2635868.2635900
DO - 10.1145/2635868.2635900
M3 - Conference contribution
AN - SCOPUS:84986919852
T3 - Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering
SP - 389
EP - 400
BT - 22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering, FSE 2014 - Proceedings
PB - Association for Computing Machinery
T2 - 22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering, FSE 2014
Y2 - 16 November 2014 through 21 November 2014
ER -