TY - JOUR
T1 - Deviation-based obfuscation-resilient program equivalence checking with application to software plagiarism detection
AU - Ming, Jiang
AU - Zhang, Fangfang
AU - Wu, Dinghao
AU - Liu, Peng
AU - Zhu, Sencun
N1 - Publisher Copyright: © 1963-2012 IEEE.
PY - 2016/12
Y1 - 2016/12
AB - Software plagiarism, an act of illegally copying others' code, has become a serious concern for honest software companies and the open source community. Considerable research efforts have been dedicated to searching the evidence of software plagiarism. In this paper, we continue this line of research and propose LoPD, a deviation-based program equivalence checking approach, which is an ideal fit for the whole-program plagiarism detection. Instead of directly comparing the similarity between two programs, LoPD searches for any dissimilarity between two programs by finding an input that will cause these two programs to behave differently, either with different output states or with semantically different execution paths. As long as we can find one dissimilarity, the programs are semantically different; but if we cannot find any dissimilarity, it is more likely a plagiarism case. We leverage dynamic symbolic execution to capture the semantics of execution paths and to find path deviations. Compared to the existing detection approaches, LoPD's formal program semantics-based method is more resilient to automatic obfuscation schemes. Our evaluation results indicate that LoPD is effective in detecting whole-program plagiarism. Furthermore, we demonstrate that LoPD can be applied to partial software plagiarism detection as well. The encouraging experiment results show that LoPD is an appealing complement to existing software plagiarism detection approaches.
UR - http://www.scopus.com/inward/record.url?scp=84974829842&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84974829842&partnerID=8YFLogxK
U2 - 10.1109/TR.2016.2570554
DO - 10.1109/TR.2016.2570554
M3 - Article
AN - SCOPUS:84974829842
SN - 0018-9529
VL - 65
SP - 1647
EP - 1664
JO - IEEE Transactions on Reliability
JF - IEEE Transactions on Reliability
IS - 4
M1 - 7490384
ER -