Project Details


Software plagiarism is the act of reusing someone else's code, in whole or in part, in one's own program in a way that violates the terms of the original license. With the rapid growth of the software industry and the surge in open-source projects, software plagiarism has become a serious threat to intellectual property protection and to the 'healthiness' of an open-source-embracing software industry. Meanwhile, software plagiarism (including what is called app repackaging) has become even more common in mobile app markets. There it is used for monetary profit or to propagate malware by inserting malicious payloads into the original apps. Computer-aided, automated plagiarism detection tools should play a major role in addressing this threat. However, existing plagiarism detection schemes are not resilient to code obfuscation. They can often be defeated by freely available counter-detection tools based on code obfuscation, which in most cases are rather simple.

In this project, the software plagiarism detection problem is studied systematically. The proposed plagiarism detection methods for PC applications leverage program logic and the longest subsequences of semantically equivalent basic blocks. They can detect partial program plagiarism and also provide formal guarantees of obfuscation resilience. The proposed method for mobile apps exploits the user interface for plagiarism detection, and this unique design angle enables it to defeat various code obfuscation techniques. Our proposed research will significantly deter both the intent and the practice of software plagiarism. It will not only serve as a useful tool for collecting strong plagiarism evidence in intellectual property lawsuits, but also promote a healthier and more trustworthy sharing environment for the open-source community and for mobile app markets. Broader impact will also result from the education and dissemination initiatives.
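The summary does not spell out how the longest subsequence of semantically equivalent basic blocks is computed. As a rough illustration only, the classic longest-common-subsequence dynamic program can be generalized by replacing token equality with a pluggable equivalence test between basic blocks. Everything below is an assumption, not the project's actual algorithm. Blocks are modelled as Python functions of one integer, and the hypothetical `agree_on_probes` merely compares them on a handful of inputs. That is a cheap stand-in for the formal equivalence checking an obfuscation-resilience guarantee would require. The names `longest_equivalent_subsequence`, `agree_on_probes`, and the sample blocks are likewise hypothetical.

```python
from typing import Callable, Sequence, TypeVar

Block = TypeVar("Block")


def longest_equivalent_subsequence(
    plaintiff: Sequence[Block],
    suspect: Sequence[Block],
    equivalent: Callable[[Block, Block], bool],
) -> list[tuple[int, int]]:
    """Return index pairs (i, j) forming a longest order-preserving matching
    in which plaintiff[i] is judged equivalent to suspect[j]."""
    n, m = len(plaintiff), len(suspect)
    # dp[i][j] = best matching length using plaintiff[i:] and suspect[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best = max(dp[i + 1][j], dp[i][j + 1])
            if equivalent(plaintiff[i], suspect[j]):
                best = max(best, dp[i + 1][j + 1] + 1)
            dp[i][j] = best
    # Walk the table forward to recover one optimal matching.
    pairs: list[tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if equivalent(plaintiff[i], suspect[j]) and dp[i][j] == dp[i + 1][j + 1] + 1:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


# Hypothetical stand-in for semantic equivalence: agree on every probe input.
PROBES = range(-8, 9)


def agree_on_probes(f: Callable[[int], int], g: Callable[[int], int]) -> bool:
    return all(f(x) == g(x) for x in PROBES)


# Toy "basic blocks": the suspect rewrites two blocks and inserts a junk block.
plaintiff = [lambda x: x * 2, lambda x: x + 3, lambda x: x ^ 5]
suspect = [lambda x: x + x, lambda x: x - 7, lambda x: (x + 1) + 2, lambda x: x ^ 5]

pairs = longest_equivalent_subsequence(plaintiff, suspect, agree_on_probes)
# Fraction of plaintiff blocks matched, usable as a crude similarity score.
score = len(pairs) / len(plaintiff)
print(pairs, score)
```

Here the rewritten blocks (`x + x` for `x * 2`, `(x + 1) + 2` for `x + 3`) still match, and the inserted junk block is skipped, so all three plaintiff blocks are matched in order. Exact token comparison would have matched only the unchanged `x ^ 5` block, which is the kind of obfuscation sensitivity the summary attributes to existing schemes.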

Effective start/end date: 8/1/13 to 7/31/17


  • National Science Foundation: $500,000.00

