Plagiarism Detection of Multi-threaded Programs Using Frequent Behavioral Pattern Mining

Zhenzhou Tian, Qing Wang, Cong Gao, Lingwei Chen, Dinghao Wu

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


Software dynamic birthmark techniques construct birthmarks from execution traces captured while running the programs, and they are among the most promising methods for obfuscation-resilient software plagiarism detection. However, non-deterministic thread scheduling perturbs the execution traces of multi-threaded programs, so dynamic approaches optimized for sequential programs may be undermined by this randomness when applied to multi-threaded program plagiarism detection. In this paper, we propose FPBirth, a new dynamic thread-aware birthmark for multi-threaded program plagiarism detection. We first use dynamic monitoring to capture multiple system-call execution traces of each multi-threaded program under a specified input, and then apply the Apriori algorithm to mine frequent patterns from these traces to form our dynamic birthmark. The resulting birthmark not only captures the program's behavioral semantics but also resists the changes and perturbations that thread scheduling introduces into execution traces. Based on FPBirth, we design a multi-threaded program plagiarism detection system. Experimental results on a public software plagiarism sample set show that this system handles multi-threaded plagiarism detection better than alternative approaches. Compared with the dynamic birthmark System Call Short Sequence Birthmark (SCSSB), FPBirth improves performance by 12.4%, 4.1%, and 7.9% in terms of the union of resilience and credibility (URC), F-measure, and Matthews correlation coefficient (MCC) metrics, respectively.
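The abstract does not say exactly how traces are turned into Apriori transactions. As a minimal sketch, assuming each run's system-call trace becomes one transaction whose items are its length-k system-call subsequences, the Python below mines frequent itemsets with a textbook Apriori (join, prune, count support) and treats the resulting set as a birthmark. The helper names (`trace_to_items`, `birthmark`, `similarity`), the parameters `k=2` and `min_support=0.6`, the toy traces, and the Jaccard similarity are all hypothetical choices for illustration, not FPBirth's actual design.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set, Tuple

Item = Tuple[str, ...]        # one system-call short sequence (k-gram)
Itemset = FrozenSet[Item]


def trace_to_items(trace: Sequence[str], k: int = 2) -> FrozenSet[Item]:
    """Turn one system-call trace into a transaction: the set of its length-k call sequences."""
    return frozenset(tuple(trace[i:i + k]) for i in range(len(trace) - k + 1))


def apriori(transactions: List[FrozenSet[Item]], min_support: float) -> Set[Itemset]:
    """Return every itemset contained in at least a min_support fraction of the transactions."""
    n = len(transactions)

    def support(itemset: Itemset) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = set(current)
    size = 1
    while current:
        size += 1
        # Join step: merge frequent (size-1)-itemsets into size-itemset candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step (Apriori property): every (size-1)-subset must already be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, size - 1))}
        # Count step: keep only candidates that meet the support threshold.
        current = {c for c in candidates if support(c) >= min_support}
        frequent |= current
    return frequent


def birthmark(runs: List[Sequence[str]], k: int = 2, min_support: float = 0.6) -> Set[Itemset]:
    """Hypothetical birthmark: frequent patterns mined across several runs of one program."""
    return apriori([trace_to_items(r, k) for r in runs], min_support)


def similarity(a: Set[Itemset], b: Set[Itemset]) -> float:
    """Jaccard index of two birthmarks (an illustrative choice of similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Three runs of one program under the same input; the position of the futex
# call drifts from run to run, standing in for thread-scheduling noise.
original_runs = [
    ["open", "read", "write", "close", "mmap", "futex"],
    ["open", "read", "futex", "write", "close", "mmap"],
    ["open", "read", "write", "close", "futex", "mmap"],
]
unrelated_runs = [
    ["socket", "connect", "send", "recv", "close"],
    ["socket", "connect", "send", "recv", "close"],
    ["socket", "connect", "recv", "send", "close"],
]

bm_original = birthmark(original_runs)
bm_unrelated = birthmark(unrelated_runs)
print(len(bm_original))                        # 11
print(similarity(bm_original, bm_original))    # 1.0
print(similarity(bm_original, bm_unrelated))   # 0.0
```

In the toy data, each 2-gram involving the drifting `futex` call occurs in only one of the three runs (support 1/3), so it falls below the 0.6 threshold and never enters the birthmark; only patterns that recur across runs survive. This is the intuition behind using frequent patterns to absorb scheduling perturbations, although the real system's pattern representation and thresholds may differ.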

Original language: English (US)
Pages (from-to): 1667-1688
Number of pages: 22
Journal: International Journal of Software Engineering and Knowledge Engineering
Issue number: 11-12
State: Published - Nov 2020

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications
  • Computer Graphics and Computer-Aided Design
  • Artificial Intelligence

