TY - JOUR
T1 - Achieving accuracy and scalability simultaneously in detecting application clones on Android markets
AU - Chen, Kai
AU - Liu, Peng
AU - Zhang, Yingjun
N1 - Publisher Copyright: © 2014 ACM.
PY - 2014/5/31
Y1 - 2014/5/31
N2 - Besides traditional problems such as potential bugs, (smartphone) application clones on Android markets bring new threats. That is, attackers clone the code from legitimate Android applications, assemble it with malicious code or advertisements, and publish these "purpose-added" app clones on the same or other markets for benefits. Three inherent and unique characteristics make app clones difficult to detect by existing techniques: a billion opcode problem caused by cross-market publishing, gap between code clones and app clones, and prevalent Type 2 and Type 3 clones. Existing techniques achieve either accuracy or scalability, but not both. To achieve both goals, we use a geometry characteristic, called centroid, of dependency graphs to measure the similarity between methods (code fragments) in two apps. Then we synthesize the method-level similarities and draw a Y/N conclusion on app (core functionality) cloning. The observed "centroid effect" and the inherent "monotonicity" property enable our approach to achieve both high accuracy and scalability. We implemented the app clone detection system and evaluated it on five whole Android markets (including 150,145 apps, 203 million methods and 26 billion opcodes). It takes less than one hour to perform cross-market app clone detection on the five markets after generating centroids only once.
UR - http://www.scopus.com/inward/record.url?scp=84994101812&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84994101812&partnerID=8YFLogxK
U2 - 10.1145/2568225.2568286
DO - 10.1145/2568225.2568286
M3 - Conference article
AN - SCOPUS:84994101812
SN - 0270-5257
SP - 175
EP - 186
JO - Proceedings - International Conference on Software Engineering
JF - Proceedings - International Conference on Software Engineering
IS - 1
T2 - 36th International Conference on Software Engineering, ICSE 2014
Y2 - 31 May 2014 through 7 June 2014
ER -