TY - GEN
T1 - Combining static analysis with probabilistic models to enable market-scale android inter-component analysis
AU - Octeau, Damien
AU - Jha, Somesh
AU - Dering, Matthew
AU - McDaniel, Patrick
AU - Bartel, Alexandre
AU - Li, Li
AU - Klein, Jacques
AU - Le Traon, Yves
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/1/11
Y1 - 2016/1/11
AB - Static analysis has been successfully used in many areas, from verifying mission-critical software to malware detection. Unfortunately, static analysis often produces false positives, which require significant manual effort to resolve. In this paper, we show how to overlay a probabilistic model, trained using domain knowledge, on top of static analysis results, in order to triage static analysis results. We apply this idea to analyzing mobile applications. Android application components can communicate with each other, both within single applications and between different applications. Unfortunately, techniques to statically infer Inter-Component Communication (ICC) yield many potential inter-component and inter-application links, most of which are false positives. At large scales, scrutinizing all potential links is simply not feasible. We therefore overlay a probabilistic model of ICC on top of static analysis results. Since computing the inter-component links is a prerequisite to inter-component analysis, we introduce a formalism for inferring ICC links based on set constraints. We design an efficient algorithm for performing link resolution. We compute all potential links in a corpus of 11,267 applications in 30 minutes and triage them using our probabilistic approach. We find that over 95.1% of all 636 million potential links are associated with probability values below 0.01 and are thus likely unfeasible links. Thus, it is possible to consider only a small subset of all links without significant loss of information. This work is the first significant step in making static inter-application analysis more tractable, even at large scales.
UR - https://www.scopus.com/pages/publications/84962580740
U2 - 10.1145/2837614.2837661
DO - 10.1145/2837614.2837661
M3 - Conference contribution
AN - SCOPUS:84962580740
T3 - Conference Record of the Annual ACM Symposium on Principles of Programming Languages
SP - 469
EP - 484
BT - POPL 2016 - Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
A2 - Majumdar, Rupak
A2 - Bodik, Rastislav
PB - Association for Computing Machinery
T2 - 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016
Y2 - 20 January 2016 through 22 January 2016
ER -