With the thriving of the Android ecosystem, code is widely reused in Android apps in the form of third-party libraries. Recent research shows that emerging third-party libraries may introduce substantial privacy risks and other security threats. Nevertheless, current approaches to library identification fall short of the accuracy and efficiency that this task demands. In this article, we present LibHawkeye, a new clustering-based technique to identify third-party libraries in millions of Android apps. Our approach uses four different kinds of dependencies inside Android apps to build intra-app dependency graphs and, unlike most previous work, does not rely on package homogeny. We further propose a three-step refinement that eliminates as many false positives from the initial results as possible. An experiment on 1,000 apps shows that, compared to existing tools, LibHawkeye precisely identifies at least 26.5 percent more libraries. We also evaluate it on 3,987,206 Android apps published on Google Play, and the accuracy on libraries sampled from the clustering results is 93.25 percent. These results show that LibHawkeye significantly outperforms state-of-the-art tools without sacrificing scalability.
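To make the idea of grouping classes by dependencies rather than by package name more concrete, the following is a minimal sketch of one plausible intra-app step. It builds an undirected dependency graph over classes and treats each connected component as a candidate module. It is not LibHawkeye's implementation. The abstract does not name the four dependency kinds, so the names in `EDGE_KINDS` are hypothetical placeholders. The functions `build_dependency_graph` and `candidate_modules` are likewise illustrative names, and the cross-app clustering and three-step refinement are not shown.

```python
from collections import defaultdict

# Hypothetical dependency kinds, for illustration only; the abstract does not
# state which four kinds of dependencies LibHawkeye actually uses.
EDGE_KINDS = {"inheritance", "call", "field_access", "type_reference"}


def build_dependency_graph(edges):
    """Build an undirected adjacency map from (src_class, dst_class, kind) triples."""
    graph = defaultdict(set)
    for src, dst, kind in edges:
        if kind not in EDGE_KINDS:
            raise ValueError(f"unknown dependency kind: {kind}")
        graph[src].add(dst)
        graph[dst].add(src)
    return graph


def candidate_modules(graph):
    """Group classes into connected components (hypothetical candidate modules).

    Package names are deliberately ignored: two classes end up in the same
    component only if a chain of dependencies links them.
    """
    seen = set()
    modules = []
    for start in sorted(graph):  # sorted for deterministic output
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        modules.append(sorted(component))
    return modules


if __name__ == "__main__":
    edges = [
        ("a.A", "a.B", "call"),
        ("a.B", "x.C", "inheritance"),  # crosses a package boundary
        ("com.app.Main", "com.app.Util", "field_access"),
    ]
    print(candidate_modules(build_dependency_graph(edges)))
    # → [['a.A', 'a.B', 'x.C'], ['com.app.Main', 'com.app.Util']]
```

In this toy input, `x.C` is grouped with `a.A` and `a.B` even though it sits in a different package. This illustrates why a dependency-based grouping can still work when package names are obfuscated or inconsistent. A real system would need far more than this, including the cross-app clustering and false-positive refinement that the abstract describes.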