TY - GEN
T1 - Semantics-aware machine learning for function recognition in binary code
AU - Wang, Shuai
AU - Wang, Pei
AU - Wu, Dinghao
N1 - Funding Information:
We appreciate the anonymous reviewers for their valuable feedback. We also thank Tiffany Bao for helping us set up ByteWeight and providing valuable feedback. This research was supported in part by the National Science Foundation (NSF) under grant CNS-1652790, and the Office of Naval Research (ONR) under grants N00014-13-1-0175, N00014-16-1-2265, and N00014-16-1-2912.
Publisher Copyright:
© 2017 IEEE.
PY - 2017/11/2
Y1 - 2017/11/2
N2 - Function recognition in program binaries serves as the foundation for many binary instrumentation and analysis tasks. However, as binaries are usually stripped before distribution, function information is indeed absent in most binaries. By far, identifying functions in stripped binaries remains a challenge. Recent research work proposes to recognize functions in binary code through machine learning techniques. The recognition model, including typical function entry point patterns, is automatically constructed through learning. However, we observed that as previous work only leverages syntax-level features to train the model, binary obfuscation techniques can undermine the prelearned models in real-world usage scenarios. In this paper, we propose FID, a semantics-based method to recognize functions in stripped binaries. We leverage symbolic execution to generate semantic information and learn the function recognition model through well-performing machine learning techniques. FID extracts semantic information from binary code and, therefore, is effectively adapted to different compilers and optimizations. Moreover, we also demonstrate that FID has high recognition accuracy on binaries transformed by widely-used obfuscation techniques. We evaluate FID with over four thousand test cases. Our evaluation shows that FID is comparable with previous work on normal binaries and it notably outperforms existing tools on obfuscated code.
AB - Function recognition in program binaries serves as the foundation for many binary instrumentation and analysis tasks. However, as binaries are usually stripped before distribution, function information is indeed absent in most binaries. By far, identifying functions in stripped binaries remains a challenge. Recent research work proposes to recognize functions in binary code through machine learning techniques. The recognition model, including typical function entry point patterns, is automatically constructed through learning. However, we observed that as previous work only leverages syntax-level features to train the model, binary obfuscation techniques can undermine the prelearned models in real-world usage scenarios. In this paper, we propose FID, a semantics-based method to recognize functions in stripped binaries. We leverage symbolic execution to generate semantic information and learn the function recognition model through well-performing machine learning techniques. FID extracts semantic information from binary code and, therefore, is effectively adapted to different compilers and optimizations. Moreover, we also demonstrate that FID has high recognition accuracy on binaries transformed by widely-used obfuscation techniques. We evaluate FID with over four thousand test cases. Our evaluation shows that FID is comparable with previous work on normal binaries and it notably outperforms existing tools on obfuscated code.
UR - https://www.scopus.com/pages/publications/85037094010
UR - https://www.scopus.com/inward/citedby.url?scp=85037094010&partnerID=8YFLogxK
U2 - 10.1109/ICSME.2017.59
DO - 10.1109/ICSME.2017.59
M3 - Conference contribution
AN - SCOPUS:85037094010
T3 - Proceedings - 2017 IEEE International Conference on Software Maintenance and Evolution, ICSME 2017
SP - 388
EP - 398
BT - Proceedings - 2017 IEEE International Conference on Software Maintenance and Evolution, ICSME 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE International Conference on Software Maintenance and Evolution, ICSME 2017
Y2 - 19 September 2017 through 22 September 2017
ER -