Binary code analysis is very attractive from a security viewpoint. First, in many tasks such as malware analysis, the source code of the program under examination is often absent, and the analysis has to be done on binary code. Second, even the source code is available, binary analysis allows us to reason about the real instructions executed on hardware and avoid the well-known WYSINWYX problem, What You See Is Not What You Execute. Third, some program behaviors, such as cache access patterns, are only exhibited in the low-level code. On the other hand, binary code analysis is faced with an increasing challenge caused by the emerging, readily available code obfuscation techniques. Traditional signature-based malware detection is often problematic as it relies on file hashes and byte (or instruction) signatures which are not very resilient to obfuscation.
This project tackles the challenge by proposing several advanced methods that combine techniques from the behavior and semantics perspectives. Two new concepts, System Call Sliced Segment Equivalence Checking and N-gram Basic Block Semantics Memoization, are proposed to achieve better obfuscation resiliency and scalability. Compared with the existing approaches, these methods are based on the strong principles of program semantics and logics, more resilient to automatic obfuscation schemes, and more scalable with the proposed advanced semantics memoization techniques. In addition, the application is extended to side-channel detection with a new rigorous model. Upon completion, the project will make a significant contribution to binary code analysis in general. It will advance the state of the art of malware analysis and side-channel detection and help better defend cyber attacks, leading to more secure cyber space. Broader impact will also result from the education and dissemination initiatives.
|Effective start/end date
|4/1/17 → 3/31/23
- National Science Foundation: $509,103.00