TY - GEN
T1 - Advanced Windows Methods on Malware Detection and Classification
AU - Rabadi, Dima
AU - Teo, Sin G.
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/12/7
Y1 - 2020/12/7
N2 - Application Programming Interfaces (APIs) are still considered the standard accessible data source and core wok of the most widely adopted malware detection and classification techniques. API-based malware detectors highly rely on measuring API's statistical features, such as calculating the frequency counter of calling specific API calls or finding their malicious sequence pattern (i.e., signature-based detectors). Using simple hooking tools, malware authors would help in failing such detectors by interrupting the sequence and shuffling the API calls or deleting/inserting irrelevant calls (i.e., changing the frequency counter). Moreover, relying on API calls (e.g., function names) alone without taking into account their function parameters is insufficient to understand the purpose of the program. For example, the same API call (e.g., writing on a file) would act in two ways if two different arguments are passed (e.g., writing on a system versus user file). However, because of the heterogeneous nature of API arguments, most of the available API-based malicious behavior detectors would consider only the API calls without taking into account their argument information (e.g., function parameters). Alternatively, other detectors try considering the API arguments in their techniques, but they acquire having proficient knowledge about the API arguments or powerful processors to extract them. Such requirements demand a prohibitive cost and complex operations to deal with the arguments. To overcome the above limitations, with the help of machine learning and without any expert knowledge of the arguments, we propose a light-weight API-based dynamic feature extraction technique, and we use it to implement a malware detection and type classification approach. To evaluate our approach, we use reasonable datasets of 7774 benign and 7105 malicious samples belonging to ten distinct malware types. Experimental results show that our type classification module could achieve an accuracy of , where our malware detection module could reach an accuracy of over , and outperforms many state-of-the-art API-based malware detectors.
AB - Application Programming Interfaces (APIs) are still considered the standard accessible data source and core wok of the most widely adopted malware detection and classification techniques. API-based malware detectors highly rely on measuring API's statistical features, such as calculating the frequency counter of calling specific API calls or finding their malicious sequence pattern (i.e., signature-based detectors). Using simple hooking tools, malware authors would help in failing such detectors by interrupting the sequence and shuffling the API calls or deleting/inserting irrelevant calls (i.e., changing the frequency counter). Moreover, relying on API calls (e.g., function names) alone without taking into account their function parameters is insufficient to understand the purpose of the program. For example, the same API call (e.g., writing on a file) would act in two ways if two different arguments are passed (e.g., writing on a system versus user file). However, because of the heterogeneous nature of API arguments, most of the available API-based malicious behavior detectors would consider only the API calls without taking into account their argument information (e.g., function parameters). Alternatively, other detectors try considering the API arguments in their techniques, but they acquire having proficient knowledge about the API arguments or powerful processors to extract them. Such requirements demand a prohibitive cost and complex operations to deal with the arguments. To overcome the above limitations, with the help of machine learning and without any expert knowledge of the arguments, we propose a light-weight API-based dynamic feature extraction technique, and we use it to implement a malware detection and type classification approach. To evaluate our approach, we use reasonable datasets of 7774 benign and 7105 malicious samples belonging to ten distinct malware types. Experimental results show that our type classification module could achieve an accuracy of , where our malware detection module could reach an accuracy of over , and outperforms many state-of-the-art API-based malware detectors.
UR - http://www.scopus.com/inward/record.url?scp=85098055299&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098055299&partnerID=8YFLogxK
U2 - 10.1145/3427228.3427242
DO - 10.1145/3427228.3427242
M3 - Conference contribution
AN - SCOPUS:85098055299
T3 - ACM International Conference Proceeding Series
SP - 54
EP - 68
BT - Proceedings - 36th Annual Computer Security Applications Conference, ACSAC 2020
PB - Association for Computing Machinery
T2 - 36th Annual Computer Security Applications Conference, ACSAC 2020
Y2 - 7 December 2020 through 11 December 2020
ER -