TY - JOUR
T1 - Learning from the ones that got away
T2 - Detecting new forms of phishing attacks
AU - Gutierrez, Christopher N.
AU - Kim, Taegyu
AU - Corte, Raffaele Della
AU - Avery, Jeffrey
AU - Goldwasser, Dan
AU - Cinque, Marcello
AU - Bagchi, Saurabh
N1 - Funding Information:
The authors would like to thank Paul Wood (Purdue), Keith McDermott and Brian Berndt (Information Technology at Purdue - ITaP), and Jonathan Fulkerson (Northrop Grumman) for their valuable help for this paper. This material is in part supported by the US National Science Foundation (Grant Number CNS-1548114) and Northrop Grumman through their Cybersecurity Research Consortium. Any findings and conclusions expressed in this paper are those of the authors and do not necessarily reflect the views of the sponsors.
Publisher Copyright:
© 2004-2012 IEEE.
PY - 2018/11/1
Y1 - 2018/11/1
N2 - Phishing attacks continue to pose a major threat for computer system defenders, often forming the first step in a multi-stage attack. There have been great strides made in phishing detection; however, some phishing emails appear to pass through filters by making simple structural and semantic changes to the messages. We tackle this problem through the use of a machine learning classifier operating on a large corpus of phishing and legitimate emails. We design SAFe-PC (Semi-Automated Feature generation for Phish Classification), a system to extract features, elevating some to higher level features, that are meant to defeat common phishing email detection strategies. To evaluate SAFe-PC, we collect a large corpus of phishing emails from the central IT organization at a tier-1 university. The execution of SAFe-PC on the dataset exposes hitherto unknown insights on phishing campaigns directed at university users. SAFe-PC detects more than 70 percent of the emails that had eluded our production deployment of Sophos, a state-of-the-art email filtering tool. It also outperforms SpamAssassin, a commonly used email filtering tool. We also developed an online version of SAFe-PC, that can be incrementally retrained with new samples. Its detection performance improves with time as new samples are collected, while the time to retrain the classifier stays constant.
AB - Phishing attacks continue to pose a major threat for computer system defenders, often forming the first step in a multi-stage attack. There have been great strides made in phishing detection; however, some phishing emails appear to pass through filters by making simple structural and semantic changes to the messages. We tackle this problem through the use of a machine learning classifier operating on a large corpus of phishing and legitimate emails. We design SAFe-PC (Semi-Automated Feature generation for Phish Classification), a system to extract features, elevating some to higher level features, that are meant to defeat common phishing email detection strategies. To evaluate SAFe-PC, we collect a large corpus of phishing emails from the central IT organization at a tier-1 university. The execution of SAFe-PC on the dataset exposes hitherto unknown insights on phishing campaigns directed at university users. SAFe-PC detects more than 70 percent of the emails that had eluded our production deployment of Sophos, a state-of-the-art email filtering tool. It also outperforms SpamAssassin, a commonly used email filtering tool. We also developed an online version of SAFe-PC, that can be incrementally retrained with new samples. Its detection performance improves with time as new samples are collected, while the time to retrain the classifier stays constant.
UR - http://www.scopus.com/inward/record.url?scp=85051825990&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051825990&partnerID=8YFLogxK
U2 - 10.1109/TDSC.2018.2864993
DO - 10.1109/TDSC.2018.2864993
M3 - Article
AN - SCOPUS:85051825990
SN - 1545-5971
VL - 15
SP - 988
EP - 1001
JO - IEEE Transactions on Dependable and Secure Computing
JF - IEEE Transactions on Dependable and Secure Computing
IS - 6
M1 - 8440723
ER -