TY - JOUR
T1 - Learning from the ones that got away
T2 - Detecting new forms of phishing attacks
AU - Gutierrez, Christopher N.
AU - Kim, Taegyu
AU - Della Corte, Raffaele
AU - Avery, Jeffrey
AU - Goldwasser, Dan
AU - Cinque, Marcello
AU - Bagchi, Saurabh
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2018/11/1
Y1 - 2018/11/1
N2 - Phishing attacks continue to pose a major threat for computer system defenders, often forming the first step in a multi-stage attack. There have been great strides made in phishing detection; however, some phishing emails appear to pass through filters by making simple structural and semantic changes to the messages. We tackle this problem through the use of a machine learning classifier operating on a large corpus of phishing and legitimate emails. We design SAFe-PC (Semi-Automated Feature generation for Phish Classification), a system to extract features, elevating some to higher-level features, that are meant to defeat common phishing email detection strategies. To evaluate SAFe-PC, we collect a large corpus of phishing emails from the central IT organization at a tier-1 university. The execution of SAFe-PC on the dataset exposes hitherto unknown insights on phishing campaigns directed at university users. SAFe-PC detects more than 70 percent of the emails that had eluded our production deployment of Sophos, a state-of-the-art email filtering tool. It also outperforms SpamAssassin, a commonly used email filtering tool. We also developed an online version of SAFe-PC that can be incrementally retrained with new samples. Its detection performance improves with time as new samples are collected, while the time to retrain the classifier stays constant.
AB - Phishing attacks continue to pose a major threat for computer system defenders, often forming the first step in a multi-stage attack. There have been great strides made in phishing detection; however, some phishing emails appear to pass through filters by making simple structural and semantic changes to the messages. We tackle this problem through the use of a machine learning classifier operating on a large corpus of phishing and legitimate emails. We design SAFe-PC (Semi-Automated Feature generation for Phish Classification), a system to extract features, elevating some to higher-level features, that are meant to defeat common phishing email detection strategies. To evaluate SAFe-PC, we collect a large corpus of phishing emails from the central IT organization at a tier-1 university. The execution of SAFe-PC on the dataset exposes hitherto unknown insights on phishing campaigns directed at university users. SAFe-PC detects more than 70 percent of the emails that had eluded our production deployment of Sophos, a state-of-the-art email filtering tool. It also outperforms SpamAssassin, a commonly used email filtering tool. We also developed an online version of SAFe-PC that can be incrementally retrained with new samples. Its detection performance improves with time as new samples are collected, while the time to retrain the classifier stays constant.
UR - http://www.scopus.com/inward/record.url?scp=85051825990&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051825990&partnerID=8YFLogxK
U2 - 10.1109/TDSC.2018.2864993
DO - 10.1109/TDSC.2018.2864993
M3 - Article
AN - SCOPUS:85051825990
SN - 1545-5971
VL - 15
SP - 988
EP - 1001
JO - IEEE Transactions on Dependable and Secure Computing
JF - IEEE Transactions on Dependable and Secure Computing
IS - 6
M1 - 8440723
ER -