Learning from the ones that got away: Detecting new forms of phishing attacks

Christopher N. Gutierrez, Taegyu Kim, Raffaele Della Corte, Jeffrey Avery, Dan Goldwasser, Marcello Cinque, Saurabh Bagchi

Research output: Contribution to journal › Article › peer-review

53 Scopus citations


Phishing attacks continue to pose a major threat to computer system defenders, often forming the first step in a multi-stage attack. Great strides have been made in phishing detection; however, some phishing emails still pass through filters by making simple structural and semantic changes to the messages. We tackle this problem with a machine learning classifier operating on a large corpus of phishing and legitimate emails. We design SAFe-PC (Semi-Automated Feature generation for Phish Classification), a system that extracts features, elevating some to higher-level features, that are meant to defeat common phishing email detection strategies. To evaluate SAFe-PC, we collect a large corpus of phishing emails from the central IT organization at a tier-1 university. The execution of SAFe-PC on the dataset exposes hitherto unknown insights on phishing campaigns directed at university users. SAFe-PC detects more than 70 percent of the emails that had eluded our production deployment of Sophos, a state-of-the-art email filtering tool. It also outperforms SpamAssassin, a commonly used email filtering tool. We also developed an online version of SAFe-PC that can be incrementally retrained with new samples. Its detection performance improves with time as new samples are collected, while the time to retrain the classifier stays constant.

Original language: English (US)
Article number: 8440723
Pages (from-to): 988-1001
Number of pages: 14
Journal: IEEE Transactions on Dependable and Secure Computing
Issue number: 6
State: Published - Nov 1 2018

All Science Journal Classification (ASJC) codes

  • Computer Science (all)
  • Electrical and Electronic Engineering

