Phishing attacks continue to pose a major threat for computer system defenders, often forming the first step in a multi-stage attack. There have been great strides made in phishing detection; however, some phishing emails appear to pass through filters by making simple structural and semantic changes to the messages. We tackle this problem through the use of a machine learning classifier operating on a large corpus of phishing and legitimate emails. We design SAFe-PC (Semi-Automated Feature generation for Phish Classification), a system to extract features, elevating some to higher level features, that are meant to defeat common phishing email detection strategies. To evaluate SAFe-PC , we collect a large corpus of phishing emails from the central IT organization at a tier-1 university. The execution of SAFe-PC on the dataset exposes hitherto unknown insights on phishing campaigns directed at university users. SAFe-PC detects more than 70 percent of the emails that had eluded our production deployment of Sophos, a state-of-the-art email filtering tool. It also outperforms SpamAssassin, a commonly used email filtering tool. We also developed an online version of SAFe-PC, that can be incrementally retrained with new samples. Its detection performance improves with time as new samples are collected, while the time to retrain the classifier stays constant.
|Original language||English (US)|
|Number of pages||14|
|Journal||IEEE Transactions on Dependable and Secure Computing|
|State||Published - Nov 1 2018|
All Science Journal Classification (ASJC) codes
- Computer Science(all)
- Electrical and Electronic Engineering