TY - GEN
T1 - Creation and Analysis of a Corpus of Scam Emails Targeting Universities
AU - Ciambrone, Grace
AU - Wilson, Shomir
N1 - Publisher Copyright: © 2023 Owner/Author.
PY - 2023/4/30
Y1 - 2023/4/30
AB - Email-based scams pose a threat to the personally identifiable information and financial safety of all email users. Within a university environment, the risks are potentially greater: traditional students (i.e., within an age range typical of college students) often lack the experience and knowledge of older email users. By understanding the topics, temporal trends, and other patterns of scam emails targeting universities, these institutions can be better equipped to reduce this threat by improving their filtering methods and educating their users. While anecdotal evidence suggests common topics and trends in these scams, the empirical evidence is limited. Observing that large universities are uniquely positioned to gather and share information about email scams, we built a corpus of 5,155 English-language scam emails scraped from the information security websites of five large universities in the United States. We use Latent Dirichlet Allocation (LDA) topic modelling to assess the landscape and trends of scam emails sent to university addresses. We examine themes chronologically and observe that topics vary over time, indicating changes in scammer strategies. For example, scams targeting students with disabilities have steadily risen in popularity since they first appeared in 2015, while password scams experienced a boom in 2016 but have lessened in recent years. To encourage further research to mitigate the threat of email scams, we release this corpus for others to study.
UR - http://www.scopus.com/inward/record.url?scp=85159608384&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85159608384&partnerID=8YFLogxK
U2 - 10.1145/3543873.3587303
DO - 10.1145/3543873.3587303
M3 - Conference contribution
AN - SCOPUS:85159608384
T3 - ACM Web Conference 2023 - Companion of the World Wide Web Conference, WWW 2023
SP - 24
EP - 27
BT - ACM Web Conference 2023 - Companion of the World Wide Web Conference, WWW 2023
PB - Association for Computing Machinery, Inc
T2 - 2023 World Wide Web Conference, WWW 2023
Y2 - 30 April 2023 through 4 May 2023
ER -