TY - JOUR
T1 - Unsupervised, low latency anomaly detection of algorithmically generated domain names by generative probabilistic modeling
AU - Raghuram, Jayaram
AU - Miller, David J.
AU - Kesidis, George
PY - 2014/7
Y1 - 2014/7
N2 - We propose a method for detecting anomalous domain names, with focus on algorithmically generated domain names which are frequently associated with malicious activities such as fast flux service networks, particularly for bot networks (or botnets), malware, and phishing. Our method is based on learning a (null hypothesis) probability model based on a large set of domain names that have been white listed by some reliable authority. Since these names are mostly assigned by humans, they are pronounceable, and tend to have a distribution of characters, words, word lengths, and number of words that are typical of some language (mostly English), and often consist of words drawn from a known lexicon. On the other hand, in the present day scenario, algorithmically generated domain names typically have distributions that are quite different from that of human-created domain names. We propose a fully generative model for the probability distribution of benign (white listed) domain names which can be used in an anomaly detection setting for identifying putative algorithmically generated domain names. Unlike other methods, our approach can make detections without considering any additional (latency producing) information sources, often used to detect fast flux activity. Experiments on a publicly available, large data set of domain names associated with fast flux service networks show encouraging results, relative to several baseline methods, with higher detection rates and low false positive rates.
AB - We propose a method for detecting anomalous domain names, with focus on algorithmically generated domain names which are frequently associated with malicious activities such as fast flux service networks, particularly for bot networks (or botnets), malware, and phishing. Our method is based on learning a (null hypothesis) probability model based on a large set of domain names that have been white listed by some reliable authority. Since these names are mostly assigned by humans, they are pronounceable, and tend to have a distribution of characters, words, word lengths, and number of words that are typical of some language (mostly English), and often consist of words drawn from a known lexicon. On the other hand, in the present day scenario, algorithmically generated domain names typically have distributions that are quite different from that of human-created domain names. We propose a fully generative model for the probability distribution of benign (white listed) domain names which can be used in an anomaly detection setting for identifying putative algorithmically generated domain names. Unlike other methods, our approach can make detections without considering any additional (latency producing) information sources, often used to detect fast flux activity. Experiments on a publicly available, large data set of domain names associated with fast flux service networks show encouraging results, relative to several baseline methods, with higher detection rates and low false positive rates.
UR - http://www.scopus.com/inward/record.url?scp=84901696444&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84901696444&partnerID=8YFLogxK
U2 - 10.1016/j.jare.2014.01.001
DO - 10.1016/j.jare.2014.01.001
M3 - Article
AN - SCOPUS:84901696444
SN - 2090-1232
VL - 5
SP - 423
EP - 433
JO - Journal of Advanced Research
JF - Journal of Advanced Research
IS - 4
ER -