TY - JOUR
T1 - Efficient mining of the multidimensional traffic cluster hierarchy for digesting, visualization, and anomaly identification
AU - Wang, Jisheng
AU - Miller, David J.
AU - Kesidis, George
N1 - Funding Information:
Manuscript received September 15, 2005; revised March 31, 2006. This work was supported in part by DHS/NSF under Grant 0335241 (EMIST/DETER). The authors are with The Pennsylvania State University, University Park, PA 16802 USA (e-mail: [email protected]; [email protected]; kesidis@engr.psu.edu). Digital Object Identifier 10.1109/JSAC.2006.877216
Dr. Miller received the National Science Foundation CAREER Award in 1996. He is an Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING.
PY - 2006/10
Y1 - 2006/10
AB - Mining traffic to identify the dominant flows sent over a given link, over a specified time interval, is a valuable capability with applications to traffic auditing, simulation, visualization, as well as anomaly detection. Recently, Estan et al. advanced a comprehensive data mining structure tailored for networking data - a parsimonious, multidimensional flow hierarchy, along with an algorithm for its construction. While they primarily targeted offline auditing, use in interactive traffic visualization and anomaly/attack detection will require real-time data mining. We suggest several improvements to Estan et al.'s algorithm that substantially reduce the computational complexity of multidimensional flow mining. We also propose computational and memory-efficient approaches for unidimensional clustering of the IP address spaces. For baseline implementations, evaluated on the New Zealand (NZIX) trace data, our method reduced CPU execution times of the Estan et al. method by a factor of more than eight. We also develop a methodology for anomaly/attack detection based on flow mining, demonstrating the usefulness of this approach on traces from the Slammer and Code Red worms and the MIT Lincoln Laboratories DDoS data.
UR - http://www.scopus.com/inward/record.url?scp=33749818129&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33749818129&partnerID=8YFLogxK
U2 - 10.1109/JSAC.2006.877216
DO - 10.1109/JSAC.2006.877216
M3 - Article
AN - SCOPUS:33749818129
SN - 0733-8716
VL - 24
SP - 1929
EP - 1941
JO - IEEE Journal on Selected Areas in Communications
JF - IEEE Journal on Selected Areas in Communications
IS - 10
M1 - 1705623
ER -