TY - GEN
T1 - Monitoring continuous state violation in datacenters
T2 - 26th IEEE International Conference on Data Engineering, ICDE 2010
AU - Meng, Shicong
AU - Wang, Ting
AU - Liu, Ling
PY - 2010
Y1 - 2010
N2 - Monitoring global states of an application deployed over distributed nodes becomes prevalent in today's datacenters. State monitoring requires not only correct monitoring results but also minimum communication cost for efficiency and scalability. Most existing work adopts an instantaneous state monitoring approach, which triggers state alerts whenever a constraint is violated. Such an approach, however, may cause frequent and unnecessary state alerts due to unpredictable monitored value bursts and momentary outliers that are common in large-scale Internet applications. These false alerts may further lead to expensive and problematic counter-measures. To address this issue, we introduce window-based state monitoring in this paper. Window-based state monitoring evaluates whether state violation is continuous within a time window, and thus, gains immunity to short-term value bursts and outliers. Furthermore, we find that exploring the monitoring time window at distributed nodes achieves significant communication savings over instantaneous monitoring. Based on this finding, we develop WISE, a system that efficiently performs WIndow-based StatE monitoring at datacenter-scale. WISE is highlighted with three sets of techniques. First, WISE uses distributed filtering time windows and intelligently avoids global information collecting to achieve communication efficiency, while guaranteeing monitoring correctness at the same time. Second, WISE provides a suite of performance tuning techniques to minimize communication cost based on a sophisticated cost model. Third, WISE also employs a set of novel performance optimization techniques. Extensive experiments over both real world and synthetic traces show that WISE achieves a 50%-90% reduction in communication cost compared with existing instantaneous monitoring approaches and simple alternative schemes.
UR - http://www.scopus.com/inward/record.url?scp=77952771893&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952771893&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2010.5447923
DO - 10.1109/ICDE.2010.5447923
M3 - Conference contribution
AN - SCOPUS:77952771893
SN - 9781424454440
T3 - Proceedings - International Conference on Data Engineering
SP - 968
EP - 979
BT - 26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings
Y2 - 1 March 2010 through 6 March 2010
ER -