Monitoring continuous state violation in datacenters: Exploring the time dimension

Shicong Meng, Ting Wang, Ling Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Scopus citations

Abstract

Monitoring the global state of an application deployed over distributed nodes has become prevalent in today's datacenters. State monitoring requires not only correct monitoring results but also minimal communication cost for efficiency and scalability. Most existing work adopts an instantaneous state monitoring approach, which triggers a state alert whenever a constraint is violated. Such an approach, however, may cause frequent and unnecessary state alerts, because unpredictable bursts in monitored values and momentary outliers are common in large-scale Internet applications. These false alerts may in turn lead to expensive and problematic countermeasures. To address this issue, we introduce window-based state monitoring in this paper. Window-based state monitoring evaluates whether a state violation is continuous within a time window, and thus gains immunity to short-term value bursts and outliers. Furthermore, we find that exploring the monitoring time window at distributed nodes achieves significant communication savings over instantaneous monitoring. Based on this finding, we develop WISE, a system that efficiently performs WIndow-based StatE monitoring at datacenter scale. WISE is built on three sets of techniques. First, WISE uses distributed filtering time windows and intelligently avoids collecting global information, which achieves communication efficiency while still guaranteeing monitoring correctness. Second, WISE provides a suite of performance tuning techniques that minimize communication cost based on a sophisticated cost model. Third, WISE employs a set of novel performance optimization techniques. Extensive experiments over both real-world and synthetic traces show that WISE reduces communication cost by 50%-90% compared with existing instantaneous monitoring approaches and simple alternative schemes.
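The difference between the two alert semantics described in the abstract can be illustrated with a minimal sketch. It is not the WISE system: it assumes a single sequence of already-aggregated global values, a fixed threshold as the constraint, and a window measured in samples, and the function names `instantaneous_alerts` and `window_alerts` are hypothetical. The distributed filtering windows and the cost model are not modeled here.

```python
from typing import Iterable, List


def instantaneous_alerts(values: Iterable[float], threshold: float) -> List[int]:
    """Return every time index at which the value exceeds the threshold."""
    return [t for t, v in enumerate(values) if v > threshold]


def window_alerts(values: Iterable[float], threshold: float, window: int) -> List[int]:
    """Return time indices at which the last `window` values all exceed the threshold."""
    alerts = []
    run = 0  # length of the current streak of consecutive violations
    for t, v in enumerate(values):
        run = run + 1 if v > threshold else 0
        if run >= window:
            alerts.append(t)
    return alerts


# Hypothetical trace: one momentary outlier (index 1), then a sustained violation.
trace = [10, 95, 12, 11, 90, 92, 94, 96, 15]
print(instantaneous_alerts(trace, 80))  # [1, 4, 5, 6, 7]
print(window_alerts(trace, 80, 3))      # [6, 7]
```

In this toy trace the instantaneous check fires on the isolated spike at index 1, while the window-based check stays silent until the violation has persisted for three consecutive samples.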

Original language: English (US)
Title of host publication: 26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings
Pages: 968-979
Number of pages: 12
State: Published - 2010
Event: 26th IEEE International Conference on Data Engineering, ICDE 2010 - Long Beach, CA, United States
Duration: Mar 1 2010 - Mar 6 2010

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Other

Other: 26th IEEE International Conference on Data Engineering, ICDE 2010
Country/Territory: United States
City: Long Beach, CA
Period: 3/1/10 - 3/6/10

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Information Systems
