Selectively retrofitting monitoring in distributed systems

Animashree Anandkumar, Chatschik Bisdikian, Ting He, Dakshi Agrawal

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation


Current distributed systems carry legacy subsystems that lack sufficient instrumentation for monitoring the end-to-end business transactions these systems support. Without instrumentation, only probabilistic monitoring is possible, using time-stamped log-records. Retrofitting these systems with monitoring instrumentation is expensive but provides high-granularity, precise tracking of transactions. Given a limited budget, local instrumentation strategies that maximize the effectiveness of monitoring transactions throughout the system are proposed. The operation of the end-to-end system is modeled by a queuing network; each queue represents a subsystem that produces time-stamped log-records as transactions pass through it. Two simple instrumentation heuristics are proposed, each of which becomes optimal under certain conditions. One heuristic selects states in the transition diagram for local instrumentation in decreasing order of the load factors of their queues. Sufficient conditions for this load-factor heuristic to be optimal are proven using the notion of stochastic order. The other heuristic selects states in the transition diagram based on an approximation of the tracking accuracy of probabilistic monitoring at each state; this approximation is shown to be tight at low arrival rates.
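The load-factor heuristic described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each state maps to one single-server queue whose load factor is arrival rate divided by service rate, that instrumenting any state has unit cost, and that the budget is simply the number of states that can be instrumented. The names `Queue`, `select_for_instrumentation`, and the example subsystems are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Queue:
    name: str            # hypothetical subsystem label
    arrival_rate: float  # transactions arriving per unit time
    service_rate: float  # transactions served per unit time

    @property
    def load_factor(self) -> float:
        # Assumption: single-server queue, load factor = arrival / service rate.
        return self.arrival_rate / self.service_rate


def select_for_instrumentation(queues, budget):
    """Return the names of up to `budget` queues, highest load factor first.

    Assumption: every instrumentation has unit cost, so the budget is a count.
    """
    ranked = sorted(queues, key=lambda q: q.load_factor, reverse=True)
    return [q.name for q in ranked[:budget]]


# Made-up rates, for illustration only.
queues = [
    Queue("auth", arrival_rate=2.0, service_rate=10.0),
    Queue("billing", arrival_rate=6.0, service_rate=8.0),
    Queue("legacy-db", arrival_rate=4.5, service_rate=5.0),
]

print(select_for_instrumentation(queues, budget=2))
```

With these made-up rates the two most heavily loaded queues, `legacy-db` and `billing`, are picked first. Whether this ordering is optimal depends on the sufficient conditions the paper proves, which this sketch does not check.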

Original language: English (US)
Pages (from-to): 6-8
Number of pages: 3
Journal: Performance Evaluation Review
Issue number: 2
State: Published - Oct 16 2009
Event: 11th Workshop on MAthematical Performance Modeling and Analysis, MAMA 2009, Held in Conjunction with the ACM SIGMETRICS/Performance 2009 Conference - Seattle, WA, United States
Duration: Jun 15 2009 → Jun 15 2009

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

