Adaptive algorithms for diagnosing large-scale failures in computer networks

Srikar Tati, Bong Jun Ko, Guohong Cao, Ananthram Swami, Thomas La Porta

Research output: Chapter in Book/Report/Conference proceedingConference contribution

9 Scopus citations

Abstract

In this paper, we propose an algorithm to efficiently diagnose large-scale clustered failures. The algorithm, Cluster-MAX-COVERAGE (CMC), is based on greedy approach. We address the challenge of determining faults with incomplete symptoms. CMC makes novel use of both positive and negative symptoms to output a hypothesis list with a low number of false negatives and false positives quickly. CMC requires reports from about half as many nodes as other existing algorithms to determine failures with 100% accuracy. Moreover, CMC accomplishes this gain significantly faster (sometimes by two orders of magnitude) than an algorithm that matches its accuracy. Furthermore, we propose an adaptive algorithm called Adaptive-MAX-COVERAGE (AMC) that performs efficiently during both kinds of failures, i.e., independent and clustered. During a series of failues that include both independent and clustered, AMC results in a reduced number of false negatives and false positives.

Original languageEnglish (US)
Title of host publication2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012
DOIs
StatePublished - 2012
Event42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012 - Boston, MA, United States
Duration: Jun 25 2012Jun 28 2012

Publication series

NameProceedings of the International Conference on Dependable Systems and Networks

Other

Other42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012
Country/TerritoryUnited States
CityBoston, MA
Period6/25/126/28/12

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'Adaptive algorithms for diagnosing large-scale failures in computer networks'. Together they form a unique fingerprint.

Cite this