TY - GEN
T1 - Adaptive algorithms for diagnosing large-scale failures in computer networks
AU - Tati, Srikar
AU - Ko, Bong Jun
AU - Cao, Guohong
AU - Swami, Ananthram
AU - La Porta, Thomas
N1 - Copyright:
Copyright 2012 Elsevier B.V., All rights reserved.
PY - 2012
Y1 - 2012
N2 - In this paper, we propose an algorithm to efficiently diagnose large-scale clustered failures. The algorithm, Cluster-MAX-COVERAGE (CMC), is based on greedy approach. We address the challenge of determining faults with incomplete symptoms. CMC makes novel use of both positive and negative symptoms to output a hypothesis list with a low number of false negatives and false positives quickly. CMC requires reports from about half as many nodes as other existing algorithms to determine failures with 100% accuracy. Moreover, CMC accomplishes this gain significantly faster (sometimes by two orders of magnitude) than an algorithm that matches its accuracy. Furthermore, we propose an adaptive algorithm called Adaptive-MAX-COVERAGE (AMC) that performs efficiently during both kinds of failures, i.e., independent and clustered. During a series of failues that include both independent and clustered, AMC results in a reduced number of false negatives and false positives.
AB - In this paper, we propose an algorithm to efficiently diagnose large-scale clustered failures. The algorithm, Cluster-MAX-COVERAGE (CMC), is based on greedy approach. We address the challenge of determining faults with incomplete symptoms. CMC makes novel use of both positive and negative symptoms to output a hypothesis list with a low number of false negatives and false positives quickly. CMC requires reports from about half as many nodes as other existing algorithms to determine failures with 100% accuracy. Moreover, CMC accomplishes this gain significantly faster (sometimes by two orders of magnitude) than an algorithm that matches its accuracy. Furthermore, we propose an adaptive algorithm called Adaptive-MAX-COVERAGE (AMC) that performs efficiently during both kinds of failures, i.e., independent and clustered. During a series of failues that include both independent and clustered, AMC results in a reduced number of false negatives and false positives.
UR - http://www.scopus.com/inward/record.url?scp=84866677413&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866677413&partnerID=8YFLogxK
U2 - 10.1109/DSN.2012.6263917
DO - 10.1109/DSN.2012.6263917
M3 - Conference contribution
AN - SCOPUS:84866677413
SN - 9781467316248
T3 - Proceedings of the International Conference on Dependable Systems and Networks
BT - 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012
T2 - 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012
Y2 - 25 June 2012 through 28 June 2012
ER -