TY - GEN
T1 - netCSI
T2 - 2011 30th IEEE International Symposium on Reliable Distributed Systems, SRDS 2011
AU - Tati, Srikar
AU - Rager, Scott
AU - Ko, Bong Jun
AU - Cao, Guohong
AU - Swami, Ananthram
AU - La Porta, Thomas
N1 - Copyright: Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2011
Y1 - 2011
AB - In this paper, we present a framework and a set of algorithms for determining faults in networks when large-scale outages occur. The design principles of our algorithm, netCSI, are motivated by the fact that failures in such cases are geographically clustered. We address the challenge of determining faults with incomplete symptom information caused by a limited number of reporting nodes in the network. netCSI consists of two parts: a hypothesis generation algorithm and a ranking algorithm. When constructing the list of hypotheses (potential causes), we make novel use of both positive and negative symptoms to improve the precision of the results. The ranking algorithm is based on conditional failure probability models that account for the geographic correlation of network objects in clustered failures. We evaluate the performance of netCSI on networks with both random and realistic topologies. Compared with an existing fault diagnosis algorithm, MAX-COVERAGE, netCSI achieves an average accuracy gain of 128% on realistic topologies.
UR - http://www.scopus.com/inward/record.url?scp=83155184620&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=83155184620&partnerID=8YFLogxK
U2 - 10.1109/SRDS.2011.28
DO - 10.1109/SRDS.2011.28
M3 - Conference contribution
AN - SCOPUS:83155184620
SN - 9780769544502
T3 - Proceedings of the IEEE Symposium on Reliable Distributed Systems
SP - 167
EP - 176
BT - Proceedings - 2011 30th IEEE International Symposium on Reliable Distributed Systems, SRDS 2011
Y2 - 4 October 2011 through 7 October 2011
ER -