TY - JOUR
T1 - On Progressive Network Recovery from Massive Failures under Uncertainty
AU - Zad Tootaghaj, Diman
AU - Bartolini, Novella
AU - Khamfroush, Hana
AU - La Porta, Thomas
N1 - Funding Information:
Manuscript received April 5, 2018; revised August 7, 2018 and October 24, 2018; accepted November 1, 2018. Date of publication November 8, 2018; date of current version March 9, 2019. This research was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA). A partial and preliminary version appeared in Proc. IEEE IFIP Networking’17 [1]. The associate editor coordinating the review of this paper and approving it for publication was Q. Ling. (Corresponding author: Diman Zad Tootaghaj.) D. Z. Tootaghaj was with the Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802 USA. She is now with Networking and Mobility Group, Hewlett Packard Enterprise Labs, Palo Alto, CA 94304 USA (e-mail: diman.zad-tootaghaj@hpe.com).
Publisher Copyright:
© 2018 IEEE.
PY - 2019/3
Y1 - 2019/3
N2 - Network recovery after large-scale failures has tremendous cost implications. While numerous approaches have been proposed to restore critical services after large-scale failures, they mostly assume having full knowledge of failure location, which cannot be achieved in real failure scenarios. Making restoration decisions under uncertainty is often further complicated in a large-scale failure. This paper addresses progressive network recovery under the uncertain knowledge of damages. We formulate the problem as a mixed integer linear programming and show that it is NP-hard. We propose an iterative stochastic recovery algorithm (ISR) to recover the network in a progressive manner to satisfy the critical services. At each optimization step, we make a decision to repair a part of the network and gather more information iteratively, until critical services are completely restored. We propose three different approaches: 1) an iterative shortest path algorithm; 2) an approximate branch and bound (ISR-BB); and 3) an iterative multicommodity LP relaxation (ISR-MULT). Further, we compared our approach with the state-of-the-art centrality-based damage assessment and recovery (CeDAR) and iterative split and prune (ISP) algorithms. Our results show that ISR-BB and ISR-MULT outperform the state-of-the-art ISP and CeDAR algorithms while we can configure our choice of tradeoff between the execution time, the number of repairs (cost), and the demand loss. We show that our recovery algorithm, on average, can reduce the total number of repairs by a factor of about 3 with respect to ISP, while satisfying all critical demands.
AB - Network recovery after large-scale failures has tremendous cost implications. While numerous approaches have been proposed to restore critical services after large-scale failures, they mostly assume having full knowledge of failure location, which cannot be achieved in real failure scenarios. Making restoration decisions under uncertainty is often further complicated in a large-scale failure. This paper addresses progressive network recovery under the uncertain knowledge of damages. We formulate the problem as a mixed integer linear programming and show that it is NP-hard. We propose an iterative stochastic recovery algorithm (ISR) to recover the network in a progressive manner to satisfy the critical services. At each optimization step, we make a decision to repair a part of the network and gather more information iteratively, until critical services are completely restored. We propose three different approaches: 1) an iterative shortest path algorithm; 2) an approximate branch and bound (ISR-BB); and 3) an iterative multicommodity LP relaxation (ISR-MULT). Further, we compared our approach with the state-of-the-art centrality-based damage assessment and recovery (CeDAR) and iterative split and prune (ISP) algorithms. Our results show that ISR-BB and ISR-MULT outperform the state-of-the-art ISP and CeDAR algorithms while we can configure our choice of tradeoff between the execution time, the number of repairs (cost), and the demand loss. We show that our recovery algorithm, on average, can reduce the total number of repairs by a factor of about 3 with respect to ISP, while satisfying all critical demands.
UR - http://www.scopus.com/inward/record.url?scp=85056301281&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85056301281&partnerID=8YFLogxK
U2 - 10.1109/TNSM.2018.2880155
DO - 10.1109/TNSM.2018.2880155
M3 - Article
AN - SCOPUS:85056301281
SN - 1932-4537
VL - 16
SP - 113
EP - 126
JO - IEEE Transactions on Network and Service Management
JF - IEEE Transactions on Network and Service Management
IS - 1
M1 - 8527547
ER -