TY - JOUR
T1 - Managing engineering systems with large state and action spaces through deep reinforcement learning
AU - Andriotis, C. P.
AU - Papakonstantinou, K. G.
N1 - Publisher Copyright:
© 2019 Elsevier Ltd
PY - 2019/11
Y1 - 2019/11
N2 - Decision-making for engineering systems management can be efficiently formulated using Markov Decision Processes (MDPs) or Partially Observable MDPs (POMDPs). Typical MDP/POMDP solution procedures utilize offline knowledge about the environment and provide detailed policies for relatively small systems with tractable state and action spaces. However, in large multi-component systems the dimensions of these spaces easily explode, as system states and actions scale exponentially with the number of components, whereas environment dynamics are difficult to describe explicitly for the entire system and may often be accessible only through computationally expensive numerical simulators. In this work, to address these issues, an integrated Deep Reinforcement Learning (DRL) framework is introduced. The Deep Centralized Multi-agent Actor Critic (DCMAC) is developed, an off-policy actor-critic DRL algorithm that directly probes the state/belief space of the underlying MDP/POMDP, providing efficient life-cycle policies for large multi-component systems operating in high-dimensional spaces. Apart from deep network approximators parametrizing complex functions over vast state spaces, DCMAC also adopts a factorized representation of the system actions, thus being able to designate individualized component- and subsystem-level decisions, while maintaining a centralized value function for the entire system. DCMAC compares well against Deep Q-Network and exact solutions, where applicable, and outperforms optimized baseline policies based on time-based, condition-based, and periodic inspection and maintenance considerations.
AB - Decision-making for engineering systems management can be efficiently formulated using Markov Decision Processes (MDPs) or Partially Observable MDPs (POMDPs). Typical MDP/POMDP solution procedures utilize offline knowledge about the environment and provide detailed policies for relatively small systems with tractable state and action spaces. However, in large multi-component systems the dimensions of these spaces easily explode, as system states and actions scale exponentially with the number of components, whereas environment dynamics are difficult to describe explicitly for the entire system and may often be accessible only through computationally expensive numerical simulators. In this work, to address these issues, an integrated Deep Reinforcement Learning (DRL) framework is introduced. The Deep Centralized Multi-agent Actor Critic (DCMAC) is developed, an off-policy actor-critic DRL algorithm that directly probes the state/belief space of the underlying MDP/POMDP, providing efficient life-cycle policies for large multi-component systems operating in high-dimensional spaces. Apart from deep network approximators parametrizing complex functions over vast state spaces, DCMAC also adopts a factorized representation of the system actions, thus being able to designate individualized component- and subsystem-level decisions, while maintaining a centralized value function for the entire system. DCMAC compares well against Deep Q-Network and exact solutions, where applicable, and outperforms optimized baseline policies based on time-based, condition-based, and periodic inspection and maintenance considerations.
UR - http://www.scopus.com/inward/record.url?scp=85066991320&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85066991320&partnerID=8YFLogxK
U2 - 10.1016/j.ress.2019.04.036
DO - 10.1016/j.ress.2019.04.036
M3 - Article
AN - SCOPUS:85066991320
SN - 0951-8320
VL - 191
JO - Reliability Engineering and System Safety
JF - Reliability Engineering and System Safety
M1 - 106483
ER -