TY - JOUR
T1 - A novel policy-graph approach with natural language and counterfactual abstractions for explaining reinforcement learning agents
AU - Liu, Tongtong
AU - McCalmon, Joe
AU - Le, Thai
AU - Rahman, Md Asifur
AU - Lee, Dongwon
AU - Alqahtani, Sarra
N1 - Publisher Copyright:
© 2023, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2023/10
Y1 - 2023/10
N2 - As reinforcement learning (RL) continues to improve and be applied in situations alongside humans, the need to explain the learned behaviors of RL agents to end-users becomes more important. Strategies for explaining the reasoning behind an agent’s policy, called policy-level explanations, can lead to important insights about both the task and the agent’s behaviors. Following this line of research, we propose a novel approach, named CAPS, that summarizes an agent’s policy in the form of a directed graph with natural language descriptions. A decision-tree-based clustering method abstracts the state space of the task into fewer, condensed states, which makes the policy graphs more digestible to end-users. We then use user-defined predicates to enrich the abstract states with semantic meaning. To introduce counterfactual state explanations to the policy graph, we first identify the critical states in the graph and then develop a novel counterfactual explanation method based on action perturbation in those critical states. We generate explanation graphs using CAPS on five RL tasks, with both deterministic and stochastic policies. We also evaluate the effectiveness of CAPS on human participants who are not RL experts in two user studies. When provided with our explanation graph, end-users are able to accurately interpret the policies of trained RL agents 80% of the time, compared to 10% with the next-best baseline, and 68.2% of users demonstrated an increase in their confidence in understanding an agent’s behavior after being provided with the counterfactual explanations.
AB - As reinforcement learning (RL) continues to improve and be applied in situations alongside humans, the need to explain the learned behaviors of RL agents to end-users becomes more important. Strategies for explaining the reasoning behind an agent’s policy, called policy-level explanations, can lead to important insights about both the task and the agent’s behaviors. Following this line of research, we propose a novel approach, named CAPS, that summarizes an agent’s policy in the form of a directed graph with natural language descriptions. A decision-tree-based clustering method abstracts the state space of the task into fewer, condensed states, which makes the policy graphs more digestible to end-users. We then use user-defined predicates to enrich the abstract states with semantic meaning. To introduce counterfactual state explanations to the policy graph, we first identify the critical states in the graph and then develop a novel counterfactual explanation method based on action perturbation in those critical states. We generate explanation graphs using CAPS on five RL tasks, with both deterministic and stochastic policies. We also evaluate the effectiveness of CAPS on human participants who are not RL experts in two user studies. When provided with our explanation graph, end-users are able to accurately interpret the policies of trained RL agents 80% of the time, compared to 10% with the next-best baseline, and 68.2% of users demonstrated an increase in their confidence in understanding an agent’s behavior after being provided with the counterfactual explanations.
UR - http://www.scopus.com/inward/record.url?scp=85168305309&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85168305309&partnerID=8YFLogxK
U2 - 10.1007/s10458-023-09615-8
DO - 10.1007/s10458-023-09615-8
M3 - Article
AN - SCOPUS:85168305309
SN - 1387-2532
VL - 37
JO - Autonomous Agents and Multi-Agent Systems
JF - Autonomous Agents and Multi-Agent Systems
IS - 2
M1 - 34
ER -