TY - GEN
T1 - Banded Controllers for Scalable POMDP Decision-Making
AU - Czuprynski, Kenneth
AU - Wray, Kyle Hollins
N1 - Publisher Copyright:
© 2023 EUCA.
PY - 2023
Y1 - 2023
N2 - This paper introduces a novel and computationally efficient policy representation, termed a banded controller, for Partially Observable Markov Decision Processes (POMDPs). The structure of a banded controller is obtained by restricting the number of successor nodes for each node in a finite state controller (FSC) policy representation; this is formally defined as the restriction of the controller's node transition matrices to the space of banded matrices. A gradient ascent based algorithm which leverages banded matrices is presented and we show that the policy structure results in a computational structure that can be exploited when performing policy evaluation. We then show that policy evaluation is asymptotically superior to a general FSC and that the degrees of freedom can be reduced while maintaining a large amount of expressivity in the policy. Specifically, we show that banded controller policy representations are equivalent to any FSC policy which is permutation similar to a banded controller. Meaning that banded controllers are computationally efficient policy representations for a class of FSC policies. Lastly, experiments are conducted which show that banded controllers outperform state-of-the-art FSC algorithms on many of the standard benchmark problems.
AB - This paper introduces a novel and computationally efficient policy representation, termed a banded controller, for Partially Observable Markov Decision Processes (POMDPs). The structure of a banded controller is obtained by restricting the number of successor nodes for each node in a finite state controller (FSC) policy representation; this is formally defined as the restriction of the controller's node transition matrices to the space of banded matrices. A gradient ascent based algorithm which leverages banded matrices is presented and we show that the policy structure results in a computational structure that can be exploited when performing policy evaluation. We then show that policy evaluation is asymptotically superior to a general FSC and that the degrees of freedom can be reduced while maintaining a large amount of expressivity in the policy. Specifically, we show that banded controller policy representations are equivalent to any FSC policy which is permutation similar to a banded controller. Meaning that banded controllers are computationally efficient policy representations for a class of FSC policies. Lastly, experiments are conducted which show that banded controllers outperform state-of-the-art FSC algorithms on many of the standard benchmark problems.
UR - http://www.scopus.com/inward/record.url?scp=85166484405&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85166484405&partnerID=8YFLogxK
U2 - 10.23919/ECC57647.2023.10178267
DO - 10.23919/ECC57647.2023.10178267
M3 - Conference contribution
AN - SCOPUS:85166484405
T3 - 2023 European Control Conference, ECC 2023
BT - 2023 European Control Conference, ECC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 European Control Conference, ECC 2023
Y2 - 13 June 2023 through 16 June 2023
ER -