Banded Controllers for Scalable POMDP Decision-Making

Kenneth Czuprynski, Kyle Hollins Wray

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    This paper introduces a novel and computationally efficient policy representation, termed a banded controller, for Partially Observable Markov Decision Processes (POMDPs). The structure of a banded controller is obtained by restricting the number of successor nodes for each node in a finite state controller (FSC) policy representation; formally, the controller's node transition matrices are restricted to the space of banded matrices. We present a gradient-ascent-based algorithm that leverages banded matrices, and we show that the policy structure yields a computational structure that can be exploited during policy evaluation. We then show that policy evaluation for a banded controller is asymptotically more efficient than for a general FSC, and that the degrees of freedom can be reduced while retaining much of the policy's expressivity. Specifically, we show that a banded controller can represent any FSC policy that is permutation-similar to a banded controller, which means that banded controllers are computationally efficient policy representations for a class of FSC policies. Lastly, experiments show that banded controllers outperform state-of-the-art FSC algorithms on many of the standard benchmark problems.
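
    The banded restriction and the permutation-similarity claim in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a single row-stochastic node transition matrix, and the helper names (`bandwidth`, `project_to_band`, `permute`) and the example matrices are hypothetical, chosen only for illustration.

```python
def bandwidth(matrix):
    """Largest |i - j| over the nonzero entries (0 if there are none)."""
    return max(
        (abs(i - j)
         for i, row in enumerate(matrix)
         for j, value in enumerate(row)
         if value != 0),
        default=0,
    )


def project_to_band(weights, k):
    """Zero every entry with |i - j| > k, then renormalize each row to sum to 1.

    Illustrative only: the paper's gradient-ascent update is not reproduced here.
    """
    n = len(weights)
    result = []
    for i in range(n):
        row = [weights[i][j] if abs(i - j) <= k else 0.0 for j in range(n)]
        total = sum(row)
        if total == 0:
            raise ValueError(f"row {i} has no mass inside the band")
        result.append([value / total for value in row])
    return result


def permute(matrix, order):
    """Relabel nodes: entry (i, j) of the result is matrix[order[i]][order[j]]."""
    return [[matrix[a][b] for b in order] for a in order]


# A fully connected 4-node controller restricted to a band of half-width 1:
# each node keeps at most 3 successors (itself and its two neighbours).
dense = [[1.0] * 4 for _ in range(4)]
banded = project_to_band(dense, 1)

# A controller that is not banded with half-width 1 as labelled, but becomes
# so once its nodes are relabelled (permutation similarity).
shuffled = [
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.4, 0.3, 0.3],
    [0.3, 0.3, 0.4, 0.0],
    [0.0, 0.5, 0.0, 0.5],
]
reordered = permute(shuffled, [0, 2, 1, 3])

print(bandwidth(shuffled), bandwidth(reordered))
```

    Relabelling the nodes changes only their names, not the policy, so under these assumptions the reordered matrix describes the same controller in a form whose successors all lie inside the band.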

    Original language: English (US)
    Title of host publication: 2023 European Control Conference, ECC 2023
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    ISBN (Electronic): 9783907144084
    State: Published - 2023
    Event: 2023 European Control Conference, ECC 2023 - Bucharest, Romania
    Duration: Jun 13 2023 to Jun 16 2023

    Publication series

    Name: 2023 European Control Conference, ECC 2023

    Conference

    Conference: 2023 European Control Conference, ECC 2023
    Country/Territory: Romania
    City: Bucharest
    Period: 6/13/23 to 6/16/23

    All Science Journal Classification (ASJC) codes

    • Control and Optimization
    • Modeling and Simulation
