Scalable Gradient Ascent for Controllers in Constrained POMDPs

Kyle Hollins Wray, Kenneth Czuprynski

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    7 Scopus citations

    Abstract

    This paper presents a novel gradient ascent algorithm and nonlinear programming algorithm for finite state controller policies in constrained partially observable Markov decision processes (CPOMDPs). A key component of the gradient ascent algorithm is a constraint projection to ensure constraints are satisfied. Both an optimal and an approximate projection are formally defined. A theoretical analysis of the algorithm and its projections is presented, formally proving aspects of projection correctness and algorithm convergence. Experiments evaluate the baseline and novel algorithms, as well as both constraint projections, on seven CPOMDP benchmark domains. The proposed novel algorithm is demonstrated on an actual robot performing a navigation task in a real household environment.
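The general pattern the abstract describes, a gradient ascent step followed by a projection that restores constraint satisfaction, can be illustrated with a toy sketch. This is not the paper's algorithm: the quadratic objective, the single linear constraint, and the names `grad`, `project_halfspace`, and `projected_gradient_ascent` are all assumptions made for illustration, whereas the paper works with finite state controller parameters and CPOMDP constraints.

```python
# Toy projected gradient ascent (illustrative only; NOT the paper's method).
# Assumed problem: maximize f(x) = -(x0 - 2)^2 - (x1 - 2)^2
# subject to the single linear constraint x0 + x1 <= 2.

def grad(x):
    """Gradient of the assumed concave toy objective f."""
    return [-2.0 * (x[0] - 2.0), -2.0 * (x[1] - 2.0)]

def project_halfspace(x, a, b):
    """Euclidean projection of x onto the half-space {y : a . y <= b}."""
    violation = sum(ai * xi for ai, xi in zip(a, x)) - b
    if violation <= 0.0:
        return list(x)  # already feasible, nothing to do
    norm_sq = sum(ai * ai for ai in a)
    return [xi - violation * ai / norm_sq for ai, xi in zip(a, x)]

def projected_gradient_ascent(x0, a, b, step=0.1, iters=200):
    """Alternate an ascent step with a projection back onto the feasible set."""
    x = project_halfspace(x0, a, b)
    for _ in range(iters):
        g = grad(x)
        x = [xi + step * gi for xi, gi in zip(x, g)]  # ascent step
        x = project_halfspace(x, a, b)                # restore feasibility
    return x

solution = projected_gradient_ascent([0.0, 0.0], a=[1.0, 1.0], b=2.0)
```

In this toy problem the unconstrained maximizer (2, 2) is infeasible, so the iterates settle on the constraint boundary near (1, 1). The closed-form half-space projection used here is exact only for a single linear constraint; it is a stand-in and does not correspond to either the optimal or the approximate projection defined in the paper.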

    Original language: English (US)
    Title of host publication: 2022 IEEE International Conference on Robotics and Automation, ICRA 2022
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 9085-9091
    Number of pages: 7
    ISBN (Electronic): 9781728196817
    DOIs
    State: Published - 2022
    Event: 39th IEEE International Conference on Robotics and Automation, ICRA 2022 - Philadelphia, United States
    Duration: May 23 2022 to May 27 2022

    Publication series

    Name: Proceedings - IEEE International Conference on Robotics and Automation
    ISSN (Print): 1050-4729

    Conference

    Conference: 39th IEEE International Conference on Robotics and Automation, ICRA 2022
    Country/Territory: United States
    City: Philadelphia
    Period: 5/23/22 to 5/27/22

    All Science Journal Classification (ASJC) codes

    • Software
    • Control and Systems Engineering
    • Artificial Intelligence
    • Electrical and Electronic Engineering
