TY - GEN
T1 - Scalable POMDP Decision-Making Using Circulant Controllers
AU - Wray, Kyle Hollins
AU - Czuprynski, Kenneth
N1 - Publisher Copyright: © 2021 IEEE
PY - 2021
Y1 - 2021
N2 - This paper presents a novel policy representation for partially observable Markov decision processes (POMDPs) called circulant controllers and a provably efficient gradient-based algorithm for them. A formal mathematical description is provided that leverages circulant matrices for the controller's stochastic node transitions. This structure is particularly effective for capturing decision-making patterns found in real-world domains with repeated periodic behaviors that adapt their cycles based on observation. This includes domains such as bipedal walking over varied terrain, pick-and-place tasks in warehouses, and home healthcare monitoring and medicine delivery in household environments. A performant gradient-based algorithm is presented with a detailed theoretical analysis, formally proving the algorithm's improved performance, as well as circulant controllers' structural properties. Experiments on these domains demonstrate that the proposed controller algorithm outperforms other state-of-the-art POMDP controller algorithms. The proposed novel controller approach is demonstrated on an actual robot performing a navigation task in a real household environment.
UR - http://www.scopus.com/inward/record.url?scp=85125489529&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85125489529&partnerID=8YFLogxK
U2 - 10.1109/ICRA48506.2021.9561478
DO - 10.1109/ICRA48506.2021.9561478
M3 - Conference contribution
AN - SCOPUS:85125489529
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 6831
EP - 6837
BT - 2021 IEEE International Conference on Robotics and Automation, ICRA 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Conference on Robotics and Automation, ICRA 2021
Y2 - 30 May 2021 through 5 June 2021
ER -