Abstract
Sequential decision-making is challenging in non-stationary environments, where the environment in which an agent operates can change over time. Policies learned before execution become stale when the environment changes, and relearning takes time and computational effort. Online search, on the other hand, can return sub-optimal actions when the allowed run time is limited. In this paper, we introduce Policy-Augmented Monte Carlo tree search (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment. We prove several theoretical results about PA-MCTS. We also compare our approach with AlphaZero (another hybrid planning approach) and Deep Q-Learning on several OpenAI Gym environments and show that PA-MCTS outperforms both baselines.
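The combination described in the abstract can be illustrated with a minimal sketch. It assumes the two sources of action-value estimates are blended by a convex weight `alpha`. The abstract does not state the combination rule, so the weighting scheme, the function names, and the dictionary-based interface are all hypothetical and serve only to show the idea.

```python
from typing import Dict, Hashable

Action = Hashable


def combine_action_values(
    q_stale: Dict[Action, float],
    q_search: Dict[Action, float],
    alpha: float,
) -> Dict[Action, float]:
    """Blend stale-policy and online-search estimates per action.

    ASSUMPTION: a convex combination weighted by `alpha`; the abstract
    only says the two estimates are combined, not how.
    alpha = 1.0 trusts only the out-of-date policy,
    alpha = 0.0 trusts only the online search.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Keep only actions that both sources have estimates for,
    # preserving the insertion order of q_stale.
    shared = [a for a in q_stale if a in q_search]
    if not shared:
        raise ValueError("no action has estimates from both sources")
    return {a: alpha * q_stale[a] + (1.0 - alpha) * q_search[a] for a in shared}


def select_action(
    q_stale: Dict[Action, float],
    q_search: Dict[Action, float],
    alpha: float,
) -> Action:
    """Return the action with the highest blended estimate."""
    blended = combine_action_values(q_stale, q_search, alpha)
    return max(blended, key=blended.get)
```

In this sketch, `q_stale` would come from the policy learned before the environment changed and `q_search` from a time-limited search on the current model. Choosing `alpha` closer to 0 would suit a large environment change, while a value closer to 1 would suit a small change or a very short search budget.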
| Original language | English (US) |
|---|---|
| Pages (from-to) | 2417-2419 |
| Number of pages | 3 |
| Journal | Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS |
| Volume | 2024-May |
| State | Published - 2024 |
| Event | 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2024, Auckland, New Zealand (May 6, 2024 to May 10, 2024) |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Software
- Control and Systems Engineering
Paper title: Decision Making in Non-Stationary Environments with Policy-Augmented Search