Decision Making in Non-Stationary Environments with Policy-Augmented Search

  • Ava Pettet
  • Yunuo Zhang
  • Baiting Luo
  • Kyle Wray
  • Hendrik Baier
  • Aron Laszka
  • Abhishek Dubey
  • Ayan Mukhopadhyay

Research output: Contribution to journal › Conference article › peer-review

Abstract

Sequential decision-making is challenging in non-stationary environments, where the environment in which an agent operates can change over time. Policies learned before execution become stale when the environment changes, and relearning takes time and computational effort. Online search, on the other hand, can return sub-optimal actions when the allowed run time is limited. In this paper, we introduce Policy-Augmented Monte Carlo tree search (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment. We prove several theoretical results about PA-MCTS. We also compare our approach with AlphaZero (another hybrid planning approach) and Deep Q-Learning on several OpenAI Gym environments and show that PA-MCTS outperforms these baselines.
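
The abstract does not spell out how the two sources of action-value estimates are combined. As a minimal sketch only, assume a convex combination with a weight `alpha` in [0, 1]: `alpha = 1` trusts the stale policy entirely, and `alpha = 0` trusts the online search entirely. The function name `pa_select_action`, the parameter `alpha`, and the input lists are hypothetical names chosen for illustration and are not taken from this record.

```python
from typing import Sequence


def pa_select_action(
    q_policy: Sequence[float],
    g_search: Sequence[float],
    alpha: float,
) -> int:
    """Pick the action index that maximizes a weighted mix of two estimates.

    q_policy: action values from the (possibly out-of-date) learned policy.
    g_search: action-value estimates from an online search that uses the
              current environment model.
    alpha:    assumed mixing weight in [0, 1]; an illustration, not a value
              or rule confirmed by the source.
    """
    if len(q_policy) == 0 or len(q_policy) != len(g_search):
        raise ValueError("both estimates must cover the same non-empty action set")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # Convex combination of the two estimates, one value per action.
    combined = [alpha * q + (1.0 - alpha) * g for q, g in zip(q_policy, g_search)]

    # Index of the largest combined value (first index wins on ties).
    return max(range(len(combined)), key=combined.__getitem__)


# Toy example: the stale policy prefers action 0, the search prefers action 1.
stale_q = [1.0, 0.0]
search_g = [0.0, 1.0]
print(pa_select_action(stale_q, search_g, alpha=0.25))
```

With a small `alpha`, the search estimates dominate, which would suit a setting where the environment has drifted far from the one the policy was trained on; with a large `alpha`, the learned values dominate, which would suit a tight run-time budget for search.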

Original language: English (US)
Pages (from-to): 2417-2419
Number of pages: 3
Journal: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume: 2024-May
State: Published - 2024
Event: 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2024 - Auckland, New Zealand
Duration: May 6 2024 to May 10 2024

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
