All-time safety and sample-efficient meta update for online safe meta reinforcement learning under Markov task transition

Research output: Contribution to journal › Article › peer-review

Abstract

This paper studies the problems of ensuring all-time safety and sample-efficient meta updates in online safe meta reinforcement learning (MRL) on physical agents (e.g., mobile robots). We propose a novel masked Follow-the-Last-Parameter-Policy (FTLPP) framework, which is composed of a policy masking framework and a sample-efficient online meta update method. The policy masking framework applies a masking function over the learned control policy and ensures all-time safety by suppressing the probability of executing unsafe actions to a sufficiently small value. To enhance sample efficiency, the problem of online update of the meta parameter is cast as a policy optimization problem, where the tasks are the states and the meta parameters for the next task are the actions, and is then solved with an off-policy reinforcement learning algorithm. We evaluate our method on Frozen Lake, Acrobot, HalfCheetah and Hopper from OpenAI Gym and compare it with the baseline methods Meta SRL and variants of FTML and SAILR.
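The policy-masking idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual method: the function name, the `unsafe` flags, and the threshold `eps` are assumptions introduced here. It shows how suppressing unsafe actions to a small probability and renormalizing still yields a valid distribution.

```python
import numpy as np

def masked_policy(probs, unsafe, eps=1e-6):
    """Illustrative sketch of a policy mask (names and signature are
    assumptions, not the paper's API): cap the probability of each
    action flagged unsafe at `eps`, then renormalize so the result
    remains a probability distribution over actions."""
    probs = np.asarray(probs, dtype=float)
    unsafe = np.asarray(unsafe, dtype=bool)
    # Suppress unsafe actions to at most eps; leave safe actions unchanged.
    masked = np.where(unsafe, np.minimum(probs, eps), probs)
    return masked / masked.sum()

# Example: action 1 is flagged unsafe by a (hypothetical) safety check.
p = masked_policy([0.2, 0.5, 0.3], unsafe=[False, True, False])
```

After masking, nearly all probability mass shifts to the safe actions while their relative proportions are preserved, so the agent almost never executes the unsafe action.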

Original language: English (US)
Article number: 173
Journal: Machine Learning
Volume: 114
Issue number: 8
DOIs
State: Published - Aug 2025

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence
