Offline Reinforcement Learning for Wireless Network Optimization With Mixture Datasets

Kun Yang, Chengshuai Shi, Cong Shen, Jing Yang, Shu Ping Yeh, Jaroslaw J. Sydir

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

The recent development of reinforcement learning (RL) has boosted the adoption of online RL for wireless radio resource management (RRM). However, online RL algorithms require direct interactions with the environment, which may be undesirable given the potential performance loss due to the unavoidable exploration in RL. In this work, we first explore the use of offline RL algorithms in solving the RRM problem. We evaluate several state-of-the-art offline RL algorithms for a practical RRM problem that aims at maximizing a linear combination of total rates and 5-percentile rates via user scheduling. Our findings indicate that the performance of offline RL for the RRM problem is heavily contingent upon the behavior policy deployed for data collection. We propose an innovative offline RL approach utilizing heterogeneous datasets from various behavior policies. This method demonstrates that a strategic mixture of datasets enables near-optimal RL policy generation, even with suboptimal behavior policies. Additionally, we introduce two enhancements: an ensemble-based policy to augment dataset mixture training efficiency, and a novel offline-to-online strategy for seamless adaptation to new environments. Our data mixture approach achieves over 95% efficiency of an online RL agent in the absence of expert data. The ensemble algorithm notably reduces training duration by half compared to the data mixture method. Furthermore, our model, when applied with offline-to-online fine-tuning, surpasses existing benchmarks by approximately 5% in our user scheduling problem.
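As a rough illustration of two ideas named in the abstract, the sketch below computes an objective formed as a linear combination of the total rate and the 5th-percentile rate, and builds a mixed offline dataset from several behavior-policy datasets. It is not the authors' implementation: the function names, the `weight` parameter, the mixing `proportions`, and the linear-interpolation percentile rule are all assumptions made for this example, since the abstract does not specify them.

```python
import random


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def rrm_objective(user_rates, weight=1.0):
    """Total rate plus `weight` times the 5th-percentile rate.

    `weight` is a hypothetical trade-off coefficient; the abstract only
    says the objective is a linear combination of the two terms.
    """
    return sum(user_rates) + weight * percentile(user_rates, 5)


def mix_datasets(datasets, proportions, size, seed=0):
    """Build one offline dataset by sampling from several source datasets.

    Each source dataset is assumed to hold transitions collected by one
    behavior policy. `proportions[i]` is the (assumed) share of the mixed
    dataset drawn, with replacement, from `datasets[i]`.
    """
    if len(datasets) != len(proportions):
        raise ValueError("need one proportion per dataset")
    rng = random.Random(seed)
    mixed = []
    for data, share in zip(datasets, proportions):
        count = round(size * share)
        mixed.extend(rng.choices(data, k=count))
    rng.shuffle(mixed)
    return mixed
```

For example, `rrm_objective([1.0, 2.0, 3.0, 4.0, 5.0], weight=2.0)` adds the total rate of the five users to twice their 5th-percentile rate, and `mix_datasets([greedy_data, random_data], [0.25, 0.75], size=8)` would draw a quarter of the samples from a hypothetical `greedy_data` list and the rest from a hypothetical `random_data` list. An offline RL algorithm would then be trained on the returned list of transitions.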

Original language: English (US)
Pages (from-to): 12703-12716
Number of pages: 14
Journal: IEEE Transactions on Wireless Communications
Volume: 23
Issue number: 10
State: Published - 2024

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Electrical and Electronic Engineering
  • Applied Mathematics
