TY - JOUR
T1 - Offline Reinforcement Learning for Wireless Network Optimization With Mixture Datasets
AU - Yang, Kun
AU - Shi, Chengshuai
AU - Shen, Cong
AU - Yang, Jing
AU - Yeh, Shu Ping
AU - Sydir, Jaroslaw J.
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - The recent development of reinforcement learning (RL) has boosted the adoption of online RL for wireless radio resource management (RRM). However, online RL algorithms require direct interactions with the environment, which may be undesirable given the potential performance loss due to the unavoidable exploration in RL. In this work, we first explore the use of offline RL algorithms in solving the RRM problem. We evaluate several state-of-the-art offline RL algorithms on a practical RRM problem that aims at maximizing a linear combination of total rates and 5-percentile rates via user scheduling. Our findings indicate that the performance of offline RL for the RRM problem is heavily contingent on the behavior policy used for data collection. We propose a novel offline RL approach that utilizes heterogeneous datasets collected by different behavior policies, and demonstrate that a strategic mixture of such datasets enables near-optimal RL policy generation even when the individual behavior policies are suboptimal. Additionally, we introduce two enhancements: an ensemble-based policy that improves the training efficiency of the dataset mixture approach, and a novel offline-to-online strategy for seamless adaptation to new environments. Our data mixture approach achieves over 95% of the performance of an online RL agent even in the absence of expert data. The ensemble algorithm roughly halves the training duration compared with the data mixture method. Furthermore, our model, when combined with offline-to-online fine-tuning, surpasses existing benchmarks by approximately 5% on our user scheduling problem.
AB - The recent development of reinforcement learning (RL) has boosted the adoption of online RL for wireless radio resource management (RRM). However, online RL algorithms require direct interactions with the environment, which may be undesirable given the potential performance loss due to the unavoidable exploration in RL. In this work, we first explore the use of offline RL algorithms in solving the RRM problem. We evaluate several state-of-the-art offline RL algorithms on a practical RRM problem that aims at maximizing a linear combination of total rates and 5-percentile rates via user scheduling. Our findings indicate that the performance of offline RL for the RRM problem is heavily contingent on the behavior policy used for data collection. We propose a novel offline RL approach that utilizes heterogeneous datasets collected by different behavior policies, and demonstrate that a strategic mixture of such datasets enables near-optimal RL policy generation even when the individual behavior policies are suboptimal. Additionally, we introduce two enhancements: an ensemble-based policy that improves the training efficiency of the dataset mixture approach, and a novel offline-to-online strategy for seamless adaptation to new environments. Our data mixture approach achieves over 95% of the performance of an online RL agent even in the absence of expert data. The ensemble algorithm roughly halves the training duration compared with the data mixture method. Furthermore, our model, when combined with offline-to-online fine-tuning, surpasses existing benchmarks by approximately 5% on our user scheduling problem.
UR - http://www.scopus.com/inward/record.url?scp=85192772085&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85192772085&partnerID=8YFLogxK
U2 - 10.1109/TWC.2024.3395624
DO - 10.1109/TWC.2024.3395624
M3 - Article
AN - SCOPUS:85192772085
SN - 1536-1276
VL - 23
SP - 12703
EP - 12716
JO - IEEE Transactions on Wireless Communications
JF - IEEE Transactions on Wireless Communications
IS - 10
ER -