TY - JOUR
T1 - Multi-Agent Reinforcement Learning for Efficient Content Caching in Mobile D2D Networks
AU - Jiang, Wei
AU - Feng, Gang
AU - Qin, Shuang
AU - Yum, Tak Shing Peter
AU - Cao, Guohong
N1 - Funding Information:
Manuscript received July 12, 2018; revised November 24, 2018; accepted January 15, 2019. Date of publication January 29, 2019; date of current version March 11, 2019. This work was supported in part by the National Natural Science Foundation of China under Grant 61631005 and Grant 61871099 and in part by the National Science Foundation under Grant CNS-1526425 and Grant CNS-1815465. The associate editor coordinating the review of this paper and approving it for publication was J. Tang. (Corresponding author: Gang Feng.) W. Jiang, G. Feng, and S. Qin are with the National Key Laboratory on Communications, University of Electronic Science and Technology of China, Chengdu 611731, China, and also with the Center for Intelligent Networking and Communications, University of Electronic Science and Technology of China, Chengdu 611731, China (e-mail: fenggang@uestc.edu.cn). T. S. P. Yum is with Zhejiang Lab, Hangzhou 310000, China.
Publisher Copyright:
© 2002-2012 IEEE.
PY - 2019/3
Y1 - 2019/3
N2 - To address the increase of multimedia traffic dominated by streaming videos, user equipment (UE) can collaboratively cache and share contents to alleviate the burden of base stations. Prior work on device-to-device (D2D) caching policies assumes perfect knowledge of the content popularity distribution. Since the content popularity distribution is usually unavailable in advance, a machine learning-based caching strategy that exploits the knowledge of content demand history would be highly promising. Thus, in this paper we design D2D caching strategies using multi-agent reinforcement learning. Specifically, we model the D2D caching problem as a multi-agent multi-armed bandit problem and use Q-learning to learn how to coordinate the caching decisions. The UEs can be independent learners (ILs) if they learn the Q-values of their own actions, or joint action learners (JALs) if they learn the Q-values of their own actions in conjunction with those of the other UEs. Since the action space is very large, leading to high computational complexity, a modified combinatorial upper confidence bound algorithm is proposed to reduce the action space for both IL and JAL. The simulation results show that the proposed JAL-based caching scheme outperforms the IL-based caching scheme and other popular caching schemes in terms of average downloading latency and cache hit rate.
AB - To address the increase of multimedia traffic dominated by streaming videos, user equipment (UE) can collaboratively cache and share contents to alleviate the burden of base stations. Prior work on device-to-device (D2D) caching policies assumes perfect knowledge of the content popularity distribution. Since the content popularity distribution is usually unavailable in advance, a machine learning-based caching strategy that exploits the knowledge of content demand history would be highly promising. Thus, in this paper we design D2D caching strategies using multi-agent reinforcement learning. Specifically, we model the D2D caching problem as a multi-agent multi-armed bandit problem and use Q-learning to learn how to coordinate the caching decisions. The UEs can be independent learners (ILs) if they learn the Q-values of their own actions, or joint action learners (JALs) if they learn the Q-values of their own actions in conjunction with those of the other UEs. Since the action space is very large, leading to high computational complexity, a modified combinatorial upper confidence bound algorithm is proposed to reduce the action space for both IL and JAL. The simulation results show that the proposed JAL-based caching scheme outperforms the IL-based caching scheme and other popular caching schemes in terms of average downloading latency and cache hit rate.
UR - http://www.scopus.com/inward/record.url?scp=85063002213&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063002213&partnerID=8YFLogxK
U2 - 10.1109/TWC.2019.2894403
DO - 10.1109/TWC.2019.2894403
M3 - Article
AN - SCOPUS:85063002213
SN - 1536-1276
VL - 18
SP - 1610
EP - 1622
JO - IEEE Transactions on Wireless Communications
JF - IEEE Transactions on Wireless Communications
IS - 3
M1 - 8629363
ER -