Multi-Agent Reinforcement Learning for Efficient Content Caching in Mobile D2D Networks

Wei Jiang, Gang Feng, Shuang Qin, Tak Shing Peter Yum, Guohong Cao

Research output: Contribution to journal › Article › peer-review

145 Scopus citations

Abstract

To address the growth of multimedia traffic dominated by streaming video, user equipment (UE) can collaboratively cache and share content to alleviate the burden on base stations. Prior work on device-to-device (D2D) caching policies assumes perfect knowledge of the content popularity distribution. Since this distribution is usually unavailable in advance, a machine learning-based caching strategy that exploits the content demand history is highly promising. Thus, in this paper, we design D2D caching strategies using multi-agent reinforcement learning. Specifically, we model the D2D caching problem as a multi-agent multi-armed bandit problem and use Q-learning to learn how to coordinate the caching decisions. The UEs can be independent learners (ILs) if they learn the Q-values of their own actions, and joint action learners (JALs) if they learn the Q-values of their own actions in conjunction with those of the other UEs. Because the action space is very large, which leads to high computational complexity, a modified combinatorial upper confidence bound algorithm is proposed to reduce the action space for both IL and JAL. Simulation results show that the proposed JAL-based caching scheme outperforms the IL-based caching scheme and other popular caching schemes in terms of average downloading latency and cache hit rate.
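As a rough illustration of the independent-learner idea described in the abstract, the following minimal sketch treats each content item as an arm of a bandit and lets a single UE learn one Q-value per item from cache hits. It is not the paper's modified algorithm and does not cover the JAL variant. The function name `run_independent_learner`, the 0/1 hit reward, the one-request-per-round model, the sample-average update, and the exploration bonus (taken from the standard combinatorial UCB form) are all assumptions made for this example.

```python
import math
import random


def run_independent_learner(popularity, cache_size, rounds, seed=0):
    """Hypothetical sketch: one UE learns which items to cache.

    popularity: hidden request probabilities, used only to simulate demand.
    Returns the learned Q-values and the empirical cache hit rate.
    """
    rng = random.Random(seed)
    n = len(popularity)
    q = [0.0] * n      # estimated hit probability per content item
    pulls = [0] * n    # number of rounds each item has been cached
    hits = 0

    for t in range(1, rounds + 1):
        def score(i):
            # Items never cached so far are tried first.
            if pulls[i] == 0:
                return float("inf")
            # Q-value plus a UCB-style exploration bonus (assumed form).
            return q[i] + math.sqrt(1.5 * math.log(t) / pulls[i])

        # Rank single items and keep the top ones instead of
        # enumerating every possible cache configuration.
        cached = sorted(range(n), key=score, reverse=True)[:cache_size]

        # Simulate one request drawn from the hidden popularity.
        requested = rng.choices(range(n), weights=popularity)[0]

        for i in cached:
            reward = 1.0 if i == requested else 0.0
            pulls[i] += 1
            # Sample-average Q-learning update for a stateless bandit.
            q[i] += (reward - q[i]) / pulls[i]
            hits += int(reward)

    return q, hits / rounds


if __name__ == "__main__":
    q_values, hit_rate = run_independent_learner(
        [0.5, 0.3, 0.1, 0.05, 0.05], cache_size=2, rounds=5000
    )
    print([round(v, 3) for v in q_values], round(hit_rate, 3))
```

Scoring items individually and caching the top `cache_size` of them keeps the per-round work linear in the number of items, which is the kind of action-space reduction the abstract attributes to the combinatorial upper confidence bound approach. A joint action learner would additionally condition its estimates on what neighboring UEs cache, which this sketch omits.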

Original language: English (US)
Article number: 8629363
Pages (from-to): 1610-1622
Number of pages: 13
Journal: IEEE Transactions on Wireless Communications
Volume: 18
Issue number: 3
State: Published - Mar 2019

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Electrical and Electronic Engineering
  • Applied Mathematics
