To address the growth of multimedia traffic, which is dominated by streaming video, user equipments (UEs) can collaboratively cache and share content to alleviate the burden on base stations. Prior work on device-to-device (D2D) caching policies assumes perfect knowledge of the content popularity distribution. Since this distribution is usually unavailable in advance, a machine learning-based caching strategy that exploits the content demand history is highly promising. In this paper, we therefore design D2D caching strategies using multi-agent reinforcement learning. Specifically, we model the D2D caching problem as a multi-agent multi-armed bandit problem and use Q-learning to learn how to coordinate the caching decisions. The UEs can be independent learners (ILs) if they learn the Q-values of their own actions, or joint action learners (JALs) if they learn the Q-values of their own actions in conjunction with those of the other UEs. Because the action space is vast, which leads to high computational complexity, a modified combinatorial upper confidence bound algorithm is proposed to reduce the action space for both IL and JAL. The simulation results show that the proposed JAL-based caching scheme outperforms the IL-based caching scheme and other popular caching schemes in terms of average downloading latency and cache hit rate.
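The independent-learner idea can be illustrated with a minimal sketch. It is not the paper's algorithm, whose details the abstract does not give. It assumes that each file is treated as one arm, that the reward for a cached file is 1 on a cache hit and 0 otherwise, that Q-values follow the stateless update Q ← Q + α(r − Q), and that exploration uses the standard combinatorial UCB bonus sqrt(3 ln t / (2 n)). The class name `IndependentLearner` and its parameters are hypothetical.

```python
import math


class IndependentLearner:
    """One UE that learns Q-values for its own caching actions only.

    Hypothetical sketch: names, reward model, and bonus term are assumptions.
    """

    def __init__(self, num_files, cache_size, step_size=0.1):
        self.num_files = num_files
        self.cache_size = cache_size
        self.step_size = step_size        # learning rate alpha
        self.q = [0.0] * num_files        # per-file Q-value estimates
        self.pulls = [0] * num_files      # how often each file was cached
        self.t = 0                        # number of caching rounds so far

    def choose_cache(self):
        """Return the files with the largest UCB-adjusted Q-values."""
        self.t += 1

        def score(f):
            if self.pulls[f] == 0:
                return math.inf           # cache every file at least once
            bonus = math.sqrt(1.5 * math.log(self.t) / self.pulls[f])
            return self.q[f] + bonus

        ranked = sorted(range(self.num_files), key=score, reverse=True)
        return ranked[: self.cache_size]

    def update(self, cached, rewards):
        """Stateless Q-learning update for the files that were cached."""
        for f in cached:
            self.pulls[f] += 1
            self.q[f] += self.step_size * (rewards[f] - self.q[f])
```

Ranking files individually and keeping the top `cache_size` avoids enumerating every possible cache configuration, which is the sense in which a combinatorial UCB rule shrinks the action space. A joint action learner would instead index its Q-values by the caching choices of the other UEs as well, which this sketch does not attempt.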
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Electrical and Electronic Engineering
- Applied Mathematics