GMT: GPU Orchestrated Memory Tiering for the Big Data Era

Chia Hao Chang, Vikram Sharma Mailthody, Jihoon Han, Zaid Qureshi, Anand Sivasubramaniam, Wen Mei Hwu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

As the demand for processing larger datasets increases, GPUs need to reach deeper into their (memory) hierarchy to directly access capacities that only storage systems (SSDs) can hold. However, the state-of-the-art mechanisms to reach storage either employ software stacks running on the host CPUs as intermediaries (e.g. Dragon, HMM), which has been noted to perform poorly and not able to meet the throughput needs of GPU cores, or directly access SSDs through NVMe queues (BaM) which does not benefit from lower latencies that may be possible by having the host memory as an intermediate tier. This paper presents the design and implementation of GPU Memory Tiering (GMT) by implementing a GPU-orchestrated 3-tier hierarchy comprising GPU memory, host memory and SSDs, where the GPU orchestrates most of the transfers that are bandwidth/latency sensitive. Additionally, it is important to not blindly transfer pages from the GPU memory to host memory upon an eviction, and GMT employs a reuse-prediction based practical insertion policy to perform discretionary page placement/bypass. An implementation and evaluation on an actual platform demonstrates that GMT performs 50% better than the state-of-the-art 2-tier strategy (BaM) and over 350% better than the state-of-the-art 3-tier strategy that is orchestrated by host CPUs (HMM), over a number of GPU applications with diverse memory access characteristics.

Original languageEnglish (US)
Title of host publicationFall Cycle
PublisherAssociation for Computing Machinery
Pages464-478
Number of pages15
ISBN (Electronic)9798400703867
DOIs
StatePublished - Apr 27 2024
Event29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2024 - San Diego, United States
Duration: Apr 27 2024May 1 2024

Publication series

NameInternational Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
Volume3

Conference

Conference29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2024
Country/TerritoryUnited States
CitySan Diego
Period4/27/245/1/24

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture

Cite this