TY - JOUR
T1 - NVMMU: A non-volatile memory management unit for heterogeneous GPU-SSD architectures
T2 - 24th International Conference on Parallel Architecture and Compilation, PACT 2015
AU - Zhang, Jie
AU - Donofrio, David
AU - Shalf, John
AU - Kandemir, Mahmut T.
AU - Jung, Myoungsoo
N1 - Funding Information:
This research is supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the "IT Consilience Creative Program" (IITP-2015-R0346-15-1008) supervised by the NIPA (National IT Industry Promotion Agency). This work is also supported in part by DOE grant DE-AC02-05CH1123, NSF grants 1213052, 1205618, 1302557, 1526750, 1409095, and 1439021.
Publisher Copyright:
© 2015 IEEE.
PY - 2015
Y1 - 2015
N2 - Thanks to massive parallelism in modern Graphics Processing Units (GPUs), emerging data processing applications in GPU computing exhibit ten-fold speedups compared to CPU-only systems. However, this GPU-based acceleration is limited in many cases by the significant data movement overheads and inefficient memory management for host-side storage accesses. To address these shortcomings, this paper proposes a non-volatile memory management unit (NVMMU) that reduces the file data movement overheads by directly connecting the Solid State Disk (SSD) to the GPU. We implemented our proposed NVMMU on real hardware with commercially available GPU and SSD devices, considering different types of storage interfaces and configurations. In this work, NVMMU unifies two discrete software stacks (one for the SSD and the other for the GPU) in two major ways. While a new interface provided by our NVMMU directly forwards file data between the GPU runtime library and the I/O runtime library, it supports non-volatile direct memory access (NDMA) that pairs those GPU and SSD devices via physically shared system memory blocks. This unification in turn can eliminate unnecessary user/kernel-mode switching, improve memory management, and remove data copy overheads. Our evaluation results demonstrate that NVMMU can reduce the overheads of file data movement by 95% on average, improving overall system performance by 78% compared to a conventional IOMMU approach.
AB - Thanks to massive parallelism in modern Graphics Processing Units (GPUs), emerging data processing applications in GPU computing exhibit ten-fold speedups compared to CPU-only systems. However, this GPU-based acceleration is limited in many cases by the significant data movement overheads and inefficient memory management for host-side storage accesses. To address these shortcomings, this paper proposes a non-volatile memory management unit (NVMMU) that reduces the file data movement overheads by directly connecting the Solid State Disk (SSD) to the GPU. We implemented our proposed NVMMU on real hardware with commercially available GPU and SSD devices, considering different types of storage interfaces and configurations. In this work, NVMMU unifies two discrete software stacks (one for the SSD and the other for the GPU) in two major ways. While a new interface provided by our NVMMU directly forwards file data between the GPU runtime library and the I/O runtime library, it supports non-volatile direct memory access (NDMA) that pairs those GPU and SSD devices via physically shared system memory blocks. This unification in turn can eliminate unnecessary user/kernel-mode switching, improve memory management, and remove data copy overheads. Our evaluation results demonstrate that NVMMU can reduce the overheads of file data movement by 95% on average, improving overall system performance by 78% compared to a conventional IOMMU approach.
UR - http://www.scopus.com/inward/record.url?scp=84975469963&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84975469963&partnerID=8YFLogxK
U2 - 10.1109/PACT.2015.43
DO - 10.1109/PACT.2015.43
M3 - Conference article
AN - SCOPUS:84975469963
SN - 1089-795X
SP - 13
EP - 24
JO - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
JF - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
M1 - 7429291
Y2 - 18 October 2015 through 21 October 2015
ER -