TY - JOUR
T1 - Exploring the future of out-of-core computing with compute-local non-volatile memory
AU - Jung, Myoungsoo
AU - Wilson, Ellis H.
AU - Choi, Wonil
AU - Shalf, John
AU - Aktulga, Hasan Metin
AU - Yang, Chao
AU - Saule, Erik
AU - Catalyurek, Umit V.
AU - Kandemir, Mahmut
PY - 2014
Y1 - 2014
AB - Much as general-purpose graphics processing units (GPGPUs) have risen as accelerators for specific high-performance computing (HPC) workloads, non-volatile memory (NVM) is increasingly used as an accelerator for I/O-intensive scientific applications. However, existing work has explored the use of NVM within dedicated I/O nodes, which are distant from the compute nodes that actually need such acceleration. As NVM bandwidth begins to outpace point-to-point network capacity, we argue for the need to break from the archetype of completely separated storage. Therefore, in this work we investigate the co-location of NVM and compute by varying I/O interfaces, file systems, types of NVM, and both current and future SSD architectures, uncovering numerous bottlenecks at these various levels of the I/O stack. We present novel hardware and software solutions, including the new Unified File System (UFS), to enable fuller utilization of the new compute-local NVM storage. Our experimental evaluation, which employs a real-world Out-of-Core (OoC) HPC application, demonstrates throughput increases in excess of an order of magnitude over current approaches.
UR - http://www.scopus.com/inward/record.url?scp=84901783566&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84901783566&partnerID=8YFLogxK
U2 - 10.3233/SPR-140384
DO - 10.3233/SPR-140384
M3 - Article
AN - SCOPUS:84901783566
SN - 1058-9244
VL - 22
SP - 125
EP - 139
JO - Scientific Programming
JF - Scientific Programming
IS - 2
ER -