TY - GEN
T1 - Exploring the future of out-of-core computing with compute-local non-volatile memory
AU - Jung, Myoungsoo
AU - Wilson, Ellis H.
AU - Choi, Wonil
AU - Shalf, John
AU - Aktulga, Hasan Metin
AU - Yang, Chao
AU - Saule, Erik
AU - Catalyurek, Umit V.
AU - Kandemir, Mahmut
PY - 2013
Y1 - 2013
N2 - Drawing parallels to the rise of general purpose graphical processing units (GPGPUs) as accelerators for specific high performance computing (HPC) workloads, there is a rise in the use of non-volatile memory (NVM) as accelerators for I/O-intensive scientific applications. However, existing works have explored use of NVM within dedicated I/O nodes, which are distant from the compute nodes that actually need such acceleration. As NVM bandwidth begins to out-pace point-to-point network capacity, we argue for the need to break from the archetype of completely separated storage. Therefore, in this work we investigate co-location of NVM and compute by varying I/O interfaces, file systems, types of NVM, and both current and future SSD architectures, uncovering numerous bottlenecks implicit in these various levels in the I/O stack. We present novel hardware and software solutions, including the new Unified File System (UFS), to enable fuller utilization of the new compute-local NVM storage. Our experimental evaluation, which employs a real-world Out-of-Core (OoC) HPC application, demonstrates throughput increases in excess of an order of magnitude over current approaches.
AB - Drawing parallels to the rise of general purpose graphical processing units (GPGPUs) as accelerators for specific high performance computing (HPC) workloads, there is a rise in the use of non-volatile memory (NVM) as accelerators for I/O-intensive scientific applications. However, existing works have explored use of NVM within dedicated I/O nodes, which are distant from the compute nodes that actually need such acceleration. As NVM bandwidth begins to out-pace point-to-point network capacity, we argue for the need to break from the archetype of completely separated storage. Therefore, in this work we investigate co-location of NVM and compute by varying I/O interfaces, file systems, types of NVM, and both current and future SSD architectures, uncovering numerous bottlenecks implicit in these various levels in the I/O stack. We present novel hardware and software solutions, including the new Unified File System (UFS), to enable fuller utilization of the new compute-local NVM storage. Our experimental evaluation, which employs a real-world Out-of-Core (OoC) HPC application, demonstrates throughput increases in excess of an order of magnitude over current approaches.
UR - http://www.scopus.com/inward/record.url?scp=84899675402&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84899675402&partnerID=8YFLogxK
U2 - 10.1145/2503210.2503261
DO - 10.1145/2503210.2503261
M3 - Conference contribution
AN - SCOPUS:84899675402
SN - 9781450323789
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - Proceedings of SC 2013
PB - IEEE Computer Society
T2 - 2013 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2013
Y2 - 17 November 2013 through 22 November 2013
ER -