TY - GEN
T1 - Anatomy of GPU memory system for multi-application execution
AU - Jog, Adwait
AU - Kayiran, Onur
AU - Kesten, Tuba
AU - Pattnaik, Ashutosh
AU - Bolotin, Evgeny
AU - Chatterjee, Niladrish
AU - Keckler, Stephen W.
AU - Kandemir, Mahmut T.
AU - Das, Chita R.
N1 - Publisher Copyright: © 2015 ACM.
PY - 2015/10/5
Y1 - 2015/10/5
AB - As GPUs make headway in the computing landscape spanning mobile platforms, supercomputers, cloud and virtual desktop platforms, supporting concurrent execution of multiple applications in GPUs becomes essential for unlocking their full potential. However, unlike CPUs, multi-application execution in GPUs is little explored. In this paper, we study the memory system of GPUs in a concurrently executing multi-application environment. We first present an analytical performance model for many-threaded architectures and show that the common use of misses-per-kilo-instruction (MPKI) as a proxy for performance is not accurate without considering the bandwidth usage of applications. We characterize the memory interference of applications and discuss the limitations of existing memory schedulers in mitigating this interference. We extend the analytical model to multiple applications and identify the key metrics to control various performance metrics. We conduct extensive simulations using an enhanced version of GPGPU-Sim targeted for concurrently executing multiple applications, and show that memory scheduling decisions based on MPKI and bandwidth information are more effective in enhancing throughput compared to the traditional FR-FCFS and the recently proposed RR FR-FCFS policies.
UR - http://www.scopus.com/inward/record.url?scp=84959421343&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959421343&partnerID=8YFLogxK
U2 - 10.1145/2818950.2818979
DO - 10.1145/2818950.2818979
M3 - Conference contribution
AN - SCOPUS:84959421343
T3 - ACM International Conference Proceeding Series
SP - 223
EP - 234
BT - MEMSYS 2015 - Proceedings of the 1st International Symposium on Memory Systems
PB - Association for Computing Machinery
T2 - 1st International Symposium on Memory Systems, MEMSYS 2015
Y2 - 14 August 2015 through 15 August 2015
ER -