TY - GEN
T1 - Enhancing Address Translations in Throughput Processors via Compression
AU - Tang, Xulong
AU - Zhang, Ziyu
AU - Xu, Weizheng
AU - Kandemir, Mahmut Taylan
AU - Melhem, Rami
AU - Yang, Jun
N1 - Funding Information:
The authors thank the PACT reviewers and shepherd for their constructive feedback and suggestions for improving this paper. The authors also thank John Morgan Sampson and Jagadish Kotra for their involvement in early discussions relevant to this work. This material is based upon work supported by the National Science Foundation under grants #1763681, #1629129, #1931531, #1629915, and is supported by a startup grant from the University of Pittsburgh.
Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/9/30
Y1 - 2020/9/30
N2 - Efficient memory sharing among multiple compute engines plays an important role in shaping the overall application performance on CPU-GPU heterogeneous platforms. Unified Virtual Memory (UVM) is a promising feature that allows globally-visible data structures and pointers such that the GPU can access the physical memory space on the CPU side, and take advantage of the host OS paging mechanism without explicit programmer effort. However, a key requirement for the guaranteed performance is effective hardware support of address translation. Particularly, we observe that GPU execution suffers from high TLB miss rates in a UVM environment, especially for irregular and/or memory-intensive applications. In this paper, we propose simple yet effective compression mechanisms for address translations to improve GPU TLB hit rates. Specifically, we explore and leverage the TLB compressibility during the execution of GPU applications to design efficient address translation compression with minimal runtime overhead. Experimental results across 22 applications indicate that our proposed approach significantly improves GPU TLB hit rates, which translate to 12% average performance improvement. Particularly, for 16 irregular and/or memory-intensive applications, the performance improvements achieved reach up to 69.2%, with an average of 16.3%.
AB - Efficient memory sharing among multiple compute engines plays an important role in shaping the overall application performance on CPU-GPU heterogeneous platforms. Unified Virtual Memory (UVM) is a promising feature that allows globally-visible data structures and pointers such that the GPU can access the physical memory space on the CPU side, and take advantage of the host OS paging mechanism without explicit programmer effort. However, a key requirement for the guaranteed performance is effective hardware support of address translation. Particularly, we observe that GPU execution suffers from high TLB miss rates in a UVM environment, especially for irregular and/or memory-intensive applications. In this paper, we propose simple yet effective compression mechanisms for address translations to improve GPU TLB hit rates. Specifically, we explore and leverage the TLB compressibility during the execution of GPU applications to design efficient address translation compression with minimal runtime overhead. Experimental results across 22 applications indicate that our proposed approach significantly improves GPU TLB hit rates, which translate to 12% average performance improvement. Particularly, for 16 irregular and/or memory-intensive applications, the performance improvements achieved reach up to 69.2%, with an average of 16.3%.
UR - http://www.scopus.com/inward/record.url?scp=85094199867&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094199867&partnerID=8YFLogxK
U2 - 10.1145/3410463.3414633
DO - 10.1145/3410463.3414633
M3 - Conference contribution
AN - SCOPUS:85094199867
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 191
EP - 204
BT - PACT 2020 - Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 ACM International Conference on Parallel Architectures and Compilation Techniques, PACT 2020
Y2 - 3 October 2020 through 7 October 2020
ER -