TY - GEN
T1 - To move or not to move?
T2 - 14th ACM International Conference on Systems and Storage, SYSTOR 2021
AU - Chang, Chia Hao
AU - Kumar, Adithya
AU - Sivasubramaniam, Anand
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/6/14
Y1 - 2021/6/14
AB - This paper focuses on the severe page thrashing problem that can arise when running large applications with irregular memory access patterns on systems with limited GPU memory. Such memory over-subscription causes very poor performance under the current on-demand (eager) and page-group-granularity, access-counter-based (lazy) page migration mechanisms found in NVIDIA's UVM drivers. Our detailed analysis of these executions reveals a novel insight: rather than duplicating the responsibility of catering to both temporal and spatial locality in both the GPU caches and GPU memory, it is better for the former to cater only to the temporal aspect and the latter to the spatial aspect, thereby saving precious memory system capacity. Based on this, we build an adaptive page migration scheme, called DynaMap, that (i) uses a compiler pass to instrument off-the-shelf CUDA UVM applications for spatial utilization tracking, (ii) dynamically sets a spatial utilization threshold that determines migration based on memory pressure and access characteristics, and (iii) enhances the current NVIDIA UVM driver to dynamically migrate a page from host memory to the GPU based on the threshold. We implement DynaMap on a real system and, using 7 irregular applications from public benchmark suites at different over-subscription ratios, show speedups of as much as 2.5X (34% on average) over state-of-the-art UVM implementations.
UR - http://www.scopus.com/inward/record.url?scp=85108415985&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108415985&partnerID=8YFLogxK
U2 - 10.1145/3456727.3463766
DO - 10.1145/3456727.3463766
M3 - Conference contribution
AN - SCOPUS:85108415985
T3 - SYSTOR 2021 - Proceedings of the 14th ACM International Conference on Systems and Storage
BT - SYSTOR 2021 - Proceedings of the 14th ACM International Conference on Systems and Storage
PB - Association for Computing Machinery, Inc
Y2 - 14 June 2021 through 16 June 2021
ER -