TY - GEN
T1 - Reuse distance based performance modeling and workload mapping
AU - Muralidhara, Sai Prashanth
AU - Kandemir, Mahmut
AU - Kislal, Orhan
PY - 2012
Y1 - 2012
AB - Modern multicore architectures have multiple cores connected to a hierarchical cache structure, resulting in heterogeneity in cache sharing across different subsets of cores. In these systems, overall throughput and efficiency depend heavily on a careful mapping of applications to available cores. In this paper, we study the problem of application-to-core mapping with the goal of improving overall cache performance in the presence of a hierarchical multi-level cache structure. We propose to sample the memory access patterns of individual applications and build their reuse distance distributions. We then use these reuse distance distributions to compute an application-to-core mapping that aims to improve overall cache performance and, consequently, overall throughput. We show that our proposed mapping scheme is effective in practice, yielding throughput benefits of about 39% over the worst-case mapping and about 30% over the default operating-system-based mapping. We believe that, as larger chip multiprocessors with deeper cache hierarchies are projected to become the norm, efficient mapping of applications to cores will become a vital requirement for extracting the maximum possible performance from these systems.
UR - http://www.scopus.com/inward/record.url?scp=84862690564&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84862690564&partnerID=8YFLogxK
U2 - 10.1145/2212908.2212936
DO - 10.1145/2212908.2212936
M3 - Conference contribution
AN - SCOPUS:84862690564
SN - 9781450312158
T3 - CF '12 - Proceedings of the ACM Computing Frontiers Conference
SP - 193
EP - 202
BT - CF '12 - Proceedings of the ACM Computing Frontiers Conference
T2 - ACM Computing Frontiers Conference, CF '12
Y2 - 15 May 2012 through 17 May 2012
ER -