TY - GEN
T1 - Cache topology aware computation mapping for multicores
AU - Kandemir, Mahmut
AU - Yemliha, Taylan
AU - Muralidhara, Saiprashanth
AU - Srikantaiah, Shekhar
AU - Irwin, Mary Jane
AU - Zhang, Yuanrui
PY - 2010
Y1 - 2010
AB - The main contribution of this paper is a compiler-based, cache-topology-aware code optimization scheme for emerging multicore systems. This scheme distributes the iterations of a loop that is to be executed in parallel across the cores of a target multicore machine, and then schedules the iterations assigned to each core. Our goal is to improve the utilization of the on-chip multi-layer cache hierarchy and to maximize overall application performance. We evaluate our cache-topology-aware approach using a set of twelve applications and three different commercial multicore machines. In addition, to study some of our experimental parameters in detail and to explore future multicore machines (with higher core counts and deeper on-chip cache hierarchies), we also conduct a simulation-based study. The results collected from our experiments with the three Intel multicore machines show that the proposed compiler-based approach is very effective in enhancing performance. Furthermore, our simulation results indicate that optimizing for the on-chip cache hierarchy will be even more important in future multicores with increasing numbers of cores and cache levels.
UR - http://www.scopus.com/inward/record.url?scp=77954696758&partnerID=8YFLogxK
DO - 10.1145/1806596.1806605
M3 - Conference contribution
AN - SCOPUS:77954696758
SN - 9781450300193
T3 - Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
SP - 74
EP - 85
BT - PLDI'10 - Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation
T2 - ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation, PLDI 2010
Y2 - 5 June 2010 through 10 June 2010
ER -