TY - JOUR
T1 - Optimizing shared cache behavior of chip multiprocessors
AU - Kandemir, Mahmut
AU - Muralidhara, Sai Prashanth
AU - Narayanan, Sri Hari Krishna
AU - Zhang, Yuanrui
AU - Ozturk, Ozcan
N1 - Copyright 2012 Elsevier B.V., All rights reserved.
PY - 2009
Y1 - 2009
AB - One of the critical problems associated with emerging chip multiprocessors (CMPs) is the management of on-chip shared cache space. Unfortunately, single-processor-centric data locality optimization schemes may not work well in the CMP case, as data accesses from multiple cores can create conflicts in the shared cache space. The main contribution of this paper is a compiler-directed code restructuring scheme for enhancing the locality of shared data in CMPs. The proposed scheme targets the last-level shared cache that exists in many commercial CMPs and has two components: allocation, which determines the set of loop iterations assigned to each core, and scheduling, which determines the order in which the iterations assigned to a core are executed. Our scheme restructures the application code such that different cores operate on shared data blocks at the same time, to the extent allowed by data dependencies. This helps reduce reuse distances for the shared data and improves on-chip cache performance. We evaluated our approach using the Splash-2 and Parsec applications through both simulations and experiments on two commercial multi-core machines. Our experimental evaluation indicates that the proposed data locality optimization scheme reduces inter-core conflict misses in the shared cache by 67% on average when both allocation and scheduling are used. In addition, the execution time improvements we achieve (29% on average) are very close to the optimal savings that could be achieved using a hypothetical scheme.
UR - http://www.scopus.com/inward/record.url?scp=76749137634&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=76749137634&partnerID=8YFLogxK
U2 - 10.1145/1669112.1669176
DO - 10.1145/1669112.1669176
M3 - Conference article
AN - SCOPUS:76749137634
SN - 1072-4451
SP - 505
EP - 516
JO - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
JF - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
T2 - 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42
Y2 - 12 December 2009 through 16 December 2009
ER -