TY - GEN
T1 - On-chip cache hierarchy-aware tile scheduling for multicore machines
AU - Liu, Jun
AU - Zhang, Yuanrui
AU - Ding, Wei
AU - Kandemir, Mahmut
PY - 2011
Y1 - 2011
AB - Iteration space tiling and scheduling is an important technique for optimizing loops that constitute a large fraction of execution times in computation kernels of both scientific codes and embedded applications. While tiling has been studied extensively in the context of both uniprocessor and multiprocessor platforms, prior research has paid less attention to tile scheduling, especially when targeting multicore machines with deep on-chip cache hierarchies. In this paper, we propose a cache hierarchy-aware tile scheduling algorithm for multicore machines, with the purpose of maximizing both horizontal and vertical data reuses in on-chip caches, and balancing the workloads across different cores. This scheduling algorithm is one of the key components in a source-to-source translation tool that we developed for automatic loop parallelization and multithreaded code generation from sequential codes. To the best of our knowledge, this is the first effort that develops a fully-automated tile scheduling strategy customized for on-chip cache topologies of multicore machines. The experimental results collected by executing twelve application programs on three commercial Intel machines (Nehalem, Dunnington, and Harpertown) reveal that our cache-aware tile scheduling brings about 27.9% reduction in cache misses, and on average, 13.5% improvement in execution times over an alternate method tested.
UR - http://www.scopus.com/inward/record.url?scp=79957454903&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79957454903&partnerID=8YFLogxK
U2 - 10.1109/CGO.2011.5764684
DO - 10.1109/CGO.2011.5764684
M3 - Conference contribution
AN - SCOPUS:79957454903
SN - 9781612843551
T3 - Proceedings - International Symposium on Code Generation and Optimization, CGO 2011
SP - 161
EP - 170
BT - Proceedings - International Symposium on Code Generation and Optimization, CGO 2011
T2 - 9th International Symposium on Code Generation and Optimization, CGO 2011
Y2 - 2 April 2011 through 6 April 2011
ER -