TY - GEN
T1 - Data movement aware computation partitioning
AU - Tang, Xulong
AU - Kislal, Orhan
AU - Kandemir, Mahmut
AU - Karakoy, Mustafa
N1 - Funding Information:
This research is supported in part by NSF grants #1205618, #1213052, #1212962, #1302225, #1302557, #1313560, #1320478, #1320531, #1409095, #1409723, #1439021, #1439057, #1526750, #1629129 and #1629915, and a grant from Intel.
Publisher Copyright:
© 2017 Association for Computing Machinery.
PY - 2017/10/14
Y1 - 2017/10/14
N2 - Data access costs dominate the execution times of most parallel applications and they are expected to be even more important in the future. To address this, recent research has focused on Near Data Processing (NDP) as a new paradigm that tries to bring computation to data, instead of bringing data to computation (which is the norm in conventional computing). This paper explores the potential of compiler support in exploiting NDP in the context of emerging manycore systems. To that end, we propose a novel compiler algorithm that partitions the computations in a given loop nest into subcomputations and schedules the resulting subcomputations on different cores with the goal of reducing the distance-to-data on the on-chip network. An important characteristic of our approach is that it exploits NDP while taking advantage of data locality. Our experiments with 12 multithreaded applications running on a state-of-the-art commercial manycore system indicate that the proposed compiler-based approach significantly reduces data movements on the on-chip network by taking advantage of NDP, and these benefits lead to an average execution time improvement of 18.4%.
AB - Data access costs dominate the execution times of most parallel applications and they are expected to be even more important in the future. To address this, recent research has focused on Near Data Processing (NDP) as a new paradigm that tries to bring computation to data, instead of bringing data to computation (which is the norm in conventional computing). This paper explores the potential of compiler support in exploiting NDP in the context of emerging manycore systems. To that end, we propose a novel compiler algorithm that partitions the computations in a given loop nest into subcomputations and schedules the resulting subcomputations on different cores with the goal of reducing the distance-to-data on the on-chip network. An important characteristic of our approach is that it exploits NDP while taking advantage of data locality. Our experiments with 12 multithreaded applications running on a state-of-the-art commercial manycore system indicate that the proposed compiler-based approach significantly reduces data movements on the on-chip network by taking advantage of NDP, and these benefits lead to an average execution time improvement of 18.4%.
UR - http://www.scopus.com/inward/record.url?scp=85034078669&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85034078669&partnerID=8YFLogxK
U2 - 10.1145/3123939.3123954
DO - 10.1145/3123939.3123954
M3 - Conference contribution
AN - SCOPUS:85034078669
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 730
EP - 744
BT - MICRO 2017 - 50th Annual IEEE/ACM International Symposium on Microarchitecture Proceedings
PB - IEEE Computer Society
T2 - 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2017
Y2 - 14 October 2017 through 18 October 2017
ER -