TY - GEN
T1 - Mix and Match
T2 - 2021 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2021
AU - Tang, Xulong
AU - Kandemir, Mahmut Taylan
AU - Karakoy, Mustafa
N1 - Funding Information:
The authors thank Dr. Evgenia Smirni for shepherding the paper. This work is supported in part by NSF grants #1908793, #1629915, #1629129, #1763681, #2028929, #2008398, #2011146, and #1931531, as well as startup funding from the University of Pittsburgh.
Publisher Copyright:
© 2021 Owner/Author.
PY - 2021/5/31
Y1 - 2021/5/31
AB - Application programs that exhibit strong locality of reference incur fewer cache misses and perform better across different architectures. In this paper, we target task-based programs and propose a novel compiler-based approach that consists of four complementary steps. First, we partition the original tasks in the target application into sub-tasks and build a data reuse graph at sub-task granularity. Second, based on the intensity of temporal and spatial data reuse among sub-tasks, we generate new tasks, each of which includes a set of sub-tasks that exhibit high data reuse among them. Third, we assign the newly generated tasks to cores in an architecture-aware fashion, using knowledge of data location. Finally, we re-schedule the execution order of sub-tasks within new tasks such that sub-tasks that belong to different tasks but share data are executed in close proximity in time. The experiments show that, when targeting a state-of-the-art manycore system, our compiler-based approach improves the performance of 10 multithreaded programs by 23.4% on average.
UR - http://www.scopus.com/inward/record.url?scp=85108531802&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108531802&partnerID=8YFLogxK
U2 - 10.1145/3410220.3460103
DO - 10.1145/3410220.3460103
M3 - Conference contribution
AN - SCOPUS:85108531802
T3 - SIGMETRICS 2021 - Abstract Proceedings of the 2021 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems
SP - 47
EP - 48
BT - SIGMETRICS 2021 - Abstract Proceedings of the 2021 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems
PB - Association for Computing Machinery, Inc
Y2 - 14 June 2021 through 18 June 2021
ER -