TY - JOUR
T1 - Mix and Match
T2 - Reorganizing Tasks for Enhancing Data Locality
AU - Tang, Xulong
AU - Kandemir, Mahmut Taylan
AU - Karakoy, Mustafa
N1 - Funding Information:
The authors thank Dr. Evgenia Smirni for shepherding the paper. This work is supported in part by NSF grants #1908793, #1629915, #1629129, #1763681, #2028929, #2008398, #2011146, and #1931531, as well as startup funding from the University of Pittsburgh.
Publisher Copyright:
© 2021 Owner/Author.
PY - 2021/6
Y1 - 2021/6
N2 - Application programs that exhibit strong locality of reference lead to minimized cache misses and better performance in different architectures. In this paper, we target task-based programs, and propose a novel compiler-based approach that consists of four complementary steps. First, we partition the original tasks in the target application into sub-tasks and build a data reuse graph at a sub-task granularity. Second, based on the intensity of temporal and spatial data reuses among sub-tasks, we generate new tasks where each such (new) task includes a set of sub-tasks that exhibit high data reuse among them. Third, we assign the newly-generated tasks to cores in an architecture-aware fashion with the knowledge of data location. Finally, we re-schedule the execution order of sub-tasks within new tasks such that sub-tasks that belong to different tasks but share data among them are executed in close proximity in time. The experiments show that, when targeting a state-of-the-art manycore system, our compiler-based approach improves the performance of 10 multithreaded programs by 23.4% on average.
AB - Application programs that exhibit strong locality of reference lead to minimized cache misses and better performance in different architectures. In this paper, we target task-based programs, and propose a novel compiler-based approach that consists of four complementary steps. First, we partition the original tasks in the target application into sub-tasks and build a data reuse graph at a sub-task granularity. Second, based on the intensity of temporal and spatial data reuses among sub-tasks, we generate new tasks where each such (new) task includes a set of sub-tasks that exhibit high data reuse among them. Third, we assign the newly-generated tasks to cores in an architecture-aware fashion with the knowledge of data location. Finally, we re-schedule the execution order of sub-tasks within new tasks such that sub-tasks that belong to different tasks but share data among them are executed in close proximity in time. The experiments show that, when targeting a state-of-the-art manycore system, our compiler-based approach improves the performance of 10 multithreaded programs by 23.4% on average.
UR - http://www.scopus.com/inward/record.url?scp=85131743603&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131743603&partnerID=8YFLogxK
U2 - 10.1145/3410220.3460103
DO - 10.1145/3410220.3460103
M3 - Article
AN - SCOPUS:85131743603
SN - 0163-5999
VL - 49
SP - 47
EP - 48
JO - Performance Evaluation Review
JF - Performance Evaluation Review
IS - 1
ER -