TY - GEN
T1 - Slicing based code parallelization for minimizing inter-processor communication
AU - Kandemir, Mahmut
AU - Zhang, Yuanrui
AU - Muralidhara, Sai Prasanth
AU - Ozturk, Ozcan
AU - Narayanan, Sri Hari Krishna
N1 - Copyright:
Copyright 2012 Elsevier B.V., All rights reserved.
PY - 2009
Y1 - 2009
N2 - One of the critical problems in distributed memory multi-core architectures is scalable parallelization that minimizes inter-processor communication. Using the concept of iteration space slicing, this paper presents a new code parallelization scheme for data-intensive applications. This scheme targets distributed memory multi-core architectures, and formulates the problem of data-computation distribution (partitioning) across parallel processors using slicing such that, starting with the partitioning of the output arrays, it iteratively determines the partitions of other arrays as well as iteration spaces of the loop nests in the application code. The goal is to minimize inter-processor data communications. Based on this iteration space slicing based formulation of the problem, we also propose a solution scheme. The proposed data-computation scheme is evaluated using six data-intensive benchmark programs. In our experimental evaluation, we also compare this scheme against three alternate data-computation distribution schemes. The results obtained are very encouraging, indicating around 10% better speedup, with 16 processors, over the next-best scheme when averaged over all benchmark codes we tested.
AB - One of the critical problems in distributed memory multi-core architectures is scalable parallelization that minimizes inter-processor communication. Using the concept of iteration space slicing, this paper presents a new code parallelization scheme for data-intensive applications. This scheme targets distributed memory multi-core architectures, and formulates the problem of data-computation distribution (partitioning) across parallel processors using slicing such that, starting with the partitioning of the output arrays, it iteratively determines the partitions of other arrays as well as iteration spaces of the loop nests in the application code. The goal is to minimize inter-processor data communications. Based on this iteration space slicing based formulation of the problem, we also propose a solution scheme. The proposed data-computation scheme is evaluated using six data-intensive benchmark programs. In our experimental evaluation, we also compare this scheme against three alternate data-computation distribution schemes. The results obtained are very encouraging, indicating around 10% better speedup, with 16 processors, over the next-best scheme when averaged over all benchmark codes we tested.
UR - http://www.scopus.com/inward/record.url?scp=72049109210&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=72049109210&partnerID=8YFLogxK
U2 - 10.1145/1629395.1629409
DO - 10.1145/1629395.1629409
M3 - Conference contribution
AN - SCOPUS:72049109210
SN - 9781605586267
T3 - Embedded Systems Week 2009 - 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES'09
SP - 87
EP - 95
BT - Embedded Systems Week 2009 - 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES'09
T2 - Embedded Systems Week 2009, ESWEEK 2009 - 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES'09
Y2 - 11 October 2009 through 16 October 2009
ER -