TY - JOUR
T1 - A Matrix-Based Approach to Global Locality Optimization
AU - Kandemir, Mahmut
AU - Choudhary, Alok
AU - Ramanujam, J.
AU - Banerjee, Prith
N1 - Funding Information:
We thank the anonymous referees for their insightful comments and suggestions. The material presented in this paper is based on research supported in part by Alok Choudhary’s NSF Young Investigator Award CCR-9357840, NSF Grants CCR-9509143 and CCR-9796029, the Air Force Materiel Command under Contract F30602-97-C-0026, and the Department of Energy ASCI Academic Strategic Alliance Program (Level 2) under Subcontract W-7405-ENG-48 from Lawrence Livermore National Laboratory. The work of Prith Banerjee is supported in part by the NSF under Grant CCR-9526325 and in part by DARPA under Contract F30602-98-0144. The work of J. Ramanujam is supported in part by an NSF Young Investigator Award CCR-9457768.
PY - 1999/8
Y1 - 1999/8
N2 - Global locality optimization is a technique for improving the cache performance of a sequence of loop nests through a combination of loop and data layout transformations. Pure loop transformations are restricted by data dependencies and may not be very successful in optimizing imperfectly nested loops and explicitly parallelized programs. Although pure data transformations are not constrained by data dependencies, the impact of a data transformation on an array can be program-wide; that is, it can affect all references to that array in all loop nests. Therefore, in this paper we argue for an integrated approach that employs both loop and data transformations. The method combines the advantages of most previous locality-enhancing techniques and is efficient. In our approach, the loop nests in a program are processed one by one, and the data layout constraints obtained from one nest are propagated to the optimization of the remaining loop nests. We present a simple and effective matrix-based framework that implements this process. The search space we consider for loop transformations is that of general nonsingular linear transformation matrices, and the data layouts we consider are those that can be expressed using hyperplanes. Experiments with several floating-point programs on an 8-processor SGI Origin 2000 distributed-shared-memory machine demonstrate the efficacy of our approach.
AB - Global locality optimization is a technique for improving the cache performance of a sequence of loop nests through a combination of loop and data layout transformations. Pure loop transformations are restricted by data dependencies and may not be very successful in optimizing imperfectly nested loops and explicitly parallelized programs. Although pure data transformations are not constrained by data dependencies, the impact of a data transformation on an array can be program-wide; that is, it can affect all references to that array in all loop nests. Therefore, in this paper we argue for an integrated approach that employs both loop and data transformations. The method combines the advantages of most previous locality-enhancing techniques and is efficient. In our approach, the loop nests in a program are processed one by one, and the data layout constraints obtained from one nest are propagated to the optimization of the remaining loop nests. We present a simple and effective matrix-based framework that implements this process. The search space we consider for loop transformations is that of general nonsingular linear transformation matrices, and the data layouts we consider are those that can be expressed using hyperplanes. Experiments with several floating-point programs on an 8-processor SGI Origin 2000 distributed-shared-memory machine demonstrate the efficacy of our approach.
UR - http://www.scopus.com/inward/record.url?scp=0008323676&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0008323676&partnerID=8YFLogxK
U2 - 10.1006/jpdc.1999.1552
DO - 10.1006/jpdc.1999.1552
M3 - Article
AN - SCOPUS:0008323676
SN - 0743-7315
VL - 58
SP - 190
EP - 235
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
IS - 2
ER -