Improving whole-program locality using intra-procedural and inter-procedural transformations

Research output: Contribution to journalArticlepeer-review

4 Scopus citations


Exploiting spatial and temporal locality is essential for obtaining high performance on modern computers. Writing programs that exhibit high locality of reference is difficult and error-prone. Compiler researchers have developed loop transformations that allow the conversion of programs to exploit locality. Recently, transformations that change the memory layouts of multi-dimensional arrays - called data transformations - have been proposed. Unfortunately, both data and loop transformations have some important drawbacks. In this work, we present an integrated framework that uses loop and data transformations in concert to exploit the benefits of both approaches while minimizing the impact of their disadvantages. Our approach works inter-procedurally on acyclic call graphs, uses profile data to eliminate layout conflicts, and is unique in its capability of resolving conflicting layout requirements of different references to the same array in the same nest and in different nests for regular array-based applications. The optimization technique presented in this paper has been implemented in a source-to-source translator. We evaluate its performance using standard benchmark suites and several math libraries (complete programs) with large input sizes. Our experimental results show that the proposed approach improves the performance of the applications optimized by using the current state-of-the-art techniques by 8.2% on the average. This reduction comes from three important characteristics of the technique, namely, resolving layout conflicts between references to the same array in a loop nest, determining a suitable order to propagate layout modifications across loop nests, and propagating layouts between different procedures in the program - all in a unified framework. The locality optimization technique presented in this paper tries to exploit locality in the innermost loop positions. This strategy, in most cases, generates dependence-free outer loops, which can be safely parallelized.

Original languageEnglish (US)
Pages (from-to)564-582
Number of pages19
JournalJournal of Parallel and Distributed Computing
Issue number5
StatePublished - May 2005

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence


Dive into the research topics of 'Improving whole-program locality using intra-procedural and inter-procedural transformations'. Together they form a unique fingerprint.

Cite this