A unified framework for optimizing locality, parallelism, and communication in out-of-core computations

Mahmut Kandemir, Alok Choudhary, J. Ramanujam, Meenakshi A. Kandaswamy

Research output: Contribution to journal › Article › peer-review

17 Scopus citations

Abstract

This paper presents a unified framework that optimizes out-of-core programs by exploiting locality and parallelism, and reducing communication overhead. For out-of-core problems where the data set sizes far exceed the size of the available in-core memory, it is particularly important to exploit the memory hierarchy by optimizing the I/O accesses. We present algorithms that consider both iteration space (loop) and data space (file layout) transformations in a unified framework. We show that the performance of an out-of-core loop nest containing references to out-of-core arrays can be improved by using a suitable combination of file layout choices and loop restructuring transformations. Our approach considers array references one-by-one and attempts to optimize each reference for parallelism and locality. When there are references for which parallelism optimizations do not work, communication is vectorized so that data transfer can be performed before the innermost loop. Results from hand-compiles on IBM SP-2 and Intel Paragon distributed-memory message-passing architectures show that this approach reduces the execution times and improves the overall speedups. In addition, we extend the base algorithm to work with file layout constraints and show how it is useful for optimizing programs that consist of multiple loop nests.
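The interaction between loop order and file layout described in the abstract can be illustrated with a toy model. The sketch below is not the paper's algorithm. It assumes a single reference `A[i][j]` to an n x n array stored in one linear file, and it counts how often consecutive accesses are not adjacent in the file. The names `file_offset` and `count_jumps` and the layout labels are hypothetical, chosen only for this illustration.

```python
from itertools import product


def file_offset(i, j, n, layout):
    """Linear file offset of element (i, j) of an n x n array (assumed model)."""
    if layout == "row-major":
        return i * n + j
    if layout == "column-major":
        return j * n + i
    raise ValueError(f"unknown layout: {layout}")


def count_jumps(n, loop_order, layout):
    """Count non-sequential file accesses made by a 2-deep nest touching A[i][j].

    loop_order "ij" puts i in the outer loop; "ji" puts j in the outer loop
    (that is, the interchanged nest).
    """
    if loop_order not in ("ij", "ji"):
        raise ValueError(f"unknown loop order: {loop_order}")
    jumps = 0
    prev = None
    for outer, inner in product(range(n), repeat=2):
        i, j = (outer, inner) if loop_order == "ij" else (inner, outer)
        off = file_offset(i, j, n, layout)
        # An access is sequential only if it directly follows the previous one.
        if prev is not None and off != prev + 1:
            jumps += 1
        prev = off
    return jumps


if __name__ == "__main__":
    for order in ("ij", "ji"):
        for layout in ("row-major", "column-major"):
            print(order, layout, count_jumps(4, order, layout))
```

In this model, for n = 4, a nest whose inner loop walks the file's fastest-varying dimension makes 0 non-sequential accesses, while a mismatched loop order and layout makes 15. Either a loop interchange or a layout change removes the mismatch, which is why it can pay to consider both kinds of transformation together.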

Original language: English (US)
Pages (from-to): 648-668
Number of pages: 21
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 11
Issue number: 7
State: Published - 2000

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics
