Abstract
One of the critical goals in code optimization for Multi-Processor-System-on-a-Chip (MPSoC) architectures is to minimize the number of off-chip memory accesses. This is because such accesses can be extremely costly from both performance and power angles. While conventional data locality optimization techniques can be used for improving data access pattern of each processor independently, such techniques usually do not consider locality for shared data. This paper proposes a strategy that reduces the number of off-chip references due to shared data. It achieves this goal by restructuring a parallelized application code in such a fashion that a given data block is accessed by parallel processors within the same time frame, so that its reuse is maximized while it is in the on-chip memory space. This tends to minimize the number of off-chip references since the accesses to a given data block are clustered within a short period of time during execution. Our approach employs a polyhedral tool that helps us isolate computations that manipulate a given data block. In order to test the effectiveness of our approach, we implemented it using a publicly-available compiler infrastructure and conducted experiments with twelve data-intensive embedded applications. Our results show that optimizing data locality for shared data elements is very useful in practice.
Original language | English (US) |
---|---|
Pages (from-to) | 1201-1214 |
Number of pages | 14 |
Journal | IEEE Transactions on Parallel and Distributed Systems |
Volume | 19 |
Issue number | 9 |
DOIs | |
State | Published - 2008 |
All Science Journal Classification (ASJC) codes
- Signal Processing
- Hardware and Architecture
- Computational Theory and Mathematics