In this paper, we present a compiler strategy for optimizing data accesses in regular array-intensive applications running on embedded multiprocessor environments. Specifically, we propose an optimization algorithm that reduces the extra off-chip memory accesses caused by inter-processor communication. It does so by increasing the application-wide reuse of data that resides in the processors' scratch-pad memories. Experimental results on four array-intensive image processing applications indicate that exploiting inter-processor data sharing can reduce the energy-delay product by up to 33.8% (24.3% on average) on a four-processor embedded system. The results also show that the proposed strategy is robust, in the sense that it gives consistently good results over a wide range of architectural parameters.
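The energy-delay product (EDP) reported above is conventionally the product of the energy a run consumes and the time it takes, so a reduction reflects a combined gain in both. As a minimal sketch of how such a percentage reduction is computed, the Python below uses hypothetical function names and made-up energy and delay values; none of these numbers come from the paper's experiments.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Return the energy-delay product (energy multiplied by execution time)."""
    return energy_joules * delay_seconds


def edp_reduction_percent(baseline_edp: float, optimized_edp: float) -> float:
    """Return the EDP reduction relative to the baseline, in percent."""
    if baseline_edp <= 0:
        raise ValueError("baseline EDP must be positive")
    return 100.0 * (baseline_edp - optimized_edp) / baseline_edp


# Hypothetical measurements (assumptions for illustration only).
baseline = energy_delay_product(energy_joules=2.0, delay_seconds=0.5)
optimized = energy_delay_product(energy_joules=1.6, delay_seconds=0.4)

# With these made-up values, energy and delay each drop by 20%,
# so the EDP drops by roughly 36%.
reduction = edp_reduction_percent(baseline, optimized)
print(f"EDP reduction: {reduction:.1f}%")
```

Because EDP multiplies the two quantities, fewer off-chip accesses can lower it twice over: once through reduced memory energy and once through shorter execution time.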
All Science Journal Classification (ASJC) codes
- Hardware and Architecture
- Control and Systems Engineering