TY - JOUR
T1 - Compiler-Directed Scratch Pad Memory Optimization for Embedded Multiprocessors
AU - Kandemir, Mahmut
AU - Kadayif, Ismail
AU - Choudhary, Alok
AU - Ramanujam, J.
AU - Kolcu, Ibrahim
N1 - Funding Information: Manuscript received March 7, 2003; revised June 25, 2003. This work was supported in part by National Science Foundation CAREER Award 0093082. M. Kandemir and I. Kadayif are with The Pennsylvania State University, University Park, PA 16802 USA (e-mail: [email protected]). A. Choudhary is with Northwestern University, Evanston, IL 60208-2300 USA. J. Ramanujam is with Louisiana State University, Baton Rouge, LA 70803-5901 USA. I. Kolcu is with the University of Manchester Institute of Science and Technology, Manchester M60 1QD, U.K. Digital Object Identifier 10.1109/TVLSI.2004.824299
PY - 2004/3
Y1 - 2004/3
N2 - This paper presents a compiler strategy to optimize data accesses in regular array-intensive applications running on embedded multiprocessor environments. Specifically, we propose an optimization algorithm that aims to reduce the extra off-chip memory accesses caused by interprocessor communication. This is achieved by increasing the application-wide reuse of data that resides in the scratch-pad memories of the processors. Our results, obtained using four array-intensive image processing applications, indicate that exploiting interprocessor data sharing can significantly reduce the energy-delay product on a four-processor embedded system.
AB - This paper presents a compiler strategy to optimize data accesses in regular array-intensive applications running on embedded multiprocessor environments. Specifically, we propose an optimization algorithm that aims to reduce the extra off-chip memory accesses caused by interprocessor communication. This is achieved by increasing the application-wide reuse of data that resides in the scratch-pad memories of the processors. Our results, obtained using four array-intensive image processing applications, indicate that exploiting interprocessor data sharing can significantly reduce the energy-delay product on a four-processor embedded system.
UR - http://www.scopus.com/inward/record.url?scp=2142707258&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=2142707258&partnerID=8YFLogxK
U2 - 10.1109/TVLSI.2004.824299
DO - 10.1109/TVLSI.2004.824299
M3 - Article
AN - SCOPUS:2142707258
SN - 1063-8210
VL - 12
SP - 281
EP - 287
JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IS - 3
ER -