TY - GEN
T1 - Reducing off-chip memory access costs using data recomputation in embedded chip multi-processors
AU - Koc, Hakduran
AU - Kandemir, Mahmut
AU - Ercanli, Ehat
AU - Ozturk, Ozcan
PY - 2007/8/2
Y1 - 2007/8/2
N2 - There have been numerous efforts on Scratch-Pad Memory (SPM) management in the context of single CPU systems and, more recently, multi-processor architectures. This paper presents a novel SPM space utilization strategy, for embedded chip multi-processor systems, based on recomputing the value of an off-chip data element using on-chip (SPM resident) data elements. In doing so, our goal is to eliminate the corresponding off-chip memory access that would otherwise be performed, and save execution cycles and power. This paper presents the details of a compiler algorithm that implements this approach and reports the experimental data we collected using six data-intensive applications. Our results indicate that, on a four processor chip multiprocessor, the average performance improvement our approach brings is about 11.8%, over a state-of-the-art SPM management scheme. We also observed that there is a specific range of total SPM size/total data size ratios, for which our approach generates the best results. Finally, our results also show that the proposed approach brings consistent improvements when the number of CPUs is varied between 2 and 16.
AB - There have been numerous efforts on Scratch-Pad Memory (SPM) management in the context of single CPU systems and, more recently, multi-processor architectures. This paper presents a novel SPM space utilization strategy, for embedded chip multi-processor systems, based on recomputing the value of an off-chip data element using on-chip (SPM resident) data elements. In doing so, our goal is to eliminate the corresponding off-chip memory access that would otherwise be performed, and save execution cycles and power. This paper presents the details of a compiler algorithm that implements this approach and reports the experimental data we collected using six data-intensive applications. Our results indicate that, on a four processor chip multiprocessor, the average performance improvement our approach brings is about 11.8%, over a state-of-the-art SPM management scheme. We also observed that there is a specific range of total SPM size/total data size ratios, for which our approach generates the best results. Finally, our results also show that the proposed approach brings consistent improvements when the number of CPUs is varied between 2 and 16.
UR - http://www.scopus.com/inward/record.url?scp=34547317963&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547317963&partnerID=8YFLogxK
U2 - 10.1109/DAC.2007.375157
DO - 10.1109/DAC.2007.375157
M3 - Conference contribution
AN - SCOPUS:34547317963
SN - 1595936270
SN - 9781595936271
T3 - Proceedings - Design Automation Conference
SP - 224
EP - 229
BT - 2007 44th ACM/IEEE Design Automation Conference, DAC'07
T2 - 2007 44th ACM/IEEE Design Automation Conference, DAC'07
Y2 - 4 June 2007 through 8 June 2007
ER -