TY - GEN
T1 - Reshaping cache misses to improve row-buffer locality in multicore systems
AU - Ding, Wei
AU - Liu, Jun
AU - Kandemir, Mahmut
AU - Irwin, Mary Jane
PY - 2013
Y1 - 2013
AB - Optimizing cache locality has always been important since the emergence of caches, and numerous cache locality optimization schemes have been published in the compiler literature. However, in modern architectures, cache locality is not the only factor that determines memory system performance. Many emerging multicores employ banked memory systems, and each bank is attached to a row-buffer that holds the most-recently accessed memory row (page). A last-level cache miss that also misses in the row-buffer can experience much higher latency than a cache miss that hits in the row-buffer. Consequently, optimizing for row-buffer locality can be as important as optimizing for cache locality. Targeting emerging multicores and multithreaded applications, this paper presents a compiler-directed row-buffer locality optimization strategy. This strategy modifies the memory layout of data to increase the number of row-buffer hits without increasing the number of misses in the on-chip cache hierarchy. We implemented our proposed optimization strategy in an open-source compiler and tested its effectiveness in improving row-buffer performance using a set of multithreaded applications. Our results indicate that the proposed approach improves average data access latency by about 29%, which translates, on average, into about a 15% improvement in execution time.
UR - http://www.scopus.com/inward/record.url?scp=84887455704&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84887455704&partnerID=8YFLogxK
U2 - 10.1109/PACT.2013.6618820
DO - 10.1109/PACT.2013.6618820
M3 - Conference contribution
AN - SCOPUS:84887455704
SN - 9781479910212
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 235
EP - 244
BT - PACT 2013 - Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques
T2 - 22nd International Conference on Parallel Architectures and Compilation Techniques, PACT 2013
Y2 - 7 September 2013 through 11 September 2013
ER -