TY - GEN
T1 - Meeting midway
T2 - 22nd International Conference on Parallel Architectures and Compilation Techniques, PACT 2013
AU - Yedlapalli, Praveen
AU - Kotra, Jagadish
AU - Kultursay, Emre
AU - Kandemir, Mahmut
AU - Das, Chita R.
AU - Sivasubramaniam, Anand
PY - 2013
Y1 - 2013
N2 - Both on-chip resource contention and off-chip latencies have a significant impact on memory requests in large-scale chip multiprocessors. We propose a memory-side prefetcher that brings data on-chip from DRAM but does not proactively push this data further to the cores/caches. Sitting close to memory, it exploits its detailed knowledge of DRAM and memory-channel state, leveraging row buffer locality to bring data from the currently open row buffer on-chip ahead of need. This not only reduces the number of off-chip accesses for demand requests, but also reduces row buffer conflicts, effectively improving DRAM access times. At the same time, our prefetcher keeps this data in a small buffer at each memory controller instead of pushing it into the caches, thereby avoiding on-chip resource contention. We show that the proposed memory-side prefetcher outperforms a state-of-the-art core-side prefetcher and an existing memory-side prefetcher. More importantly, our prefetcher can also work in tandem with the core-side prefetcher to amplify the benefits. Using a wide range of multiprogrammed and multi-threaded workloads, we show that this memory-side prefetcher provides average IPC improvements of 6.2% (maximum of 33.6%) when running alone and 10% (maximum of 49.6%) when combined with a core-side prefetcher. By meeting requests midway, our solution reduces off-chip latencies while avoiding the on-chip resource contention caused by inaccurate and ill-timed prefetches.
AB - Both on-chip resource contention and off-chip latencies have a significant impact on memory requests in large-scale chip multiprocessors. We propose a memory-side prefetcher that brings data on-chip from DRAM but does not proactively push this data further to the cores/caches. Sitting close to memory, it exploits its detailed knowledge of DRAM and memory-channel state, leveraging row buffer locality to bring data from the currently open row buffer on-chip ahead of need. This not only reduces the number of off-chip accesses for demand requests, but also reduces row buffer conflicts, effectively improving DRAM access times. At the same time, our prefetcher keeps this data in a small buffer at each memory controller instead of pushing it into the caches, thereby avoiding on-chip resource contention. We show that the proposed memory-side prefetcher outperforms a state-of-the-art core-side prefetcher and an existing memory-side prefetcher. More importantly, our prefetcher can also work in tandem with the core-side prefetcher to amplify the benefits. Using a wide range of multiprogrammed and multi-threaded workloads, we show that this memory-side prefetcher provides average IPC improvements of 6.2% (maximum of 33.6%) when running alone and 10% (maximum of 49.6%) when combined with a core-side prefetcher. By meeting requests midway, our solution reduces off-chip latencies while avoiding the on-chip resource contention caused by inaccurate and ill-timed prefetches.
UR - http://www.scopus.com/inward/record.url?scp=84887501415&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84887501415&partnerID=8YFLogxK
U2 - 10.1109/PACT.2013.6618825
DO - 10.1109/PACT.2013.6618825
M3 - Conference contribution
AN - SCOPUS:84887501415
SN - 9781479910212
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 289
EP - 298
BT - PACT 2013 - Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques
Y2 - 7 September 2013 through 11 September 2013
ER -