TY - GEN
T1 - Bandwidth constrained coordinated HW/SW prefetching for multicores
AU - Muralidhara, Sai Prashanth
AU - Kandemir, Mahmut
AU - Zhang, Yuanrui
PY - 2011
Y1 - 2011
N2 - Prefetching is a highly effective latency-hiding technique that can greatly improve application performance. However, aggressive prefetching can stress the off-chip bandwidth, and the resulting bandwidth stalls can negate the performance gain due to prefetching. In this paper, focusing on a multicore environment, we first study the comparative benefits of hardware and software prefetching and analyze whether the two are complementary or redundant. This analysis also evaluates different aggressiveness levels of hardware prefetching. Secondly, we weigh the performance benefits of prefetching against the negative performance effects of bandwidth stalls. Thirdly, we propose a hierarchical prefetch management scheme for multicores that controls the prefetch levels such that the overall performance gain is improved. Lastly, we show that our proposed off-chip bandwidth-aware prefetch management scheme is very effective in practice, leading to performance gains of up to about 10% in system throughput over a bandwidth-agnostic prefetching scheme.
AB - Prefetching is a highly effective latency-hiding technique that can greatly improve application performance. However, aggressive prefetching can stress the off-chip bandwidth, and the resulting bandwidth stalls can negate the performance gain due to prefetching. In this paper, focusing on a multicore environment, we first study the comparative benefits of hardware and software prefetching and analyze whether the two are complementary or redundant. This analysis also evaluates different aggressiveness levels of hardware prefetching. Secondly, we weigh the performance benefits of prefetching against the negative performance effects of bandwidth stalls. Thirdly, we propose a hierarchical prefetch management scheme for multicores that controls the prefetch levels such that the overall performance gain is improved. Lastly, we show that our proposed off-chip bandwidth-aware prefetch management scheme is very effective in practice, leading to performance gains of up to about 10% in system throughput over a bandwidth-agnostic prefetching scheme.
UR - http://www.scopus.com/inward/record.url?scp=80052348873&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052348873&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-23400-2_29
DO - 10.1007/978-3-642-23400-2_29
M3 - Conference contribution
AN - SCOPUS:80052348873
SN - 9783642233999
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 310
EP - 325
BT - Euro-Par 2011 Parallel Processing - 17th International Conference, Proceedings
T2 - 17th International Conference on Parallel Processing, Euro-Par 2011
Y2 - 29 August 2011 through 2 September 2011
ER -