TY - GEN
T1 - Performance and energy evaluation of data prefetching on Intel Xeon Phi
AU - Guttman, Diana
AU - Kandemir, Mahmut Taylan
AU - Arunachalam, Meenakshi
AU - Calina, Vlad
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/4/27
Y1 - 2015/4/27
N2 - There is an urgent need to evaluate the existing parallelism and data locality-oriented techniques on emerging manycore machines using multithreaded applications. Data prefetching is a well-known latency-hiding technique that comes with various hardware- and software-based implementations in almost all commercial machines. A well-tuned prefetcher can reduce the observed data access latencies significantly by bringing the soon-to-be-requested data into the cache ahead of time, eventually improving application execution time. Motivated by this, we present in this paper a detailed performance and power characterization of software (compiler-guided) and hardware data prefetching on an Intel Xeon Phi-based system. Our main contributions are (i) an analysis of the interactions between hardware and software prefetching, showing how hardware prefetching can throttle itself in response to software; (ii) results on the power and energy behavior of prefetching, showing how performance and energy gains outweigh the increased power cost of prefetching; and (iii) an evaluation of the use of intrinsic prefetch instructions to prefetch for applications with difficult-to-detect access patterns.
AB - There is an urgent need to evaluate the existing parallelism and data locality-oriented techniques on emerging manycore machines using multithreaded applications. Data prefetching is a well-known latency-hiding technique that comes with various hardware- and software-based implementations in almost all commercial machines. A well-tuned prefetcher can reduce the observed data access latencies significantly by bringing the soon-to-be-requested data into the cache ahead of time, eventually improving application execution time. Motivated by this, we present in this paper a detailed performance and power characterization of software (compiler-guided) and hardware data prefetching on an Intel Xeon Phi-based system. Our main contributions are (i) an analysis of the interactions between hardware and software prefetching, showing how hardware prefetching can throttle itself in response to software; (ii) results on the power and energy behavior of prefetching, showing how performance and energy gains outweigh the increased power cost of prefetching; and (iii) an evaluation of the use of intrinsic prefetch instructions to prefetch for applications with difficult-to-detect access patterns.
UR - http://www.scopus.com/inward/record.url?scp=84937485000&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84937485000&partnerID=8YFLogxK
U2 - 10.1109/ISPASS.2015.7095814
DO - 10.1109/ISPASS.2015.7095814
M3 - Conference contribution
AN - SCOPUS:84937485000
T3 - ISPASS 2015 - IEEE International Symposium on Performance Analysis of Systems and Software
SP - 288
EP - 297
BT - ISPASS 2015 - IEEE International Symposium on Performance Analysis of Systems and Software
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2015 15th IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2015
Y2 - 29 March 2015 through 31 March 2015
ER -