TY - GEN
T1 - Compiler support for near data computing
AU - Kandemir, Mahmut Taylan
AU - Ryoo, Jihyun
AU - Tang, Xulong
AU - Karakoy, Mustafa
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/2/17
Y1 - 2021/2/17
N2 - Recent works from both hardware and software domains offer various optimizations that try to take advantage of near data computing (NDC) opportunities. While the results from these works indicate performance improvements of various magnitudes, the existing literature lacks a detailed quantification of the potential of NDC and analysis of compiler optimizations on tapping into that potential. This paper first presents an analysis of the NDC potential when executing multithreaded applications on manycore platforms. It then presents two compiler schemes designed to take advantage of NDC. The first of these schemes tries to increase the amount of computation that can be performed in a hardware component, whereas the second compiler strategy strikes a balance between optimizing NDC and exploiting data reuse, by being more selective on when to perform NDC (even if the opportunity presents itself) and how. The collected experimental results on a 5×5 manycore system reveal that our first and second compiler schemes improve the overall performance of our multithreaded applications by, respectively, 22.5% and 25.2%, on average. Furthermore, these two compiler schemes are only 6.8% and 4.1% worse than an oracle scheme that makes the best near data computing decisions for each and every computation.
AB - Recent works from both hardware and software domains offer various optimizations that try to take advantage of near data computing (NDC) opportunities. While the results from these works indicate performance improvements of various magnitudes, the existing literature lacks a detailed quantification of the potential of NDC and analysis of compiler optimizations on tapping into that potential. This paper first presents an analysis of the NDC potential when executing multithreaded applications on manycore platforms. It then presents two compiler schemes designed to take advantage of NDC. The first of these schemes tries to increase the amount of computation that can be performed in a hardware component, whereas the second compiler strategy strikes a balance between optimizing NDC and exploiting data reuse, by being more selective on when to perform NDC (even if the opportunity presents itself) and how. The collected experimental results on a 5×5 manycore system reveal that our first and second compiler schemes improve the overall performance of our multithreaded applications by, respectively, 22.5% and 25.2%, on average. Furthermore, these two compiler schemes are only 6.8% and 4.1% worse than an oracle scheme that makes the best near data computing decisions for each and every computation.
UR - http://www.scopus.com/inward/record.url?scp=85101686871&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85101686871&partnerID=8YFLogxK
U2 - 10.1145/3437801.3441600
DO - 10.1145/3437801.3441600
M3 - Conference contribution
AN - SCOPUS:85101686871
T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
SP - 90
EP - 104
BT - PPoPP 2021 - Proceedings of the 2021 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
PB - Association for Computing Machinery
T2 - 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2021
Y2 - 27 February 2021 through 3 March 2021
ER -