TY - GEN
T1 - Performance tuning of the Helmholtz matrix-vector product kernel in the computational fluid dynamics solver Nek5000/RS for the A64FX processor
AU - Tsuji, Miwako
AU - Min, Misun
AU - Kerkemeier, Stefan
AU - Fischer, Paul
AU - Merzari, Elia
AU - Sato, Mitsuhisa
N1 - Funding Information:
This research used computational resources of the supercomputer Fugaku provided by the RIKEN Center for Computational Science. This research is partially supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357 and by the Exascale Computing Project (17-SC-20-SC).
Publisher Copyright:
© 2022 ACM.
PY - 2022/1/11
Y1 - 2022/1/11
N2 - Nek5000/RS is an open-source computational fluid dynamics solver based on the spectral element method. One of the important kernels of Nek5000/RS is called "axhelm", which computes the Helmholtz matrix-vector product. In this paper, we have evaluated the axhelm kernel on the A64FX processor for the simplest case of polynomial degree N = 7. We have optimized the kernel for the A64FX processor by using well-known optimization techniques such as SIMDization, software pipelining, continuous access enhancing, and software prefetch. We also provide performance analysis data to investigate the effects of the optimization techniques and to help understand the A64FX processor and the Fujitsu compiler.
AB - Nek5000/RS is an open-source computational fluid dynamics solver based on the spectral element method. One of the important kernels of Nek5000/RS is called "axhelm", which computes the Helmholtz matrix-vector product. In this paper, we have evaluated the axhelm kernel on the A64FX processor for the simplest case of polynomial degree N = 7. We have optimized the kernel for the A64FX processor by using well-known optimization techniques such as SIMDization, software pipelining, continuous access enhancing, and software prefetch. We also provide performance analysis data to investigate the effects of the optimization techniques and to help understand the A64FX processor and the Fujitsu compiler.
UR - http://www.scopus.com/inward/record.url?scp=85124049991&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85124049991&partnerID=8YFLogxK
U2 - 10.1145/3503470.3503476
DO - 10.1145/3503470.3503476
M3 - Conference contribution
AN - SCOPUS:85124049991
T3 - ACM International Conference Proceeding Series
SP - 49
EP - 59
BT - Proceedings of International Conference on High Performance Computing in Asia-Pacific Region Workshops, HPCAsia 2022
PB - Association for Computing Machinery
T2 - 2022 International Conference on High Performance Computing in Asia-Pacific Region Workshops, HPCAsia 2022
Y2 - 11 January 2022 through 14 January 2022
ER -