TY - JOUR
T1 - Improved cache utilization and preconditioner efficiency through use of a space-filling curve mesh element- and vertex-reordering technique
AU - Sastry, Shankar P.
AU - Kultursay, Emre
AU - Shontz, Suzanne M.
AU - Kandemir, Mahmut T.
N1 - Publisher Copyright:
© 2014, Springer-Verlag London.
PY - 2014/10
Y1 - 2014/10
N2 - Solving partial differential equations using finite element (FE) methods for unstructured meshes that contain billions of elements is computationally a very challenging task. While parallel implementations can deliver a solution in a reasonable amount of time, they suffer from low cache utilization due to unstructured data access patterns. In this work, we reorder the way the mesh vertices and elements are stored in memory using Hilbert space-filling curves to improve cache utilization in FE methods for unstructured meshes. This reordering technique enumerates the mesh elements such that parallel threads access shared vertices at different time intervals, reducing the time wasted waiting to acquire locks guarding atomic regions. Further, when the linear system resulting from the FE analysis is solved using the preconditioned conjugate gradient method, the performance of the block-Jacobi preconditioner also improves, as more nonzeros are present near the stiffness matrix diagonal. Our results show that our reordering reduces the L1 and L2 cache miss-rates in the stiffness matrix assembly step by about 50 and 10 %, respectively, on a single-core processor. We also reduce the number of iterations required to solve the linear system by about 5 %. Overall, our reordering reduces the time to assemble the stiffness matrix and to solve the linear system on a 4-socket, 48-core multi-processor by about 20 %.
AB - Solving partial differential equations using finite element (FE) methods for unstructured meshes that contain billions of elements is computationally a very challenging task. While parallel implementations can deliver a solution in a reasonable amount of time, they suffer from low cache utilization due to unstructured data access patterns. In this work, we reorder the way the mesh vertices and elements are stored in memory using Hilbert space-filling curves to improve cache utilization in FE methods for unstructured meshes. This reordering technique enumerates the mesh elements such that parallel threads access shared vertices at different time intervals, reducing the time wasted waiting to acquire locks guarding atomic regions. Further, when the linear system resulting from the FE analysis is solved using the preconditioned conjugate gradient method, the performance of the block-Jacobi preconditioner also improves, as more nonzeros are present near the stiffness matrix diagonal. Our results show that our reordering reduces the L1 and L2 cache miss-rates in the stiffness matrix assembly step by about 50 and 10 %, respectively, on a single-core processor. We also reduce the number of iterations required to solve the linear system by about 5 %. Overall, our reordering reduces the time to assemble the stiffness matrix and to solve the linear system on a 4-socket, 48-core multi-processor by about 20 %.
UR - http://www.scopus.com/inward/record.url?scp=84920256691&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84920256691&partnerID=8YFLogxK
U2 - 10.1007/s00366-014-0363-0
DO - 10.1007/s00366-014-0363-0
M3 - Article
AN - SCOPUS:84920256691
SN - 0177-0667
VL - 30
SP - 535
EP - 547
JO - Engineering with Computers
JF - Engineering with Computers
IS - 4
ER -