TY - GEN
T1 - Scalable and Consistent Graph Neural Networks for Distributed Mesh-based Data-driven Modeling
AU - Barwey, Shivam
AU - Balin, Riccardo
AU - Lusch, Bethany
AU - Patel, Saumil
AU - Balakrishnan, Ramesh
AU - Pal, Pinaki
AU - Maulik, Romit
AU - Vishwanath, Venkatram
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - This work develops a distributed graph neural network (GNN) methodology for mesh-based modeling applications using a consistent neural message passing layer. As the name implies, the focus is on enabling scalable operations that satisfy physical consistency via halo nodes at sub-graph boundaries. Here, consistency refers to the fact that a GNN trained and evaluated on one rank (one large graph) produces results arithmetically equivalent to those obtained from evaluations on multiple ranks (a partitioned graph). This concept is demonstrated by interfacing GNNs with NekRS, a GPU-capable exascale CFD solver developed at Argonne National Laboratory. It is shown how the NekRS mesh partitioning can be linked to the distributed GNN training and inference routines, resulting in a scalable mesh-based data-driven modeling workflow. We study the impact of consistency on the scalability of mesh-based GNNs, demonstrating efficient scaling of consistent GNNs for up to O(1B) graph nodes on the Frontier exascale supercomputer.
AB - This work develops a distributed graph neural network (GNN) methodology for mesh-based modeling applications using a consistent neural message passing layer. As the name implies, the focus is on enabling scalable operations that satisfy physical consistency via halo nodes at sub-graph boundaries. Here, consistency refers to the fact that a GNN trained and evaluated on one rank (one large graph) produces results arithmetically equivalent to those obtained from evaluations on multiple ranks (a partitioned graph). This concept is demonstrated by interfacing GNNs with NekRS, a GPU-capable exascale CFD solver developed at Argonne National Laboratory. It is shown how the NekRS mesh partitioning can be linked to the distributed GNN training and inference routines, resulting in a scalable mesh-based data-driven modeling workflow. We study the impact of consistency on the scalability of mesh-based GNNs, demonstrating efficient scaling of consistent GNNs for up to O(1B) graph nodes on the Frontier exascale supercomputer.
UR - https://www.scopus.com/pages/publications/85217173690
UR - https://www.scopus.com/inward/citedby.url?scp=85217173690&partnerID=8YFLogxK
U2 - 10.1109/SCW63240.2024.00146
DO - 10.1109/SCW63240.2024.00146
M3 - Conference contribution
AN - SCOPUS:85217173690
T3 - Proceedings of SC 2024-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
SP - 1058
EP - 1070
BT - Proceedings of SC 2024-W
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024
Y2 - 17 November 2024 through 22 November 2024
ER -