Scalable and Consistent Graph Neural Networks for Distributed Mesh-based Data-driven Modeling

Shivam Barwey, Riccardo Balin, Bethany Lusch, Saumil Patel, Ramesh Balakrishnan, Pinaki Pal, Romit Maulik, Venkatram Vishwanath

Research output: Chapter in Book/Report/Conference proceedingConference contribution

3 Scopus citations

Abstract

This work develops a distributed graph neural network (GNN) methodology for mesh-based modeling applications using a consistent neural message passing layer. As the name implies, the focus is on enabling scalable operations that satisfy physical consistency via halo nodes at sub-graph boundaries. Here, consistency refers to the fact that a GNN trained and evaluated on one rank (one large graph) is arithmetically equivalent to evaluations on multiple ranks (a partitioned graph). This concept is demonstrated by interfacing GNNs with NekRS, a GPU-capable exascale CFD solver developed at Argonne National Laboratory. It is shown how the NekRS mesh partitioning can be linked to the distributed GNN training and inference routines, resulting in a scalable mesh-based data-driven modeling workflow. We study the impact of consistency on the scalability of mesh-based GNNs, demonstrating efficient scaling in consistent GNNs for up to O(1B) graph nodes on the Frontier exascale supercomputer.

Original languageEnglish (US)
Title of host publicationProceedings of SC 2024-W
Subtitle of host publicationWorkshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1058-1070
Number of pages13
ISBN (Electronic)9798350355543
DOIs
StatePublished - 2024
Event2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024 - Atlanta, United States
Duration: Nov 17 2024Nov 22 2024

Publication series

NameProceedings of SC 2024-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024
Country/TerritoryUnited States
CityAtlanta
Period11/17/2411/22/24

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Software
  • Modeling and Simulation
  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'Scalable and Consistent Graph Neural Networks for Distributed Mesh-based Data-driven Modeling'. Together they form a unique fingerprint.

Cite this