TY - GEN
T1 - Efficient Joint Communication and Computation Placement for Large-scale SNN Simulation on Supercomputers
AU - Bao, Yubing
AU - Lu, Zhihui
AU - Du, Xin
AU - Duan, Qiang
AU - Yang, Jirui
AU - Zhao, Jin
AU - Min, Geyong
AU - Chen, Yang
AU - Hu, Shijing
AU - Wang, Xin
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Spiking Neural Network (SNN) simulation involves emulating the activation and firing of spiking neurons on hardware platforms. This is a highly time-sensitive task, requiring the simulation of billions of neurons and their intercommunication within a few milliseconds. Each neuron performs a complex, interdependent multi-stage communication and computation task. We consider the task placement of SNN on supercomputers to accelerate SNN simulation. Existing task placement methods for SNN simulations have two major limitations. First, they lack the capability to handle large-scale SNNs with billions of neurons. Second, they focus primarily on optimizing communication delay, while neglecting multi-stage computation delays in SNN simulations. In this paper, we formalize the SNN Joint Multi-stage Communication and Computation Placement (SJCCP) problem. We demonstrate that SJCCP can be solved using an approximation algorithm with an approximation ratio of O(k^2 √(log n log k)), where n is the number of voxels in the SNN and k is the number of GPUs. To further reduce the time complexity of solving SJCCP in practice, we propose a novel efficient framework, FastSJP, tailored for large-scale SNN placement. Then we apply the FastSJP framework to a human brain simulation that runs a large-scale SNN model derived from authentic biological data on a supercomputer equipped with 1024 GPUs. Experimental results verify that our framework notably reduces time overhead, ranging from 17.31% to 28.45%, compared to state-of-the-art methods. Leveraging the computational power of the supercomputer, FastSJP maximizes the problem size and processing performance, significantly advancing the development of brain-inspired intelligence.
UR - https://www.scopus.com/pages/publications/105019749839
UR - https://www.scopus.com/pages/publications/105019749839#tab=citedBy
U2 - 10.1109/ICDCS63083.2025.00035
DO - 10.1109/ICDCS63083.2025.00035
M3 - Conference contribution
AN - SCOPUS:105019749839
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 275
EP - 285
BT - Proceedings - 2025 IEEE 45th International Conference on Distributed Computing Systems, ICDCS 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 45th IEEE International Conference on Distributed Computing Systems, ICDCS 2025
Y2 - 20 July 2025 through 23 July 2025
ER -