TY - JOUR
T1 - HRCM
T2 - A Hierarchical Regularizing Mechanism for Sparse and Imbalanced Communication in Whole Human Brain Simulations
AU - Du, Xin
AU - Wang, Minglong
AU - Lu, Zhihui
AU - Duan, Qiang
AU - Liu, Yuhao
AU - Feng, Jianfeng
AU - Wang, Huarui
N1 - Publisher Copyright:
© 1990-2012 IEEE.
PY - 2024/6/1
Y1 - 2024/6/1
N2 - Brain simulation is one of the most important measures to understand how information is represented and processed in the brain, which usually needs to be realized in supercomputers with a large number of interconnected graphical processing units (GPUs). For the whole human brain simulation, tens of thousands of GPUs are utilized to simulate tens of billions of neurons and tens of trillions of synapses for the living brain to reveal functional connectivity patterns. However, as an application of the irregular spares communication problem on a large-scale system, the sparse and imbalanced communication patterns of the human brain make it particularly challenging to design a communication system for supporting large-scale brain simulations. To face this challenge, this paper proposes a hierarchical regularized communication mechanism, HRCM. The HRCM maintains a hierarchical virtual communication topology (HVCT) with a merge-forward algorithm that exploits the sparsity of neuron interactions to regularize inter-process communications in brain simulations. HRCM also provides a neuron-level partition scheme for assigning neurons to simulation processes to balance the communication load while improving resource utilization. In HRCM, neuron partition is formulated as a k-way graph partition problem and solved efficiently by the proposed hybrid multi-constraint greedy (HMCG) algorithm. HRCM has been implemented in human brain simulations at the scale of up to 86 billion neurons running on 10000 GPUs. Results obtained from extensive simulation experiments verify the effectiveness of HRCM in significantly reducing communication delay, increasing resource usage, and shortening simulation time for large-scale human brain models.
AB - Brain simulation is one of the most important measures to understand how information is represented and processed in the brain, which usually needs to be realized in supercomputers with a large number of interconnected graphical processing units (GPUs). For the whole human brain simulation, tens of thousands of GPUs are utilized to simulate tens of billions of neurons and tens of trillions of synapses for the living brain to reveal functional connectivity patterns. However, as an application of the irregular spares communication problem on a large-scale system, the sparse and imbalanced communication patterns of the human brain make it particularly challenging to design a communication system for supporting large-scale brain simulations. To face this challenge, this paper proposes a hierarchical regularized communication mechanism, HRCM. The HRCM maintains a hierarchical virtual communication topology (HVCT) with a merge-forward algorithm that exploits the sparsity of neuron interactions to regularize inter-process communications in brain simulations. HRCM also provides a neuron-level partition scheme for assigning neurons to simulation processes to balance the communication load while improving resource utilization. In HRCM, neuron partition is formulated as a k-way graph partition problem and solved efficiently by the proposed hybrid multi-constraint greedy (HMCG) algorithm. HRCM has been implemented in human brain simulations at the scale of up to 86 billion neurons running on 10000 GPUs. Results obtained from extensive simulation experiments verify the effectiveness of HRCM in significantly reducing communication delay, increasing resource usage, and shortening simulation time for large-scale human brain models.
UR - http://www.scopus.com/inward/record.url?scp=85190334275&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85190334275&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2024.3387720
DO - 10.1109/TPDS.2024.3387720
M3 - Article
AN - SCOPUS:85190334275
SN - 1045-9219
VL - 35
SP - 901
EP - 918
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 6
ER -