TY - JOUR
T1 - Better than $1/Mflops sustained
T2 - A scalable PC-based parallel computer for lattice QCD
AU - Fodor, Zoltán
AU - Katz, Sándor D.
AU - Papp, Gábor
N1 - Funding Information:
We thank F. Csikor and Z. Horváth for their continuous help. We thank them and J. Kuti, Th. Lippert, I. Montvay, K. Schilling, H. Simma and R. Tripiccione for suggestions and careful reading of the manuscript. The benchmark tests were done by modifying the MILC Collaboration’s public code (see http://physics.indiana.edu/~sg/milc.html). This work was supported by Hungarian Science Foundation Grants under Contract Nos. OTKA-T37615/T34980/T29803/T22929/M37071/OM-MU-708/IKTA111/FKFP220/00.
Copyright:
Copyright 2004 Elsevier Science B.V., Amsterdam. All rights reserved.
PY - 2003/5/1
Y1 - 2003/5/1
N2 - We study the feasibility of a PC-based parallel computer for medium- to large-scale lattice QCD simulations. The Eötvös University Institute for Theoretical Physics cluster consists of 137 Intel P4 1.7 GHz nodes with 512 MB RDRAM each. The 32-bit, single-precision sustained performance for dynamical QCD without communication is 1510 Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives a total performance of 208 Gflops for Wilson and 133 Gflops for staggered QCD (for 64-bit applications the performance is approximately halved). The novel feature of our system is its communication architecture. In order to have a scalable, cost-effective machine, we use Gigabit Ethernet cards for nearest-neighbor communication in a two-dimensional mesh. This type of communication is cost effective (only 30% of the hardware cost is spent on communication). According to our benchmark measurements, this type of communication results in a communication-time fraction of around 40% for lattices up to 48³ · 96 in full QCD simulations. The price/sustained-performance ratio for full QCD is better than $1/Mflops for Wilson (and around $1.5/Mflops for staggered) quarks for practically any lattice size that fits in our parallel computer. The communication software is freely available upon request for non-profit organizations.
AB - We study the feasibility of a PC-based parallel computer for medium- to large-scale lattice QCD simulations. The Eötvös University Institute for Theoretical Physics cluster consists of 137 Intel P4 1.7 GHz nodes with 512 MB RDRAM each. The 32-bit, single-precision sustained performance for dynamical QCD without communication is 1510 Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives a total performance of 208 Gflops for Wilson and 133 Gflops for staggered QCD (for 64-bit applications the performance is approximately halved). The novel feature of our system is its communication architecture. In order to have a scalable, cost-effective machine, we use Gigabit Ethernet cards for nearest-neighbor communication in a two-dimensional mesh. This type of communication is cost effective (only 30% of the hardware cost is spent on communication). According to our benchmark measurements, this type of communication results in a communication-time fraction of around 40% for lattices up to 48³ · 96 in full QCD simulations. The price/sustained-performance ratio for full QCD is better than $1/Mflops for Wilson (and around $1.5/Mflops for staggered) quarks for practically any lattice size that fits in our parallel computer. The communication software is freely available upon request for non-profit organizations.
UR - http://www.scopus.com/inward/record.url?scp=0037402266&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0037402266&partnerID=8YFLogxK
U2 - 10.1016/S0010-4655(02)00776-2
DO - 10.1016/S0010-4655(02)00776-2
M3 - Article
AN - SCOPUS:0037402266
SN - 0010-4655
VL - 152
SP - 121
EP - 134
JO - Computer Physics Communications
JF - Computer Physics Communications
IS - 2
ER -