Better than $1/Mflops sustained: A scalable PC-based parallel computer for lattice QCD

Zoltán Fodor, Sándor D. Katz, Gábor Papp

Research output: Contribution to journal › Article › peer-review

23 Scopus citations


We study the feasibility of a PC-based parallel computer for medium- to large-scale lattice QCD simulations. The Eötvös Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7GHz nodes with 512 MB RDRAM. The 32-bit, single-precision sustained performance for dynamical QCD without communication is 1510 Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives a total performance of 208 Gflops for Wilson and 133 Gflops for staggered QCD, respectively (for 64-bit applications the performance is approximately halved). The novel feature of our system is its communication architecture. In order to have a scalable, cost-effective machine we use Gigabit Ethernet cards for nearest-neighbor communications in a two-dimensional mesh. This type of communication is cost-effective (only 30% of the hardware cost is spent on communication). According to our benchmark measurements, this type of communication results in a communication time fraction of around 40% for lattices up to 48³ · 96 in full QCD simulations. The price/sustained-performance ratio for full QCD is better than $1/Mflops for Wilson (and around $1.5/Mflops for staggered) quarks for practically any lattice size that can fit in our parallel computer. The communication software is freely available upon request for non-profit organizations.
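The aggregate figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below multiplies the node count by the per-node rates and then applies the quoted communication time fraction. Reading that fraction as a simple reduction of useful throughput by a factor of (1 - 0.40) is an assumption made here for illustration, not something the abstract states, and the function names are hypothetical.

```python
# Figures taken from the abstract.
NODES = 137
MFLOPS_PER_NODE = {"wilson": 1510, "staggered": 970}  # single precision, no communication

# Assumption: the ~40% communication time fraction is treated as time
# during which no floating-point work is done.
COMM_FRACTION = 0.40


def aggregate_gflops(nodes, mflops_per_node):
    """Total rate in Gflops when every node runs at the per-node rate."""
    return nodes * mflops_per_node / 1000.0


def with_communication(gflops, comm_fraction):
    """Rough effective rate if a fraction of wall time goes to communication."""
    return gflops * (1.0 - comm_fraction)


if __name__ == "__main__":
    for action, rate in MFLOPS_PER_NODE.items():
        total = aggregate_gflops(NODES, rate)
        effective = with_communication(total, COMM_FRACTION)
        print(f"{action}: {total:.1f} Gflops without communication, "
              f"roughly {effective:.1f} Gflops at a 40% communication fraction")
```

The products come out near 207 Gflops (Wilson) and 133 Gflops (staggered), close to the quoted totals; the small Wilson difference presumably reflects rounding of the per-node figure.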

Original language: English (US)
Pages (from-to): 121-134
Number of pages: 14
Journal: Computer Physics Communications
Issue number: 2
State: Published - May 1 2003

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • General Physics and Astronomy

