Gromacs benchmarks

GROMACS v 3.3.3
May 2008

Benchmark 1: 1024 DPPC lipids with 23 water molecules per lipid, totalling 121856 atoms. A twin-range, group-based cut-off is used: 1.8 nm for electrostatics and 1.0 nm for Lennard-Jones interactions. The long-range contribution to electrostatics is updated every 10 steps. 5000 steps = 10 ps.
As others have observed, this is a particularly scalable benchmark.

dppc_3

Wallclock time in seconds (Efficiency %)

# processors   o2200, gige          o2800, ib             o2800, gige
               (2.2 GHz Opteron,    (2.8 GHz Opteron,     (2.8 GHz Opteron,
               1 Gb/s ethernet)     2 Gb/s Infiniband)    1 Gb/s ethernet)
 1             4542 (100)           2673 (100)            3358 (100)
 2             1922 (118)           1132 (118)            1499 (112)
 4             1416 (80)             581 (115)            1056 (79)
 8             -                     306 (109)             864 (49)
16             -                     174 (96)             -
32             -                     118 (71)             -
64             -                     163 (26)             -

The o2800 ib (2.8 GHz Opterons, 10 Gb/s Infiniband) runs were performed with a 64-bit build of Gromacs with Pathscale compilers. All other runs are with 32-bit builds of Gromacs with gcc and mpich.
                100 * Time on 1 processor
Efficiency =    ---------------------------    
                  n * Time on n processors
Note the consistent superscaling: a 2-processor job runs more than 2 times as fast as a 1-processor job on the same type of node, so the efficiency is > 100%. This has been observed by other groups: "GROMACS has a communications intensive benchmark that can experience superlinear performance. When partitioned across multiple nodes, a larger portion of the simulation data can reside in L2 cache, reducing the amount of main memory accesses."
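The efficiency formula above can be checked against the table with a short script. This is only an illustrative sketch: the function name efficiency and the dictionary ib_times are made up for the example, and the timings are copied from the o2800, ib column of the dppc_3 table.

```python
def efficiency(t1, tn, n):
    """Parallel efficiency in percent: 100 * T(1) / (n * T(n))."""
    return 100.0 * t1 / (n * tn)


# Wallclock times in seconds for the o2800, ib column (processors -> seconds).
ib_times = {1: 2673, 2: 1132, 4: 581, 8: 306, 16: 174, 32: 118, 64: 163}

t1 = ib_times[1]
for n in sorted(ib_times):
    eff = efficiency(t1, ib_times[n], n)
    # Values above 100 % indicate superlinear scaling.
    flag = " (superlinear)" if eff > 100.0 else ""
    print(f"{n:3d} processors: {eff:6.1f} %{flag}")
```

Any row with an efficiency above 100 % is flagged, which makes the superscaling at low processor counts easy to spot.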

Bottom line:

The same benchmark, reported in terms of ns/day and speedup.

ns/day (Speedup)

# processors   o2200, gige          o2800, ib             o2800, gige
               (2.2 GHz Opteron,    (2.8 GHz Opteron,     (2.8 GHz Opteron,
               1 Gb/s ethernet)     2 Gb/s Infiniband)    1 Gb/s ethernet)
 1             0.190                0.323                 0.257
 2             0.450 (2.36)         0.763 (2.35)          0.576 (2.24)
 4             0.610 (3.2)          1.487 (4.6)           0.818 (3.18)
 8             -                    2.824 (8.74)          1.000 (3.89)
16             -                    4.966 (15.37)         -
32             -                    7.322 (22.67)         -
64             -                    5.301 (16.4)          -
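The ns/day figures follow from the wallclock times, since each run simulates 5000 steps = 10 ps. A minimal sketch of the conversion, assuming a day of 86400 seconds; the names ns_per_day and speedup are illustrative and not part of Gromacs:

```python
SECONDS_PER_DAY = 86400
SIMULATED_NS = 0.010  # 5000 steps = 10 ps = 0.010 ns per benchmark run


def ns_per_day(wallclock_s):
    """Simulated nanoseconds per day of wallclock time."""
    return SIMULATED_NS * SECONDS_PER_DAY / wallclock_s


def speedup(t1, tn):
    """Speedup relative to the 1-processor run on the same node type."""
    return t1 / tn


# Example: the 8-processor o2800, gige run (864 s, against 3358 s on 1 processor).
print(f"{ns_per_day(864):.3f} ns/day, speedup {speedup(3358, 864):.2f}")
```

Speedup and efficiency carry the same information: efficiency in % is 100 * speedup / n.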

Benchmarks for Gromacs v 3.3.1 (July 2006)