Benchmark 1:
1024 DPPC lipids with 23 water molecules per lipid, totalling to
121856 atoms. A twin-range group based cut-off is used, 1.8 nm for electrostatics and 1.0 nm for Lennard-Jones interactions. The long-range
contribution to electrostatics is updated every 10 steps. 5000 steps = 10ps.
As others have observed, this is a particularly scalable benchmark.
# processors | Wallclock time in seconds (Efficiency %)
o2200,gige | (2.2 GHz Opteron 1 Gb/s ethernet ) o2800, ib | (2.8 GHz Opteron 2 Gb/s Infiniband) o2800,gige | (2.2 GHz Opteron 1Gb/s ethernet) 1 | 4542 (100) | 2673 (100) | 3358 (100)
| 2 | 1922 (118) | 1132 (118) | 1499 (112)
| 4 | 1416 (80) | 581 (115) | 1056 (79)
| 8 | 306 (109) | 864 (49)
| 16 | 174 (96) | 32 | 118 (71) | 64 | 163 (26) | |
100 * Time on 1 processor Efficiency = --------------------------- n * Time on n processorsNote the consistent superscaling, where a 2-processor job runs more than x times as fast as a 1-processor job on the same type of node, and therefore the efficiency is > 100%. This has been observed by other groups. "GROMACS has a communications intensive benchmark that can experience superlinear performance. When partitioned across multiple nodes, a larger portion of the simulation data can reside in L2 cache, reducing the amount of main memory accesses.".
Bottom line:
The same benchmark, reported in terms of ns/day and speedup.
# processors | ns/day (Speedup)
o2200,gige | (2.2 GHz Opteron 1 Gb/s ethernet ) o2800, ib | (2.8 GHz Opteron 2 Gb/s Infiniband) o2800,gige | (2.2 GHz Opteron 1Gb/s ethernet) 1 | 0.190 | 0.323 | 0.257
| 2 | 0.450 (2.36) | 0.763 (2.35) | 0.576 (2.24)
| 4 | 0.610 (3.2) | 1.487 (4.6) | 0.818 (3.18)
| 8 | 2.824 (8.74) | 1.000 (3.89)
| 16 | 4.966 (15.37) | 32 | 7.322 (22.67) | 64 | 5.301 (16.4) | |
Benchmarks for Gromacs v 3.3.1 (July 2006)