Benchmark 1: Apoa1 benchmark from the NAMD suite. 500 steps, 92K atoms, 12 Å cutoff + PME every 4 steps. (Jan 2009)
All parallel jobs on the Biowulf cluster should run at 70% efficiency or better, to ensure maximum utilization of the cluster resources. Based on this set of benchmarks, apoa1 and similar jobs should be submitted to about 8 p2800, o2200, or o2800 nodes (16 processors), or up to 16 Infiniband nodes (32 processors). Other types of jobs may scale differently; see the Biowulf NAMD page for examples.
To find the most appropriate number of nodes for a specific type of job, it is essential to run one's own benchmarks.
Wallclock time in seconds (efficiency in %)
# processors | p2800 gige: 2.8 GHz Xeon, Gigabit Ethernet, Intel compiler | o2200 gige: 2.2 GHz Opteron, Gigabit Ethernet, Intel compiler | o2800 gige: 2.8 GHz Opteron, Gigabit Ethernet, Intel compiler | o2800 ib: 2.8 GHz Opteron, Infiniband, Pathscale compiler |
1 | 1970 (100) | 1631 (100) | 1163 (100) | 1125 (100) |
2 | 1047 (94) | 844 (97) | 612 (95) | 575 (98) |
4 | 547 (90) | 447 (91) | 322 (90) | 298 (94) |
6 | 378 (87) | 313 (87) | 234 (83) | 199 (94) |
8 | 300 (82) | 249 (82) | 177 (82) | 150 (94) |
10 | 253 (78) | 211 (77) | 189 (61) | 130 (87) |
12 | 204 (80) | 169 (80) | 129 (75) | 102 (92) |
14 | 193 (73) | 158 (74) | 129 (64) | 95 (85) |
16 | 178 (69) | 145 (70) | 120 (61) | 87 (81) |
18 | 140 (78) | 116 (78) | 87 (74) | 71 (88) |
20 | 134 (74) | 119 (69) | 84 (69) | 67 (83) |
24 | 118 (70) | 106 (64) | 79 (61) | 54 (88) |
28 | 103 (69) | 86 (68) | 65 (64) | 50 (81) |
32 | 98 (63) | 87 (58) | 62 (62) | 47 (75) |
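The efficiency percentages in the table above are consistent with the usual definition of parallel efficiency, 100 × T1 / (N × TN), where T1 is the single-processor wallclock time and TN is the time on N processors. The following is a minimal sketch of that calculation, assuming this definition; the function name `parallel_efficiency` is illustrative and not part of NAMD or any Biowulf tool. The timings are a subset of the p2800 gige column.

```python
def parallel_efficiency(t1_seconds, tn_seconds, n_procs):
    """Return parallel efficiency in percent: 100 * T1 / (N * TN)."""
    if n_procs < 1 or tn_seconds <= 0:
        raise ValueError("need n_procs >= 1 and a positive wallclock time")
    return 100.0 * t1_seconds / (n_procs * tn_seconds)


# Wallclock times (seconds) from the p2800 gige column of Benchmark 1.
p2800_times = {1: 1970, 2: 1047, 4: 547, 8: 300, 16: 178, 32: 98}

for n_procs, seconds in p2800_times.items():
    eff = parallel_efficiency(p2800_times[1], seconds, n_procs)
    print(f"{n_procs:3d} processors: {seconds:5d} s, efficiency {eff:.0f}%")
```

The same function can be applied to timings from one's own benchmark runs to decide how many processors to request.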
Benchmark 2: Water Sphere simulation, courtesy Jeff Forbes, NIAMS. (August 2006)
Based on these benchmarks, to obtain at least 70% efficiency, this job could be run on about 24 processors (12 nodes) on p2800, o2200, or o2800 nodes. The efficiency drops much more slowly on the Infiniband nodes, so the job could use up to 40 or 50 processors (20-25 nodes) on the Infiniband nodes.
Wallclock time in seconds (efficiency in %)
# processors | p2800 gige: 2.8 GHz Xeon, Gigabit Ethernet, prebuilt 32-bit binaries | o2200 gige: 2.2 GHz Opteron, Gigabit Ethernet, prebuilt 64-bit binaries | o2800 gige: 2.8 GHz Opteron, Gigabit Ethernet, prebuilt 64-bit binaries | o2800 ib: 2.8 GHz Opteron, Infiniband, Pathscale compilers |
1 | 7011 (100) | 5207 (100) | 3355 (100) | 3117 (100) |
2 | 3590 (98) | 2659 (98) | 1754 (96) | 1593 (98) |
4 | 1838 (95) | 1377 (95) | 924 (91) | 816 (96) |
6 | 1342 (87) | 1045 (83) | 649 (86) | 589 (88) |
8 | 991 (88) | 775 (84) | 490 (86) | 424 (92) |
10 | 799 (88) | 613 (85) | 402 (83) | 343 (91) |
12 | 713 (82) | 531 (82) | 336 (83) | 282 (92) |
14 | 578 (87) | 456 (82) | 292 (82) | 243 (92) |
16 | 525 (83) | 399 (82) | 264 (79) | 214 (91) |
20 | 433 (81) | 343 (76) | 218 (77) | 181 (86) |
24 | 375 (78) | 290 (75) | 186 (75) | 151 (86) |
28 | 321 (78) | 273 (68) | 167 (72) | 129 (86) |
32 | 295 (74) | 265 (61) | 147 (71) | 116 (84) |
36 | 255 (76) | 248 (58) | 139 (67) | 103 (84) |
40 | 243 (72) | 238 (55) | 129 (65) | 93 (84) |
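To turn a set of timings like these into a processor-count recommendation, one can compute the efficiency at each count and keep the largest count that still meets the threshold. The sketch below assumes the efficiency definition 100 × T1 / (N × TN) and uses the o2200 gige column from the table above; the function name `max_procs_at_efficiency` is hypothetical. Because efficiency is not always monotonic in the processor count, it reports the largest count at or above the threshold rather than the first count that falls below it.

```python
def max_procs_at_efficiency(times, threshold=70.0):
    """Return the largest processor count whose efficiency meets the threshold.

    `times` maps processor count -> wallclock seconds and must include 1.
    """
    t1 = times[1]
    best = 1
    for n_procs in sorted(times):
        efficiency = 100.0 * t1 / (n_procs * times[n_procs])
        if efficiency >= threshold:
            best = n_procs
    return best


# Wallclock times (seconds) from the o2200 gige column of Benchmark 2.
o2200_times = {
    1: 5207, 2: 2659, 4: 1377, 6: 1045, 8: 775, 10: 613, 12: 531,
    14: 456, 16: 399, 20: 343, 24: 290, 28: 273, 32: 265, 36: 248, 40: 238,
}

print(max_procs_at_efficiency(o2200_times))        # 70% threshold
print(max_procs_at_efficiency(o2200_times, 60.0))  # older 60% threshold
```

For this column the 70% threshold gives 24 processors, in line with the recommendation above.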
Water sphere and water box simulations courtesy Jeff Forbes, NIAMS
In the two jobs above, note that the water sphere simulation scales well to about 36 processors. Beyond that, the efficiency falls below 60%, which is generally considered poor. The water box simulation scales to about 16 processors before the efficiency drops below 60%. This is similar to the apoa1 example above. (We now recommend that jobs run at 70% efficiency or better.)
Equilibrated model of integral membrane complex, courtesy Nara Dashdorj, LCP, NIDDK.
A total of 346,358 atoms including water, lipids, and protein with several prosthetic groups. Cutoff 12.0, fullElectFrequency 4, nonbondedFreq 2, stepspercycle 20. In these benchmarks, the job scales to about 60 processors (30 nodes) on the p2800 Xeons, and to about 50 processors (25 nodes) on the o2200 Opterons. This is typical behaviour: with higher processor speeds, communication becomes more of a bottleneck, so the job does not scale as well on the faster processors. (We now recommend that jobs run at 70% efficiency or better.)