Red Storm is a massively parallel processing (MPP) supercomputer at Sandia National Laboratories/New Mexico that recently joined the ranks of the record-setting Advanced Simulation & Computing (ASC) supercomputers. Red Storm was designed jointly by Sandia and Cray Inc. to address the highly complex nuclear weapons stockpile computing problems that characterize the simulations required by an engineering laboratory such as Sandia. It allows modeling and simulation of complex stockpile stewardship problems that until recently were thought impractical, if not impossible.
ASC researchers at Los Alamos and Lawrence Livermore are also finding it a valuable resource.
Red Storm is partitioned to support classified and unclassified operations. Its high-performance input/output (I/O) system connects the machine to external Sandia and tri-lab networks and storage. Red Storm scales from a single cabinet to tens of thousands of processors, and the architecture is scalable to greater than 100 teraOPS.
Sandia’s collaboration with Cray supports commercialization of the technology. This not only increases national competitiveness in supercomputing but also builds a broad user community that helps detect and fix errors and problems.
The ASC computing resources offer different advantages for different kinds of applications. Red Storm specializes in MPP problems that require considerable interaction and coordination among the large number of AMD processors running an application.
Red Storm was constructed of commercial off-the-shelf parts built around the custom SeaStar interconnect chip, which is manufactured by IBM. The interconnect chips, one of which accompanies each of the 12,960 AMD Opteron™ compute nodes, make it possible for Red Storm’s processors to pass data to one another efficiently while applications are running. The interconnect is also key to the three-dimensional mesh that allows 3-D representations of complex problems. Red Storm holds world records in visualization and in two of the High Performance Computing Challenge (HPCC) benchmarks, PTRANS (optimized run) and RandomAccess (baseline run).
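The idea of a 3-D mesh can be illustrated with a small sketch using the 2005 configuration's 27 x 16 x 24 mesh from the specification table. The row-major node numbering here is an illustrative assumption, not the machine's actual scheme; real routing is handled by the SeaStar hardware.

```python
# Illustrative sketch: mapping a linear node ID onto a 3-D mesh and
# finding its nearest neighbors. Dimensions are the 2005 Red Storm
# mesh (27 x 16 x 24); the row-major numbering is an assumption.
X, Y, Z = 27, 16, 24  # mesh dimensions

def coords(node_id):
    """Convert a linear node ID to (x, y, z) mesh coordinates."""
    x, rem = divmod(node_id, Y * Z)
    y, z = divmod(rem, Z)
    return x, y, z

def neighbors(node_id):
    """Return the coordinates of the up-to-six nearest mesh neighbors."""
    x, y, z = coords(node_id)
    result = []
    for dx, dy, dz in [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                       (0, -1, 0), (0, 0, 1), (0, 0, -1)]:
        nx, ny, nz = x + dx, y + dy, z + dz
        # A mesh (unlike a torus) has no wraparound links at the edges.
        if 0 <= nx < X and 0 <= ny < Y and 0 <= nz < Z:
            result.append((nx, ny, nz))
    return result

print(X * Y * Z)          # 10368 nodes, matching the 2005 compute-processor count
print(len(neighbors(0)))  # 3: a corner node has only three neighbors
```

Interior nodes have six neighbors, one per mesh direction, which is why the topology maps naturally onto 3-D physical simulations.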
A flat architecture is another unique feature. This simple design means that information can pass directly from processor to processor without traversing many levels of a complex hierarchy. The Catamount operating system runs the application, with a user-friendly Linux system serving as the user interface.
Red Storm excels in balance and scalability. The machine was designed so that the relative performance (balance) of processor speed, memory access speed, and network speed would provide a reasonable match for application needs. The high memory bandwidth keeps the processors from being starved for data and makes the central processing units more efficient. The total communication capacity allows data to flow freely between the 12,960 processors without bottlenecks or congestion.
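The communication-capacity claim can be checked with back-of-the-envelope arithmetic from the specification table: bisect the 3-D mesh across its longest dimension and count the links crossing the cut, each carrying the listed bi-directional link bandwidth. This is a rough estimate, not the vendor's measurement method.

```python
# Back-of-the-envelope estimate of minimum bisection bandwidth for a
# 3-D mesh: a cut through the longest dimension is crossed by one link
# per node-pair in the plane of the two shorter dimensions.
LINK_BW = 9.6  # GB/s per bi-directional link, from the spec table

def bisection_bw(x, y, z, link_bw=LINK_BW):
    """Estimated minimum bisection bandwidth in GB/s."""
    dims = sorted([x, y, z])
    return dims[0] * dims[1] * link_bw  # product of the two shorter dimensions

print(bisection_bw(27, 16, 24) / 1000)  # ~3.69 TB/s, the 2005 figure
print(bisection_bw(27, 20, 24) / 1000)  # ~4.61 TB/s, the 2006/2008 figure
```

Both results agree with the Minimum Bi-section B/W row of the table, which suggests this is indeed how those figures were derived.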
The lightweight operating system, Catamount, facilitates application scalability. The OS provides only the features necessary to support the application, then turns the processor over to the application almost exclusively, with minimal OS overhead. This is an important scalability feature, because random OS noise across thousands of processors impedes the progress of the entire application.
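Why OS noise hurts at scale can be shown with a toy simulation of a bulk-synchronous application. All parameters below are illustrative assumptions, not measurements of Catamount or Red Storm: each timestep ends with a synchronization, so every processor waits for the slowest one, and the chance that at least one of N processors is delayed by a noise event grows with N.

```python
import random

# Toy simulation of OS noise in a bulk-synchronous application.
# Each timestep every processor computes for WORK units, then all
# synchronize; occasionally the OS "steals" the CPU for NOISE units.
# All parameters are illustrative assumptions.
WORK = 1.0          # compute time per step (arbitrary units)
NOISE = 0.5         # duration of one noise event
NOISE_PROB = 0.001  # chance a given processor is hit in a given step

def runtime(num_procs, steps=1000, seed=0):
    """Total runtime: each step takes as long as the slowest processor."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        slowest = WORK
        for _ in range(num_procs):
            if rng.random() < NOISE_PROB:
                slowest = WORK + NOISE  # one hit is enough to delay the step
                break
        total += slowest
    return total

for n in (1, 100, 10_000):
    print(n, runtime(n))  # slowdown grows with processor count
```

With these assumed parameters a single processor is almost never delayed, while at ten thousand processors nearly every step is, even though each individual processor still loses only 0.1% of its steps. That is the scaling pathology a lightweight kernel like Catamount is designed to avoid.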
Red Storm’s primary use is in U.S. nuclear stockpile work: designing new replacement components, virtual testing of components under hostile, abnormal, and normal conditions, and assisting in weapons engineering and weapons physics.
http://redstormweb.sandia.gov/
| Operational Time Frame | 2005 | 2006 | 2008 |
|---|---|---|---|
| Theoretical Peak (TF) | 41.47 | 124.42 | 284.16 |
| HPL Performance (GF) | 36,190 on 10,880 processors | 101,400 on 26,544 processors | 204,200 on 38,208 processors |
| Architecture | distributed-memory MIMD | distributed-memory MIMD | distributed-memory MIMD |
| Number of compute processors | 10,368 | 25,920 (12,960 nodes) | 38,400 (12,960 nodes) |
| Number of service/I/O processors | 256 + 256 | 320 + 320 | 320 + 320 |
| Processor | AMD Opteron™ @ 2.0 GHz | AMD dual-core Opteron™ @ 2.4 GHz | 6,720 AMD dual-core Opteron™ @ 2.4 GHz; 6,240 AMD quad-core Opteron™ @ 2.2 GHz |
| Total Memory | 33.38 TB | 39.19 TB | 78.75 TB |
| System Memory B/W | 57.97 TB/s | 78.12 TB/s | 126.29 TB/s |
| User Disk Storage | 340 TB | 340 TB | 1,753 TB |
| Parallel File System B/W | target 50.0 GB/s each color | target 50.0 GB/s each color | target 50.0 GB/s each color |
| External Network B/W | 25 GB/s each color | 25 GB/s each color | 25 GB/s each color |
| Interconnect Topology | 3-D mesh (x, y, z), 27 x 16 x 24 | 3-D mesh (x, y, z), 27 x 20 x 24 | 3-D mesh (x, y, z), 27 x 20 x 24 |
| Interconnect Performance | | | |
| MPI Latency | 6.6 µs 1 hop, 9.6 µs max | 4.8 µs 1 hop, 7.8 µs max | 4.9 µs 1 hop, 8.0 µs max |
| Bi-Directional Link B/W | 9.6 GB/s | 9.6 GB/s | 9.6 GB/s |
| Minimum Bi-section B/W | 3.69 TB/s | 4.61 TB/s | 4.61 TB/s |
| Full System RAS | | | |
| RAS Network | 100 Mb Ethernet | 100 Mb Ethernet | 100 Mb Ethernet |
| RAS Processors | 1 for each 4 CPUs | 1 for each 4 CPUs | 1 for each 4 CPUs |
| Operating System | | | |
| Compute Nodes | Catamount (based on Cougar) | Catamount (based on Cougar) | Catamount N-Way (CNW) |
| Service and I/O Nodes | Linux | Linux | Linux |
| RAS Nodes | Linux | Linux | Linux |
| Red/Black Switching | 2688 - 4992 - 2688 | 3360 - 6240 - 3360 | 3360 - 6240 - 3360 |
| System Footprint | ~3,000 sq ft | ~3,500 sq ft | ~3,500 sq ft |
| Power Requirement | 1.7 MW | <2.5 MW | 2.5 MW |
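The theoretical peak figures above can be reproduced from the processor counts and clock rates, assuming the standard 2 floating-point operations per clock for the single- and dual-core Opterons and 4 per clock for the quad-core parts:

```python
# Reproducing the table's Theoretical Peak (TF) figures from core
# counts, clock rates, and floating-point ops per clock (2 for
# single/dual-core Opteron, 4 for quad-core -- standard figures
# for those cores, assumed here).
def peak_tf(cores, ghz, flops_per_clock):
    """Theoretical peak in teraFLOPS: cores x GHz x ops/clock / 1000."""
    return cores * ghz * flops_per_clock / 1000

print(peak_tf(10_368, 2.0, 2))  # 41.472  -> 41.47 TF (2005)
print(peak_tf(25_920, 2.4, 2))  # 124.416 -> 124.42 TF (2006)
# 2008: 6,720 dual-core (13,440 cores) plus 6,240 quad-core (24,960 cores)
print(peak_tf(6_720 * 2, 2.4, 2) + peak_tf(6_240 * 4, 2.2, 4))  # 284.16 TF
```

All three results match the table, including the mixed dual/quad-core 2008 upgrade.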