

Introducing Red Storm

Red Storm is a massively parallel processing (MPP) supercomputer at Sandia National Laboratories/New Mexico that recently joined the ranks of the Advanced Simulation & Computing (ASC) supercomputers holding world computing records. Sandia and Cray, Inc., designed Red Storm specifically for the highly complex nuclear weapons stockpile computing problems that characterize the simulations an engineering laboratory such as Sandia must run. Red Storm allows modeling and simulation of complex problems in nuclear weapons stockpile stewardship that were until recently thought impractical, if not impossible.

ASC researchers at Los Alamos and Lawrence Livermore are also finding it a valuable resource.

Red Storm is partitioned to support both classified and unclassified operations. Its high-performance input/output (I/O) system facilitates connections to external Sandia and tri-lab networks and storage. The system scales from a single cabinet to tens of thousands of processors, and the architecture scales to greater than 100 teraOPS.

Sandia’s collaboration with Cray supports commercialization of the technology. This not only increases national competitiveness in supercomputing but also creates a wide base of users who can detect and help fix errors and problems.

What Features Make Red Storm Unique?

The ASC computing resources offer different advantages for different kinds of applications. Red Storm specializes in MPP problems that require considerable interaction and coordination among the thousands of high-volume AMD processors running an application.

Red Storm was constructed from commercial off-the-shelf parts supporting the custom, IBM-manufactured SeaStar interconnect chip. The interconnect chips, one accompanying each of the 12,960 AMD Opteron™ compute nodes, let Red Storm’s processors pass data to one another efficiently while applications are running. The interconnect is also key to the three-dimensional mesh that allows 3-D representations of complex problems. Red Storm holds world records in visualization and in two of the High Performance Computing Challenge (HPCC) benchmarks: PTRANS (optimized run) and RandomAccess (baseline run).
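To give a concrete feel for how applications use such a mesh, the sketch below (generic MPI code, not Sandia software; the grid dimensions are chosen by the library, not taken from the machine) arranges ranks in a 3-D Cartesian grid and exchanges one value with each nearest neighbor along every axis, the communication pattern a mesh interconnect like SeaStar is built to serve.

    /* Minimal sketch: nearest-neighbor exchange on a 3-D Cartesian grid.
     * Generic MPI; illustrates the pattern, not Red Storm internals. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int nprocs, rank;
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int dims[3] = {0, 0, 0};      /* let MPI pick a 3-D factorization */
        int periods[3] = {0, 0, 0};   /* open (non-wraparound) mesh */
        MPI_Dims_create(nprocs, 3, dims);

        MPI_Comm cart;
        MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1, &cart);
        MPI_Comm_rank(cart, &rank);

        double send = (double)rank, recv[6];
        MPI_Request reqs[12];
        int nreq = 0;

        /* Exchange one value with the -/+ neighbor along each axis.
         * Missing neighbors at the mesh edge come back as MPI_PROC_NULL,
         * and those sends/receives become no-ops. */
        for (int axis = 0; axis < 3; axis++) {
            int lo, hi;
            MPI_Cart_shift(cart, axis, 1, &lo, &hi);
            MPI_Irecv(&recv[2*axis],     1, MPI_DOUBLE, lo, 0, cart, &reqs[nreq++]);
            MPI_Irecv(&recv[2*axis + 1], 1, MPI_DOUBLE, hi, 0, cart, &reqs[nreq++]);
            MPI_Isend(&send, 1, MPI_DOUBLE, lo, 0, cart, &reqs[nreq++]);
            MPI_Isend(&send, 1, MPI_DOUBLE, hi, 0, cart, &reqs[nreq++]);
        }
        MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);

        if (rank == 0)
            printf("rank grid: %d x %d x %d\n", dims[0], dims[1], dims[2]);

        MPI_Finalize();
        return 0;
    }

On a mesh network, each of these exchanges travels only one hop, which is why nearest-neighbor codes scale so well on this topology.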

A flat architecture is another unique feature. The simple design means information can move directly from processor to processor without traversing many levels of a complex hierarchy. The Catamount operating system runs the application, with a user-friendly Linux system serving as the user interface.

How Does Red Storm Excel?

Red Storm excels in balance and scalability. The machine was designed so that the relative performance (balance) of processor speed, memory access speed, and network speed would provide a reasonable match for application needs. The high memory bandwidth keeps the processors from being starved for data and makes each central processing unit (CPU) more efficient. The total communication capacity allows data to flow freely among the 12,960 compute nodes without bottlenecks or congestion.
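As a back-of-the-envelope illustration of what balance means, the short program below divides each configuration's system memory bandwidth by its theoretical peak, using figures from the specifications table later on this page. Bytes per flop is a common shorthand for this ratio, not an official Sandia metric.

    /* Back-of-the-envelope balance check using figures from the
     * specifications table: bytes of memory bandwidth available per
     * peak floating-point operation.  Higher means the CPUs are less
     * likely to starve for data. */
    #include <stdio.h>

    int main(void)
    {
        /* {year, theoretical peak in TFLOPS, memory bandwidth in TB/s} */
        struct { const char *year; double peak_tf, mem_tbs; } stage[] = {
            {"2005",  41.47,  57.97},
            {"2006", 124.42,  78.12},
            {"2008", 284.16, 126.29},
        };

        for (int i = 0; i < 3; i++)
            printf("%s: %.2f bytes/flop\n",
                   stage[i].year, stage[i].mem_tbs / stage[i].peak_tf);
        return 0;
    }
    /* Prints roughly 1.40, 0.63, and 0.44 bytes/flop. */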

The lightweight operating system, Catamount, facilitates application scalability. The OS provides only the features necessary to support the application, then turns the processor over to the application almost exclusively, imposing minimal overhead. This is an important scalability feature because random OS noise across thousands of processors impedes the progress of the entire application: in a tightly coupled computation, every processor ends up waiting for whichever one the OS happened to interrupt.
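The effect of OS noise can be demonstrated with a fixed-work probe, sketched below using generic POSIX timing (an illustrative microbenchmark, not a Sandia tool). Each iteration performs identical work, so any spread between the fastest and slowest iterations reflects interruptions by the OS; on thousands of coupled processors, the slowest iteration anywhere sets the pace for everyone.

    /* Minimal fixed-work "OS noise" probe: time many repetitions of
     * identical work on one core.  On a noisy kernel the slowest
     * iterations run well past the fastest; a lightweight kernel like
     * Catamount keeps the spread tight. */
    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <time.h>

    static double now(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void)
    {
        enum { ITERS = 10000, WORK = 100000 };
        volatile double x = 1.0;
        double min = 1e9, max = 0.0;

        for (int i = 0; i < ITERS; i++) {
            double t0 = now();
            for (int j = 0; j < WORK; j++)   /* fixed quantum of work */
                x = x * 1.0000001;
            double dt = now() - t0;
            if (dt < min) min = dt;
            if (dt > max) max = dt;
        }
        /* A slowest/fastest ratio well above 1 indicates OS interference. */
        printf("fastest %.3f us, slowest %.3f us\n", min * 1e6, max * 1e6);
        return 0;
    }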

What are Red Storm’s Applications?

Red Storm’s primary use is in U.S. nuclear stockpile work: designing new replacement components, virtually testing components under hostile, abnormal, and normal conditions, and assisting in weapons engineering and weapons physics.

Specifications 

http://redstormweb.sandia.gov/

Operational time frame       2005                         2006                         2008
Theoretical peak             41.47 TF                     124.42 TF                    284.16 TF
HPL performance              36,190 GF                    101,400 GF                   204,200 GF
                             (on 10,880 processors)       (on 26,544 processors)       (on 38,208 processors)
Architecture                 distributed-memory MIMD      distributed-memory MIMD      distributed-memory MIMD
Compute processors           10,368                       25,920 (12,960 nodes)        38,400 (12,960 nodes)
Service/I/O processors       256 + 256                    320 + 320                    320 + 320
Processor                    AMD Opteron™ @ 2.0 GHz       AMD dual-core Opteron™       6,720 dual-core Opteron™
                                                          @ 2.4 GHz                    @ 2.4 GHz plus 6,240
                                                                                       quad-core Opteron™ @ 2.2 GHz
Total memory                 33.38 TB                     39.19 TB                     78.75 TB
System memory B/W            57.97 TB/s                   78.12 TB/s                   126.29 TB/s
User disk storage            340 TB                       340 TB                       1,753 TB
Parallel file system B/W     target 50.0 GB/s each color  target 50.0 GB/s each color  target 50.0 GB/s each color
External network B/W         25 GB/s each color           25 GB/s each color           25 GB/s each color
Interconnect topology        3-D mesh (x,y,z),            3-D mesh (x,y,z),            3-D mesh (x,y,z),
                             27 x 16 x 24                 27 x 20 x 24                 27 x 20 x 24

Interconnect performance
  MPI latency                6.6 µs 1 hop, 9.6 µs max     4.8 µs 1 hop, 7.8 µs max     4.9 µs 1 hop, 8.0 µs max
  Bidirectional link B/W     9.6 GB/s                     9.6 GB/s                     9.6 GB/s
  Minimum bisection B/W      3.69 TB/s                    4.61 TB/s                    4.61 TB/s

Full-system RAS
  RAS network                100 Mb Ethernet              100 Mb Ethernet              100 Mb Ethernet
  RAS processors             1 for each 4 CPUs            1 for each 4 CPUs            1 for each 4 CPUs

Operating system
  Compute nodes              Catamount (based on Cougar)  Catamount (based on Cougar)  Catamount N-Way (CNW)
  Service and I/O nodes      Linux                        Linux                        Linux
  RAS nodes                  Linux                        Linux                        Linux

Red/Black switching          2,688 - 4,992 - 2,688        3,360 - 6,240 - 3,360        3,360 - 6,240 - 3,360
System footprint             ~3,000 sq ft                 ~3,500 sq ft                 ~3,500 sq ft
Power requirement            1.7 MW                       <2.5 MW                      2.5 MW
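The MPI latency figures above are the kind of numbers typically measured with a two-rank ping-pong, sketched below in generic MPI (an illustrative benchmark, not Sandia's): rank 0 bounces a small message off rank 1 many times and reports half the average round-trip time.

    /* Classic ping-pong latency sketch.  Run with at least two MPI
     * ranks; any extra ranks simply idle at the barrier. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        enum { REPS = 10000 };
        double buf = 0.0;

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(&buf, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&buf, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(&buf, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(&buf, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
            }
        }
        double elapsed = MPI_Wtime() - t0;

        if (rank == 0)   /* one-way latency = half the mean round trip */
            printf("latency: %.2f us\n", elapsed / REPS / 2.0 * 1e6);

        MPI_Finalize();
        return 0;
    }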

 

 
