Platforms and System Software
The History of Platforms at Sandia:
In the mid-eighties, Sandia began to take parallel computing seriously, and in 1987 it formed the Massively Parallel Computing Research Lab (MPCRL). In addition to pioneering much of the system software, many of the algorithms, and the computational methods for massively parallel computing, the MPCRL fielded the first large-scale massively parallel computers and demonstrated their potential for technical computing.
In 1987, we fielded the first 1000+ processor computer, the nCUBE-10, a 1024-processor system based on a 10-dimensional hypercube network. With the nCUBE-10, we were the first to achieve speedups of over 100-fold and over 1000-fold (measured against the speed of a single processor of the same machine) on real scientific and technical applications. This work led to the only Karp Challenge ever awarded (for the first achievement of more than 100-fold speedup), to the first Gordon Bell Award (for three applications that achieved speedups of over 1000 on a 1024-processor system), to R&D 100 Awards, and to patents for MPP computing techniques.
In 1988, we fielded a 16384-way CM-2 from Thinking Machines Corporation that was later upgraded to a CM-200. This was a SIMD (single instruction, multiple data stream) system, and we evaluated its performance against MIMD (multiple instruction, multiple data stream) systems like those we had from the NCUBE Corporation. The CM-2 had 16384 1-bit processing elements and 512 32-bit floating point units (32 single-bit elements were associated with each floating point unit).
In 1990, we fielded two 1024-processor nCUBE-2 supercomputers. The nCUBE-2 also had a 10-dimensional hypercube network. It was the first MPP competitive with the parallel vector machines from Cray Research. It had a peak floating point capability of 2 Gigaflops, and we achieved a large fraction of that peak speed on many real applications. Its outstanding performance was enabled by its highly balanced design (a fast network) and by its highly efficient system software. This platform enabled us to develop our first lightweight kernel (LWK) operating system, SUNMOS.
In 1992, we fielded a 64-processor Intel iPSC/860. This machine was an interesting research engine and prepared the way for a partnership with Intel that lasted until we jointly fielded the world's first terascale system in 1996.
In 1993, we fielded the 3800+ processor Intel Paragon with a peak speed in excess of 100 Gigaflops. It was the first MPP to be indisputably the fastest computer in the world. The OSF/1 operating system that Intel supplied for the Paragon failed to scale well: for large numbers of processors, the OS buffers consumed the entire memory of the system, and OS overhead grew enormously as the machine grew. Intel eventually fixed these problems with the Paragon OS. However, we didn't wait. Within four months of installing the Paragon at Sandia, we had ported our LWK, SUNMOS, to the Paragon, and SUNMOS and its associated runtime software became the basis of operations on that machine. At the same time, we began to develop a second-generation LWK called PUMA, which eventually replaced SUNMOS and which Intel and Sandia later used as the basis for Cougar, the LWK that powered TFlops (also known as ASCI RED), the first machine to exceed a teraflops on Linpack and later on real applications.
In 1996, Intel and Sandia fielded a 9300+ processor MPP at Sandia that had a peak floating point rating of over 1.8 teraflops (TF). This machine achieved over 1 TF on Linpack as part of its acceptance process. It was later upgraded to a 3.1 TF peak rating, and its memory was doubled to 1.2 terabytes (TB). It was the fastest machine in the world from early 1997 into 2002. It was also one of the most reliable machines ever built, owing in part to reliability, availability, and serviceability (RAS) being built into every feature of the design, and in part to its Sandia-provided partitioned operating environment, with most of the nodes running Sandia's third-generation, minimalist LWK operating system, Cougar. As of 2003, RED is still in full production at Sandia.
In 2002, Sandia and Cray, Inc. entered into a contract to develop and field RED STORM, a Sandia-architected, Cray-engineered MPP with over 10,000 processors. RED STORM will provide a highly balanced, cost-effective, and reliable MPP by drawing on the heritage of Sandia's nCUBEs, Paragon, and RED, and of Cray's T3D and T3E systems. The RED STORM architecture is designed to scale to 30,000+ processors and up to a petaflops in later versions. The initial system will have a rated peak in the 50-TF range. It will be ready for use in late 2004.