High-End Systems
If one picture could tell the
story of NERSC’s high-end systems for FY 2002, perhaps
it would be Figure 1, which shows the dramatic increase in
usage of NERSC’s massively parallel processing (MPP) systems
over the past few years. The demand for time on high-performance
computers always exceeds the supply, and users immediately
take advantage of any increase in capability. The NERSC Users
Group documented their future needs in The
DOE Greenbook—Needs and Directions in High-Performance
Computing for the Office of Science,
published in April 2002 (see sidebar).
Because of this high demand,
NERSC Center staff are always looking to the future. Even
as we upgraded and enhanced existing systems during the past
year, we took the next step toward doubling NERSC’s
capability, and we collaborated on a new strategy for developing
future computer architectures that are more useful for computational
science.
System Upgrades and Expansion
NERSC added memory to its IBM SP system, Seaborg,
which originally had 12 GB of memory in every node.
After the upgrade, 116 nodes had 16 GB of memory,
64 nodes had 32 GB, and 4 nodes had 64 GB. Fifty nodes were
also added to the compute pool. Seaborg’s connection
to the High Performance Storage System (HPSS) was upgraded
to Jumbo Gigabit Ethernet. The hardware upgrades were matched
by improvements in performance and utilization. In March–April
2002, Seaborg’s utilization (30-day moving average)
exceeded 96% for the first time. Similarly, CPU utilization
exceeded 92%. Given Seaborg’s improved stability, NERSC
was able to augment MPP allocations by about 6 million hours.
All of the NERSC Center’s systems achieved more than
98% availability in 2002.
Figure 1. Increasing NERSC MPP usage in processor hours.
After requesting proposals for a system to replace the Cray
T3E and SV1 systems, which were decommissioned in October
2002, NERSC’s procurement team, led by Deputy Director
Bill Kramer, reviewed proposals from a number of leading supercomputer
vendors. At the end of a careful deliberation and negotiation
process, the team decided to increase the capability of Seaborg.
In November, NERSC announced an agreement with IBM to double
the size of Seaborg, creating a machine with 6,656 processors
and a peak speed of 10 teraflop/s.
The expanded system will have 6,080 processors (380 nodes)
for computation. The total system will have 416 sixteen-way
Power3+ nodes, each CPU with a peak speed of 1.5 Gflop/s. The system will include
7.8 terabytes (TB) of aggregate memory (the second-largest
memory on any open production system) and a General Parallel
File System (GPFS) with 44 TB of storage. There will be an additional
15 TB of local system disk. Installation of the new equipment
began in November, with the full system expected to become
available to NERSC users by April 2003.
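As a quick check of the arithmetic behind these figures:

416 nodes × 16 CPUs per node = 6,656 CPUs
6,656 CPUs × 1.5 Gflop/s per CPU ≈ 9.98 Tflop/s,

which matches the announced 10 teraflop/s peak speed.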
We decided to double the size
of our existing IBM SP, a system with proven performance,
because that option offered the most immediate and cost-effective
impact on the DOE research community, whose computing needs
are rising rapidly. The expanded computational capability
will be available in only a few months, offering users not
only a huge increase in processor hours but also an unprecedented
opportunity to explore the scalability of their applications.
While some codes are ready to scale immediately beyond 2,048
processors, NERSC will start a scalability program to help
other users take advantage of the large number of processors.
NERSC will also change the queue structure to provide additional
encouragement for large jobs. The size of the expanded system
will enable innovative new science-of-scale applications as
well as help other projects meet their computational objectives
more quickly.
On the storage front, we upgraded our HPSS system to all
Fibre Channel disk with 15 TB of disk cache, increasing the
bandwidth by 80 to 100%, and we boosted the capacity of the
archive to 3.6 petabytes (PB). We enhanced security by turning
off all clear-text password access to storage systems. Data
transfer rates have accelerated from 40 MB/s to 80 MB/s, thanks
to the replacement of HIPPI with Jumbo Gigabit Ethernet and
the upgrade of HPSS and HSI software. NERSC currently transfers
2–3 TB of data to and from storage every day, and we
expect to have 1 PB in storage within the next year. Near-term
plans for HPSS include increasing archive capacity to 7 PB,
speeding up transfers to 2 GB/s, adding access to storage
using Grid credentials, and providing a Web interface to storage.
NERSC continues to contribute to HPSS development and helped
the ASCI (Advanced Simulation and Computing Initiative) program
at Los Alamos National Laboratory to convert from CFS to HPSS
storage.
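For perspective, a back-of-the-envelope estimate derived from the figures above: moving 2.5 TB in a day corresponds to an average sustained rate of

2,500,000 MB ÷ 86,400 s ≈ 29 MB/s,

well within the 80 MB/s peak transfer rate, leaving substantial headroom for the planned growth toward 2 GB/s.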
The Escher visualization server, which enables NERSC users
to perform visualizations from remote locations, has undergone
major improvements to enable faster visualization of large-scale
datasets. The server itself was upgraded to a Silicon Graphics
Onyx 3400 with twelve 600 MHz processors, 24 GB of memory,
two InfiniteReality4 graphics pipes with 1 GB of texture
memory each, and a 4 TB RAID-5 disk array with ten 2 GB controllers.
Escher now has three Gigabit Ethernet interfaces: one connected
to the production network, and two working in tandem on
the Jumbo Gig-E network to share data between machines,
including Seaborg and HPSS. New software on Escher includes
Globus, which enables Grid-based remote distributed visualization;
VMD for molecular dynamics visualizations; and ParaView, a
parallel visualization application for large datasets.