
High-End Systems

If one picture could tell the story of NERSC’s high-end systems for FY 2002, perhaps it would be Figure 1, which shows the dramatic increase in usage of NERSC’s massively parallel (MPP) systems over the past few years. The demand for time on high performance computers always exceeds the supply, and users immediately take advantage of any increase in capability. The NERSC Users Group documented their future needs in The DOE Greenbook—Needs and Directions in High-Performance Computing for the Office of Science, published in April 2002 (see sidebar).

Because of this high demand, NERSC Center staff are always looking to the future. Even as we upgraded and enhanced existing systems during the past year, we took the next step toward doubling NERSC’s capability, and we collaborated on a new strategy for developing future computer architectures that are more useful for computational science.


System Upgrades and Expansion

NERSC's IBM SP system, Seaborg, originally had 12 GB of memory in every node. Additional memory was installed so that, after the upgrade, 116 nodes had 16 GB of memory, 64 nodes had 32 GB, and 4 nodes had 64 GB. Fifty nodes were also added to the compute pool, and Seaborg's connection to the High Performance Storage System (HPSS) was upgraded to Jumbo Gigabit Ethernet. The hardware upgrades were matched by improvements in performance and utilization: in March–April 2002, Seaborg's overall utilization (30-day moving average) exceeded 96% for the first time, and CPU utilization exceeded 92%. Given Seaborg's improved stability, NERSC was able to augment MPP allocations by about 6 million processor hours. All of the NERSC Center's systems achieved more than 98% availability in 2002.
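As a purely illustrative aside, the short Python sketch below shows one way a 30-day moving average of daily utilization can be computed. The daily figures in it are invented; only the 30-day window and the 96% threshold come from the measurements described above.

# Hypothetical sketch of the 30-day moving-average utilization metric cited
# above. The daily_utilization series is invented for illustration; only the
# 30-day window and the 96% threshold come from the report.

def moving_average(samples, window=30):
    """Trailing moving average over `window` samples."""
    return [
        sum(samples[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(samples))
    ]

# Invented daily utilization (fraction of available node hours actually used),
# ramping from 93% toward 99% over 60 days.
daily_utilization = [0.93 + 0.001 * day for day in range(60)]

for day, avg in enumerate(moving_average(daily_utilization), start=30):
    if avg > 0.96:
        print(f"Day {day}: 30-day average utilization {avg:.1%} exceeds 96%")
        break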

Figure 1   Increasing NERSC MPP usage in processor hours.

NERSC issued a request for proposals for a system to replace the Cray T3E and SV1 systems, which were decommissioned in October 2002. The procurement team, led by Deputy Director Bill Kramer, reviewed responses from a number of leading supercomputer vendors and, after a careful deliberation and negotiation process, decided to increase the capability of Seaborg. In November, NERSC announced an agreement with IBM to double the size of Seaborg, creating a machine with 6,656 processors and a peak speed of 10 teraflop/s.

The expanded system will comprise 416 16-way Power 3+ nodes, with each CPU delivering a peak of 1.5 Gflop/s; 380 of those nodes (6,080 processors) will be dedicated to computation. The system will include 7.8 terabytes (TB) of aggregate memory (the second-largest memory on any open production system), a Global Parallel File System with 44 TB of storage, and an additional 15 TB of local system disk. Installation of the new equipment began in November, with the full system expected to become available to NERSC users by April 2003.
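The headline figures follow from simple arithmetic. As a back-of-the-envelope check (not part of the procurement documents), the short Python sketch below reproduces the processor counts and the roughly 10 teraflop/s peak from the per-node numbers quoted above.

# Back-of-the-envelope check of the expanded Seaborg figures quoted above.
nodes_total = 416          # 16-way Power 3+ nodes in the full system
nodes_compute = 380        # nodes dedicated to computation
cpus_per_node = 16
gflops_per_cpu = 1.5       # peak per CPU, in Gflop/s

total_cpus = nodes_total * cpus_per_node          # 6,656 processors
compute_cpus = nodes_compute * cpus_per_node      # 6,080 processors
peak_tflops = total_cpus * gflops_per_cpu / 1000  # ~10 teraflop/s

print(f"Total processors:   {total_cpus}")
print(f"Compute processors: {compute_cpus}")
print(f"Peak speed:         {peak_tflops:.2f} Tflop/s")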

The decision to double the size of our existing IBM SP, a system with proven performance, was made to have the most immediate and cost-effective impact on the DOE research community, whose computing needs are rising rapidly. The expanded computational capability will be available in only a few months, offering users not only a huge increase in processor hours but also an unprecedented opportunity to explore the scalability of their applications. While some codes are ready to scale immediately beyond 2,048 processors, NERSC will start a scalability program to help other users take advantage of the large number of processors, and will also change the queue structure to provide additional encouragement for large jobs. The size of the expanded system will enable innovative new science-of-scale applications as well as help other projects meet their computational objectives more quickly.

On the storage front, we upgraded our HPSS system to all Fibre Channel disk with 15 TB of disk cache, increasing the bandwidth by 80 to 100%, and we boosted the capacity of the archive to 3.6 petabytes (PB). We enhanced security by turning off all clear-text password access to storage systems. Data transfer rates have accelerated from 40 MB/s to 80 MB/s, thanks to the replacement of HIPPI with Jumbo Gigabit Ethernet and the upgrade of HPSS and HSI software. NERSC currently transfers 2–3 TB of data to and from storage every day, and we expect to have 1 PB in storage within the next year. Near-term plans for HPSS include increasing archive capacity to 7 PB, speeding up transfers to 2 GB/s, adding access to storage using Grid credentials, and providing a Web interface to storage. NERSC continues to contribute to HPSS development, and helped the ASCI (Advanced Simulation and Computing Initiative) program at Los Alamos National Laboratory to convert from CFS to HPSS storage.
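To put these transfer rates in perspective, the hypothetical Python sketch below estimates how long one day's traffic would take to move at the old, current, and planned rates. The 2.5 TB volume is simply the midpoint of the 2–3 TB/day quoted above, decimal units (1 TB = 10^12 bytes) are assumed, and sustained aggregate throughput at the quoted rates is assumed.

# Rough estimate of time to move one day's HPSS traffic at different rates.
# Assumes decimal units (1 MB = 1e6 bytes, 1 GB = 1e9 bytes, 1 TB = 1e12 bytes)
# and that the quoted rates are sustained for the whole transfer.
daily_volume_tb = 2.5  # midpoint of the 2-3 TB/day quoted above

rates_bytes_per_s = {
    "old HIPPI path (40 MB/s)": 40e6,
    "Jumbo Gig-E path (80 MB/s)": 80e6,
    "planned upgrade (2 GB/s)": 2e9,
}

for label, rate in rates_bytes_per_s.items():
    hours = daily_volume_tb * 1e12 / rate / 3600
    print(f"{label}: {hours:.1f} hours for {daily_volume_tb} TB")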

The Escher visualization server, which enables NERSC users to perform visualizations from remote locations, has undergone major improvements to speed up visualization of large-scale datasets. The server itself was upgraded to a Silicon Graphics Onyx 3400 with twelve 600 MHz processors, 24 GB of memory, two InfiniteReality4 graphics pipes with 1 GB of texture memory each, and a 4 TB RAID-5 disk array with ten 2 GB controllers. Escher now has three Gigabit Ethernet interfaces: one connected to the production network, and two working in tandem on the Jumbo Gig-E network to share data with other systems, including Seaborg and HPSS. New software on Escher includes Globus, which enables Grid-based remote distributed visualization; VMD for molecular dynamics visualizations; and ParaView, a parallel visualization application for large datasets.

 