Capability and Scalability

The past year was marked by a dramatic increase in NERSC’s computational capability, resulting in an impressive upsurge in the productivity of many research projects. But in traditional NERSC fashion, these changes took place almost seamlessly, with little disruption to users’ daily work.

In 2002, after requesting proposals to replace NERSC’s soon-to-be decommissioned Cray systems, the procurement team, led by Deputy Director Bill Kramer, decided to expand the existing IBM SP Power 3 system rather than purchase an entirely new one. This expansion gave the user community an unexpected increase of 30 million MPP hours in FY 2003 and doubled the hours allocated for FY 2004. “The bottom line was choosing a system that would have the most cost-effective impact on DOE science,” Kramer said when the decision was announced. The new contract includes five years of support for the combined system, named “Seaborg.”

The new Seaborg has 416 16-CPU Power 3+ SMP nodes (6,656 processors) with a peak performance of 10 teraflop/s, which at the time of its installation made it the most powerful computer for unclassified research in the United States, and the one with the largest memory capacity. Seaborg has 7.8 terabytes of aggregate memory and a General Parallel File System (GPFS) with 44 terabytes of storage.

A team of NERSC staff (Figure 1) worked for six months to configure, install, and test the system. The system was available more than 98 percent of the time during testing, and benchmarks ran at 72 percent of peak speed, much higher than that achieved on similar parallel systems. The team resolved major configuration issues, such as whether to operate Seaborg as one or two systems (they decided on one), and brought it into production status on March 4, 2003, one month earlier than planned. The expanded system quickly achieved the high availability and utilization rates that NERSC demands of itself (Figure 2).

Figure 1
The Seaborg expansion procurement and installation teams doubled NERSC’s computing resources 20 months earlier than expected and at 75% of the expected cost, without interrupting service to the research community.

Figure 2
Processor utilization on Seaborg over the course of FY 2003. NERSC strives to maintain a high rate of processor utilization, balancing this goal with good job turnaround times.

Initial results from the expanded Seaborg showed some scientific applications running at up to 68% of the system’s theoretical peak speed, compared with the 5–10% of peak performance typical for scientific applications on massively parallel or cluster architectures. Performance results for four science-of-scale applications are summarized in Table 1.

Project                                        Number of processors   Performance (% of peak)
Electromagnetic Wave-Plasma Interactions       1,936                  68%
Cosmic Microwave Background Data Analysis      4,096                  50%
Terascale Simulations of Supernovae            2,048                  43%
Quantum Chromodynamics at High Temperatures    1,024                  13%

Table 1
Performance results for four science-of-scale applications.
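
To put these percent-of-peak figures in context, the short Python sketch below converts them into approximate sustained rates, assuming a per-processor peak of roughly 1.5 Gflop/s (10 teraflop/s spread across 6,656 processors); the resulting numbers are back-of-the-envelope estimates derived from Table 1, not separately reported measurements.

```python
# Back-of-the-envelope conversion of Table 1's percent-of-peak figures into
# sustained rates. Assumes ~1.5 Gflop/s peak per Power 3+ processor
# (10 Tflop/s / 6,656 processors); results are estimates, not reported values.

PEAK_PER_PROC = 10e12 / 6656  # ~1.5e9 flop/s per processor

apps = {
    "Electromagnetic Wave-Plasma Interactions":    (1936, 0.68),
    "Cosmic Microwave Background Data Analysis":   (4096, 0.50),
    "Terascale Simulations of Supernovae":         (2048, 0.43),
    "Quantum Chromodynamics at High Temperatures": (1024, 0.13),
}

for name, (procs, frac) in apps.items():
    sustained = procs * PEAK_PER_PROC * frac  # flop/s
    print(f"{name}: ~{sustained / 1e12:.2f} Tflop/s sustained")
```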

 

To learn how Seaborg performs as a production environment for a larger sample of scientific applications, NERSC conducted a survey of user programs that run on a large number of nodes, as well as programs that run on fewer nodes but account for a significant fraction of the time used on the system. The report “IBM SP Parallel Scaling Overview” discusses issues such as constraints to scaling, SMP scaling, MPI scaling, and parallel I/O scaling, and offers recommendations for choosing the level of parallelism best suited to the characteristics of a code.
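
As a simple illustration of the kind of reasoning involved (not necessarily the method used in the report), parallel efficiency computed from timing runs at several concurrencies can guide this choice; the timings and threshold in the sketch below are hypothetical.

```python
# Illustrative sketch only: estimate strong-scaling efficiency from timing runs
# and pick the largest concurrency that stays above an efficiency threshold.
# The timings and threshold are hypothetical, not taken from the NERSC report.

timings = {  # processors -> wall-clock seconds for a fixed-size problem
    64: 1000.0,
    128: 520.0,
    256: 275.0,
    512: 150.0,
    1024: 95.0,
}

base_procs = min(timings)
base_time = timings[base_procs]

def efficiency(procs: int, seconds: float) -> float:
    """Strong-scaling efficiency relative to the smallest run."""
    return (base_procs * base_time) / (procs * seconds)

threshold = 0.7  # arbitrary cutoff for acceptable efficiency
suitable = [p for p, t in sorted(timings.items()) if efficiency(p, t) >= threshold]
print("Largest concurrency above the efficiency threshold:", max(suitable))
```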

Improving the scalability of users’ codes helps NERSC achieve goals set by the DOE Office of Science: that 25% of computing time in FY 2003 and 50% in FY 2004 be used by jobs that require 512 or more processors. Job sizes for FY 2003 are shown in Figure 3.

Figure 3
Job sizes on Seaborg over the course of FY 2003. DOE’s goal for NERSC is to increase the percentage of jobs that use 512 or more processors.

The scaling report notes that as Seaborg doubled in size, the number of jobs at first increased correspondingly, then decreased over time as the parallelism of individual jobs increased (Figure 4). The waiting time for large-concurrency jobs also changed in accordance with changes in queueing policies (Figure 5).

Figure 4
Number and size of jobs run on Seaborg in May 2002 (top) and May 2003 (bottom). The horizontal axis is time. The colored rectangles are jobs, with start and stop time depicted by width, and level of parallelism (number of nodes) depicted by height. Colors signify different users. The rectangles are arranged vertically to avoid overlapping.

Figure 5
Queue waiting time for jobs run on Seaborg in May 2002 (top) and May 2003 (bottom). Jobs are depicted as in Figure 4 except for colors. Here the color range signifies waiting time, with blue depicting a shorter wait and red a longer wait.

Another measure of Seaborg’s performance is Sustained System Performance (SSP). The SSP metric was developed at NERSC as a composite performance measurement of codes from five scientific applications in the NERSC workload: fusion energy, materials science, cosmology, climate modeling, and quantum chromodynamics. Thus, SSP encompasses a range of algorithms and computational techniques and quantifies system performance in a way that is relevant to scientific computing.

To obtain the SSP metric, a floating-point operation count is obtained from microprocessor hardware counters for the five applications. A computational rate per processor is calculated by dividing the floating-point operation count by the sum of observed elapsed times of the applications. Finally, the SSP is calculated by multiplying the computational rate by the number of processors in the system.
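
As a rough illustration of this arithmetic, the sketch below walks through the steps just described; the flop counts and elapsed times are placeholder values, not actual SSP benchmark measurements.

```python
# Hedged sketch of the SSP calculation described above. The flop counts and
# elapsed times are illustrative placeholders, not measured benchmark values.

# Per-processor floating-point operation counts (from hardware counters) and
# observed elapsed times (seconds) for the five benchmark applications.
flop_counts = [2.0e11, 1.5e11, 2.5e11, 1.8e11, 2.2e11]  # flops per processor
elapsed_times = [900.0, 750.0, 1100.0, 820.0, 950.0]    # seconds

n_processors = 6656  # processors in the expanded Seaborg

# Computational rate per processor: total flop count divided by total elapsed time.
rate_per_processor = sum(flop_counts) / sum(elapsed_times)  # flop/s per processor

# SSP: the per-processor rate scaled by the number of processors in the system.
ssp = rate_per_processor * n_processors
print(f"SSP ~ {ssp / 1e9:.0f} Gflop/s")
```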

The SSP is used as the primary performance figure for NERSC’s contract with IBM. It is periodically reevaluated to track performance and monitor changes in the system. For example, the performance impact of a compiler upgrade is readily detected as a change in SSP. The expansion of Seaborg more than doubled the SSP, from 657 to 1357.

During the acceptance period for the Seaborg expansion, there were reports of new nodes performing slightly better than old nodes, which was puzzling since the hardware is identical. All of these reports involved parallel MPI codes; no performance differences were detected on serial codes. The performance differences appeared to depend on the concurrency and the amount of synchronization in the MPI calls used in the code, but periodic observations and testing provided inconsistent results, and at times no difference could be measured. So in October 2003, groups of old and new nodes were set aside for in-depth testing.

A critical insight occurred when the control workstation happened to be inoperable during the node testing. During that time, the performance of the old nodes improved to match the new nodes. NERSC staff discovered that the control workstation ran certain commands from its problem management subsystem up to 27 times more often on old nodes than on new nodes. Once the problem was pinpointed, it was quickly resolved by deleting four deactivated problem definitions. Resolving this issue improved the synchronization of MPI codes at high concurrency, resulting in faster and more consistent run times for most NERSC users.

In addition to the Seaborg expansion, NERSC also upgraded its PDSF Linux cluster and High Performance Storage System (HPSS).

The PDSF, which supports serial applications with large data requirements for the high energy and nuclear physics communities, was expanded to 412 nodes with a disk capacity of 135 terabytes (TB). New technologies incorporated this year include Opteron chips and 10 TB of configurable network attached storage (NAS).

At NERSC, the data in storage doubles almost every year (Figure 6). As of 2003, NERSC stores approximately 850 TB of data (30 million files) and handles between 3 and 6 TB of I/O per day (Figure 7). To keep up with this constantly increasing workload, new technology is implemented as it becomes available. NERSC currently has two HPSS systems, one used primarily for user files and another used primarily for system backups. The maximum theoretical capacity of NERSC’s archive is 8.8 petabytes, the buffer (disk) cache is 35 TB, and the theoretical transfer rate is 2.8 gigabytes per second. Implementation of a four-way stripe this year increased client data transfers to 800 megabytes per second.

Figure 6
Cumulative storage by month, 1998–2003.

Figure 7
I/O by month, 1998–2003.

 
