Annual Report
2001
YEAR IN REVIEW

NERSC Systems and Services  

Molecular dynamics simulation of DNA with a sodium counter-ion.

The year 2001 was a landmark year for NERSC. It was our first year in Berkeley Lab's new Oakland Scientific Facility, and it was the year we advanced into the terascale computing era by installing what was at that time the most powerful computer in the world for unclassified research. During the year we also made significant upgrades in our mass storage and networking capabilities, and we continued making our services more responsive to client needs.


Oakland Facility Dedicated

On May 24, Oakland Mayor Jerry Brown and Dr. James Decker, Acting Director of the DOE Office of Science, joined Berkeley Lab Director Charles Shank for the formal dedication of the Oakland Scientific Facility (OSF). (NERSC actually began using the building in November 2000.) Decker noted that the computational research done at NERSC will pay substantial dividends to the city of Oakland, industry partners, the DOE, the Laboratory, and the nation. Mayor Brown praised the "boldness of creativity" evident in the new facility and pledged the city's support.

With the symbolic connection of cables linking NERSC's supercomputers with its international user community on a high-speed network, the Oakland Scientific Facility was formally dedicated by Oakland Mayor Jerry Brown, Berkeley Lab Director Charles Shank, and Dr. James Decker, Acting Director of the DOE Office of Science.

About 75 guests also heard from Peter Ungaro of IBM; James Payne of Qwest Communications; Peter Wang, Chairman and President of Encinal Broadway, the building's owner; and Jud King, provost and senior vice president for the University of California. Bill McCurdy, associate laboratory director for Computing Sciences, led the machine room tour that followed, pointing to the new IBM SP as "a tool that is our future, the way we will understand our world."

The new 16,000-square-foot OSF computer room was designed for flexibility and expandability. The computer room can be expanded to 20,000 square feet by removing a non-load-bearing wall, resulting in a facility able to accommodate a new generation of computing systems. The facility has a flexible power distribution system with a power supply of 5 MW, and a high-capacity chilled-water air conditioning system to maintain the ideal operating environment.

5 Teraflop/s IBM SP Goes Online

In August, after months of rigorous acceptance testing, Phase 2 of the NERSC-3 system was made available to the research community. The 3,328-processor IBM RS/6000 SP system, named "Seaborg" in honor of Berkeley Lab Nobel Laureate Glenn Seaborg, is capable of performing 5 trillion calculations per second (5 teraflop/s). Installed in January, the SP underwent extensive configuration and customization for the NERSC production environment and users. NERSC's Sustained System Performance (SSP) benchmark suite was used to test the system with actual scientific applications in a realistic production mode. By the time the system was opened up for general production use, it was running so well that significant scientific results in several fields were produced in just the first few weeks, and users expressed a high level of satisfaction with the system.
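
By way of illustration, a sustained-performance figure of this kind can be derived by averaging measured application performance and scaling to the full machine. The sketch below is only a schematic of that idea; the application names, per-processor rates, and simple averaging rule are assumptions for the example, not the actual SSP benchmark definition.

    # Schematic of a sustained-system-performance style figure.
    # The applications, rates, and averaging rule are illustrative
    # assumptions, not the actual NERSC SSP benchmark definition.
    sustained_gflops_per_processor = {
        "materials_code": 0.21,   # measured sustained rate, Gflop/s per processor
        "climate_code": 0.15,
        "fusion_code": 0.18,
    }
    compute_processors = 3008     # processors available for computation

    average_rate = (sum(sustained_gflops_per_processor.values())
                    / len(sustained_gflops_per_processor))
    ssp_gflops = average_rate * compute_processors

    print(f"Average sustained rate: {average_rate:.2f} Gflop/s per processor")
    print(f"Sustained system performance estimate: {ssp_gflops / 1000:.2f} Tflop/s")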

IBM SP Acceptance Team

The IBM SP acceptance team spent seven months testing, evaluating, and fine-tuning the system's performance. Led by NERSC Deputy Director Bill Kramer, key team members were Lynne Rippe, Gary Mack, Jim Craw, Nick Cardo, Tammy Welcome, Adrian Wong, David Bailey, Jonathon Carter, David Skinner, Dinesh Vakharia, Scott Burrow, and Ronald Mertes. Other contributors included Harsh Anand, Majdi Baddourah, Richard Beard, Del Black, Julian Borrill, Greg Butler, Andrew Canning, Eli Dart, Tom DeBoni, Brent Draney, Aaron Garrett, Richard Gerber, Frank Hale, William Harris, Russell Huie, Wayne Hurlbert, YuLok Lam, Steven Lowe, Nancy Meyer, Robert Neylan, Esmond Ng, David Paul, Jay Srinivasan, David Turner, and Alex Ubungen (not all present for photo).

 

Because the NERSC-3 contract was based on specific, firm performance metrics and functional requirements rather than on peak performance or hardware specifications, the Phase 2 IBM SP system has 22% more processing power and 50% more memory than originally planned, and thus will provide 10 million additional processor hours per year to users. Peak performance has grown from the planned 3.8 to 5 teraflop/s, with 3,008 processors available for computation. A distinctive feature of NERSC-3 is its 4.5 terabytes of main memory.

The NERSC-3 contract, defined in 1998-99 and based on IBM's proposal, specified more than 40 performance requirements and 150 functional requirements derived from NERSC's application workload. Almost every area of function and performance is covered by overlapping but independently relevant measures. For example, there are two disk I/O measures, one for large parallel files and another for small parallel and serial files. Both metrics are needed for an accurate picture of the performance scientists will actually see.
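
The need for both measures can be seen with a toy timing sketch (the file sizes, counts, and use of a local temporary directory below are arbitrary choices for illustration, not the contract's test parameters): one large streaming write exercises raw bandwidth, while many small writes are dominated by per-file overhead.

    # Toy comparison of large-file versus small-file write rates.
    # Sizes and counts are arbitrary; this is not the NERSC-3 I/O test.
    import os
    import tempfile
    import time

    def write_rate_mb_s(directory, count, size_bytes):
        """Write `count` files of `size_bytes` each; return the rate in MB/s."""
        chunk = b"x" * size_bytes
        start = time.time()
        for i in range(count):
            path = os.path.join(directory, f"file_{i}.dat")
            with open(path, "wb") as f:
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())
        return (count * size_bytes) / (time.time() - start) / 1e6

    with tempfile.TemporaryDirectory() as d:
        large = write_rate_mb_s(d, count=1, size_bytes=256 * 1024 * 1024)  # one 256 MB file
    with tempfile.TemporaryDirectory() as d:
        small = write_rate_mb_s(d, count=4096, size_bytes=64 * 1024)       # 4,096 files of 64 KB

    print(f"Large-file write rate: {large:.1f} MB/s")
    print(f"Small-file write rate: {small:.1f} MB/s")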

The fixed-price contract guaranteed a minimum hardware configuration (the configuration projected to meet all the requirements), and IBM also guaranteed to meet the performance requirements themselves. At the end of acceptance testing on the expanded system, IBM had exceeded both the performance and the functional requirements of the specification. NERSC also purchased some additional hardware to make the system even more suitable for certain applications. As a result, the NERSC-3 system is capable of meeting all of the maximum performance requirements and provides significantly enhanced capability and performance for typical application requirements.


Bigger, Faster, Easier Mass Storage

To stay ahead of the growing needs of our clients and the growing capacity of our computational systems, NERSC works constantly to increase the capacity, bandwidth, and functionality of our HPSS mass storage system. This year NERSC began expanding our disk cache to 20 terabytes by adding more Fibre Channel disks. Internal data transfer rates were improved by replacing our HIPPI internal storage network with Gigabit Ethernet using jumbo frames. Additional high-capacity Fibre Channel tape drives are expanding our archive capacity from 1.3 to 2 petabytes. And AFS was upgraded with a larger disk cache and faster processors.

Two years ago, NERSC and Oak Ridge National Laboratory established the Probe wide-area distributed-storage testbed to research storage technologies. During the past year NERSC used Probe to test a new, faster version of HSI (the HPSS interface utility) that enables access to multiple HPSS systems, making distributed mass storage simpler and more convenient. A Grid-capable FTP daemon was also tested as another way to bring Grid connectivity to HPSS. NERSC experimented with remote WAN network movers as a way to bypass conventional file transfer methods and stream data quickly between sites. And we conducted a beta test of the new Linear Tape-Open (LTO) Ultrium ultra-high-density tape drives. All of these efforts, together with the GUPFS project described below, will help bring mass storage and Grid technologies together.


Developing a Global Unified Parallel File System

In a typical high-performance computing (HPC) environment, each large computational system has its own local disk as well as access to additional network-attached storage and archival storage servers. Such an environment prevents the consolidation of storage between systems, thus limiting the amount of working storage available on each system to its local disk capacity. The result is an unnecessary replication of files on multiple systems, an increased workload on users to manage their files, and a burden on the infrastructure to support file transfers between the various systems.

NERSC is using existing and emerging technologies to overcome these inefficiencies. The Global Unified Parallel File System (GUPFS) project aims to provide a scalable, high-performance, high-bandwidth, shared-disk file system for use by all of NERSC's high-performance production computational systems. GUPFS will provide a unified file namespace for these systems and will be integrated with the High Performance Storage System (HPSS). Storage servers, accessing the consolidated storage through the GUPFS shared-disk file systems, will provide hierarchical storage management, backup, and archival services. An additional goal is to distribute GUPFS-based file systems to geographically remote facilities as native file systems over the DOE Science Grid.

This environment will eliminate unnecessary data replication, simplify the user environment, provide better distribution of storage resources, and permit the management of storage as a separate entity while minimizing impacts on the computational systems. The major enabling components of this envisioned environment are a high-performance shared-disk file system and a cost-effective, high-performance storage area network (SAN). These emerging technologies, while evolving rapidly, are not targeted towards the needs of high-performance scientific computing. The GUPFS project intends to encourage the development of these technologies to support HPC needs through both collaborations with other institutions and vendors, and active development.
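
The storage-consolidation argument can be made concrete with a toy accounting example (the dataset size and system names below are invented): with per-system local disks, each system holds its own replica of a working dataset, while a single shared file system holds one copy that serves them all.

    # Toy accounting of storage with and without a shared file system.
    # Dataset size and system names are invented for illustration.
    dataset_tb = 2.0
    systems = ["ibm_sp", "cray_sv1", "pdsf"]

    # Local-disk model: every system keeps its own replica of the dataset.
    replicated_tb = dataset_tb * len(systems)

    # Shared-disk (GUPFS-style) model: one copy in a unified namespace.
    shared_tb = dataset_tb

    print(f"Replicated on local disks: {replicated_tb:.1f} TB, "
          f"plus {len(systems) - 1} transfers to keep the copies in sync")
    print(f"Single shared copy:        {shared_tb:.1f} TB")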


Providing Fast, Safe Network Connections

To connect our new facilities and systems to the Grid, NERSC has implemented major upgrades to our networking infrastructure. At the end of May, the Oakland Scientific Facility upgraded its ESnet connection to OC-12 (622 Mb/s) to accommodate the growing demands of data-intensive computing (Figure 6). Gigabit Ethernet utilizing Jumbo Frames (9 KB packets) is the new standard for our internal network connecting the IBM SP and HPSS systems; when our current Cray systems are replaced, older HIPPI and FDDI connections will also be phased out. Ongoing network upgrades are being planned for the future to anticipate the ever-increasing bandwidth needs of our users.
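
The significance of the bandwidth figures can be seen with a back-of-the-envelope calculation (the 1-terabyte dataset is an arbitrary example, and OC-3 is shown only for comparison, not as the previous link speed):

    # Back-of-the-envelope transfer times at line rate for a 1 TB dataset.
    # Real throughput is lower because of protocol and end-host overhead.
    dataset_bits = 1e12 * 8     # 1 terabyte, expressed in bits

    links_bits_per_s = {
        "OC-3  (155 Mb/s)": 155e6,
        "OC-12 (622 Mb/s)": 622e6,
    }

    for name, rate in links_bits_per_s.items():
        hours = dataset_bits / rate / 3600
        print(f"{name}: roughly {hours:.1f} hours to move 1 TB at line rate")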

As important as the latest networking hardware may be, NERSC's Networking and Security Group is also very concerned with day-to-day performance. They have created a new network statistics Web page (http://www.nersc.gov/nusers/resources/network/) which allows NERSC users to monitor network activity in real time. The networking staff themselves monitor traffic for signs of poor performance, and they proactively work with remote users to improve end-to-end rates. For example, they traced slow data transfers from Brookhaven National Laboratory and the Jet Propulsion Laboratory to a bug in the respective labs' firewalls and then worked with Cisco Systems to fix the bug. This kind of creative problem solving has resulted in 4 to 30 times faster data transfer rates from some user sites. (Users desiring to improve their transfer rates should contact NERSC User Services.)

The Networking and Security Group also monitors the network for intrusions and unauthorized use, and responds to security incidents in cooperation with Berkeley Lab's cybersecurity staff. Attempted intrusions have been rising steadily in the last few years (Figure 7), but security research is making progress as well. The NERSC BRO border intrusion detection system can take action to block certain attacks without human intervention. In addition, we have begun working with Juniper Networks, Inc. to test the BRO intrusion detection software at speeds of 2.4 Gb/s (OC-48).


 
Figure 6. Network traffic for June through October 2001 shows the impact of the OC-12 ESnet link upgrade and the Phase 2 IBM SP being made available for general production use in August.

Figure 7. External daily scans of "lbl.gov" servers from January 1999 to August 2001 show that attempted intrusions are increasing steadily.

NERSC's Networking and Security Group—Eli Dart, Brent Draney, group leader Howard Walter, and Steve Lau—are the network detectives, tracking down problems and blocking hackers to ensure high performance and secure connections for NERSC users.
 


Balancing System Utilization and Response Time

Over the years, NERSC has been a pioneer in achieving high utilization on massively parallel systems, which in simple terms means keeping as many processors as possible working for as long as possible. High utilization, made possible by customized scheduling and load balancing software, gives researchers more available processor hours and maximizes the value of the computing resource. NERSC's Cray T3E reached 95% utilization (averaged over 30 days) in February 2001, and by applying our experience with the T3E, we achieved 85% utilization on the IBM SP in its first year, better than the T3E had managed in its own first year.
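
Utilization over a reporting window is simply delivered processor-hours divided by available processor-hours. Below is a minimal sketch of that calculation, with an assumed processor count and invented job records.

    # Minimal sketch of the utilization calculation over a 30-day window.
    # The processor count and job records are invented for illustration.
    processors = 644                # assumed number of application processors
    window_hours = 30 * 24          # 30-day averaging window

    # (processors used, wall-clock hours) for jobs run during the window
    jobs = [(256, 120.0), (512, 300.0), (64, 2000.0), (128, 900.0)]

    delivered = sum(p * h for p, h in jobs)
    available = processors * window_hours
    print(f"Utilization over the window: {delivered / available:.1%}")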

An occasional side effect of high utilization—and an undesirable one from the user's viewpoint—is a long wait for a job to run, if that particular job is not the right size for the system's current load balance. Because response time is an important factor in researchers' productivity—and scientific productivity is the ultimate measure of NERSC's success—we are now working with a user committee to find the optimum balance of utilization and response time. The committee will research solutions and develop guidelines to be implemented on NERSC systems.
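
The tension between the two goals is a familiar queueing effect: as utilization approaches 100%, expected waiting time grows rapidly. The sketch below uses a single-server M/M/1 queue, a drastic simplification of real batch scheduling, purely to show the shape of the trade-off; the two-hour average job time is an assumption.

    # Toy M/M/1 queueing model showing how expected queue wait grows as
    # utilization approaches 1. Real batch scheduling is far more complex;
    # the average job time is an assumed value.
    mean_job_hours = 2.0

    for rho in (0.70, 0.85, 0.95, 0.99):
        # M/M/1 mean time in queue: W_q = rho / (1 - rho) * mean service time
        wait = rho / (1.0 - rho) * mean_job_hours
        print(f"utilization {rho:.0%}: average queue wait of about {wait:.1f} hours")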


Expanded PDSF Brings Cosmic Mysteries to Light

Eighty-two new servers and significant improvements in the overall computing and networking infrastructure were added this year to the PDSF (Parallel Distributed Systems Facility), a large Linux-based computer cluster that is currently operated as a partnership between NERSC and the Berkeley Lab Nuclear Science and Physics divisions.

Computing power was expanded to 395 processors, and the number of disk vaults grew from 15 to 40, with a total of 24 terabytes of shared storage. Gigabit Ethernet networking for the high-bandwidth compute nodes enables the PDSF to run MPI jobs that require up to 50 nodes (100 processors). In addition, Gigabit Ethernet for the disk vaults makes it possible to take advantage of the server-side performance improvements in the 2.4 Linux kernel.

The PDSF serves the data-intensive computing needs of international high energy and nuclear physics experiments at the world's most powerful accelerator centers—including the AGS/RHIC complex at Brookhaven National Laboratory (STAR, E895), CERN (NA49, ATLAS, ALICE), and Fermilab (CDF, E871)—as well as neutrino detectors in Canada (SNO) and Antarctica (AMANDA), and other experiments that are expanding our knowledge of the universe.

The PDSF is the primary computing facility for analysis of data from the STAR experiment, which is generating 300 terabytes of data each year; the PDSF was also used to verify several major discoveries made by CERN's NA49 experiment. Access to the PDSF greatly sped up publication of initial results from STAR, E895, AMANDA, and SNO. Scientific questions the PDSF is helping to answer range from the mass of solar neutrinos to the characteristics of the quark-gluon plasma that existed briefly after the Big Bang.


Proposals Sought for NERSC-4 System

NERSC acquires a new capability-focused computational system every three years. The three-year interval is based on the length of time it takes to introduce large systems, the length of time it takes for NERSC clients to become productive on new systems, and the types of funding and financial arrangements NERSC uses. At any given time, NERSC has two generations of computational systems in service, so that each system will have a lifetime of five to six years. This overlap provides time for NERSC clients to move from one generation to the next, and provides NERSC with the ability to fully test, integrate, and evolve the latest generation while maintaining service on the earlier generation.

NERSC uses the "Best Value" process for procuring its major systems. Rather than setting mandatory requirements and using a quantitative rating scheme, the Best Value method requests baseline and value-added characteristics. These characteristics are not meant to design a specific solution but rather to signify a range of parameters that will produce an excellent and cost-effective solution. Thus, Best Value does not limit a site to the lowest common denominator requirements, but rather allows NERSC to push the limits of what is possible in order to get the best solution. Vendors indicate they prefer this method as well, because it provides them more flexibility in crafting their best solution.

A request for proposals for the NERSC-4 system was issued in November 2001, with proposals due in February 2002. Like the NERSC-3 contract described above, the NERSC-4 contract will be based on performance metrics and functional requirements, especially NERSC's Sustained System Performance (SSP) benchmark suite and Effective System Performance (ESP) test. One new feature of this contract is that the SSP value will no longer be based on the NAS Parallel Benchmarks, but will be based on the NERSC Application Performance Suite. Delivery of the initial NERSC-4 system is expected in early 2003.

User Survey Provides Valuable Feedback

NERSC's annual user survey provides feedback about every aspect of NERSC's operation, and every year we institute changes based on the survey results. Here are some of the changes resulting from the FY 2000 survey:

  • In last year's survey, one of the two top IBM SP issues was that the "SP is hard to use." Based on comments we received, we wrote more SP documentation for the Web and made changes to the user environment. This year only 12% of the comments (compared with 25% last year) indicated that the SP is hard to use.
  • We made changes to the IBM SP disk configuration that users had suggested.
  • We added queue resources and created a new queue for the Cray T3E, resulting in improved satisfaction with T3E turnaround time.
  • Last year we moved PVP interactive services from the J90 to the SV1 architecture and provided more disk resources. Overall PVP satisfaction was rated higher in this year's survey.

In the FY 2001 survey, there were significant increases in user satisfaction with the available computing hardware, the allocations process, and the PVP cluster. Other areas showing increased satisfaction were T3E and SP batch wait times, the SP disk configuration, SP Fortran compilers, and HPSS. Areas with continuing high user satisfaction included HPSS reliability, performance, and uptime; consulting responsiveness, quality of technical advice, and follow-up; the Cray programming environment; PVP uptime; and account support.

When asked what NERSC does well, some respondents pointed to our stable and well-managed production environment, while others focused on NERSC's excellent support services. Other areas singled out included well-written documentation, good software and tools, and the mass storage environment. When asked what NERSC should do differently, the most common responses were to provide more hardware resources and to enhance our software offerings. Other areas of concern were visualization services, batch wait times on all platforms, SP interactive services, training services, and SP performance and debugging tools.

Several sample responses give the flavor of the users' comments:

"NERSC makes it possible for our group to do simulations on a scale that would otherwise be unaffordable."

"The availability of the hardware is highly predictable and appears to be managed in an outstanding way."

"Provides computing resources in a manner that makes it easy for the user. NERSC is well run and makes the effort of putting the users first, in stark contrast to many other computer centers."

"Consulting by telephone and e-mail. Listens to users, and tries to set up systems to satisfy users and not some managerial idea of how we should compute."

"The web page, hpcf.nersc.gov, [Now www.nersc.gov] is well structured and complete. Also, information about scheduled down times is reliable and useful."

Complete survey results can be found at http://www.nersc.gov/news/survey/2001/.
