Annual Report 2001

NERSC STRATEGIC PLAN

NERSC Components. The four principal components of the next-generation NERSC are designed to serve the DOE science community.

Over the five years that NERSC has been located at Ernest Orlando Lawrence Berkeley National Laboratory, it has built an outstanding reputation for providing both high-end computer systems and comprehensive scientific client services. At the same time, NERSC has successfully managed the transition for its users from a vector-parallel to a massively parallel computing environment. In January 2001, DOE's Mathematical, Information, and Computational Sciences (MICS) program asked Berkeley Lab to develop a strategic proposal which, building on a foundation of past successes, presents NERSC's vision for its activities and new directions over the next five years. The proposal was delivered in May 2001, and this section of the Annual Report summarizes its main themes.

NERSC proposed a strategy consisting of four components in order of priority. The two ongoing components, which will be enhanced over the next five years, are:

  • High-End Systems — NERSC will continue to focus on balanced introduction of the best new technologies for complete computational and storage systems, coupled with the advanced development activities necessary to wisely incorporate these new technologies.

  • Comprehensive Scientific Support — NERSC will continue to provide the entire range of support activities, from high-quality operations and client services to direct collaborative scientific support, to enable a broad range of scientists to effectively use the NERSC systems in their research.

The two new strategic components are:

  • Support for Scientific Challenge Teams — NERSC will focus on supporting these teams, with the goal of bridging the software gap between currently achievable and peak performance on the new terascale platforms.

  • Unified Science Environment (USE) — Over the next five years, NERSC will use Grid technology to deploy a capability designed to meet the needs of an integrated science environment, combining experiment, simulation, and theory by facilitating access to computing and data resources, as well as to large DOE experimental instruments.

HIGH-END SYSTEMS

Providing the most effective and most powerful High-End Systems possible is the foundation upon which NERSC builds all other services to enable computational science for the DOE/SC community. High-End Systems at NERSC mean more than highly parallel computing platforms: they also include a very large-scale archival storage system, auxiliary and developmental platforms, networking and infrastructure technology, system software, productivity tools for clients, and applications software. Our successful High-End Systems strategy includes advanced development work, the evaluation of new technologies, the development of methodologies for benchmarking and performance evaluation, and the acquisition of new systems.
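
As an informal illustration of the kind of measurement that benchmarking and performance evaluation rest on, the sketch below times a dense matrix multiply and compares the sustained floating-point rate with an assumed theoretical peak. The matrix size and peak figure are placeholders, not NERSC benchmark parameters.

    # Illustrative sketch (not a NERSC benchmark): time a dense matrix multiply
    # and compare the sustained floating-point rate with an assumed peak rate.
    import time
    import numpy as np

    N = 2000               # matrix dimension (placeholder value)
    PEAK_GFLOPS = 1.5      # hypothetical per-processor peak, in Gflop/s

    a = np.random.rand(N, N)
    b = np.random.rand(N, N)

    start = time.time()
    c = a @ b              # roughly 2*N**3 floating-point operations
    elapsed = time.time() - start

    sustained = 2.0 * N**3 / elapsed / 1.0e9
    print(f"Sustained: {sustained:.2f} Gflop/s "
          f"({100.0 * sustained / PEAK_GFLOPS:.0f}% of assumed peak)")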

There are three major areas of system design and implementation at NERSC: the computational systems, the storage system, and the network. The balance of the entire Center is determined by the requirements that evolve from the increased computational capability, plus independent requirements for other resources. Enhanced storage systems must be designed to support not just current work, but future workloads as well. Figure 10 shows the evolution of the NERSC system architecture between 2001 (left) and 2006 (right), with the introduction of the Global Unified Parallel File System and the Unified Science Environment integrating the discrete computational and storage systems.

 
Figure 10. Evolution of the NERSC system architecture between 2001 (left) and 2006 (right).
 

We expect that NERSC-4 and very likely NERSC-5 will be commercial integrated SMP cluster systems. Special architectures will be considered, but it is not likely that these will be ready for high-quality production usage in the next five years. Commodity cluster systems will also be considered, but based on our technology assessments, we do not believe it likely that these systems will be able to support the diverse and communication-intense applications at NERSC in this time frame. Equivalently balanced cluster hardware will at best have a modest performance-per-dollar advantage, but cluster software in particular is significantly less mature than vendor-supplied software. NERSC will use the "best value" process for procuring its major systems, as described above.

Between now and 2006, NERSC plans to augment both the aggregate capacity and the transfer rate to and from the mass storage system. NERSC will continue collaborating in High Performance Storage System (HPSS) development, in order to improve archive technology. In particular, NERSC will help develop schemes to replicate data over long distances and to import and export data efficiently.
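
As a concrete illustration of archive import and export from the user's side, the sketch below drives the standard HPSS client utilities, hsi and htar, from Python. It assumes those clients are installed and authenticated; the file and directory paths are hypothetical.

    # Illustrative sketch: moving data in and out of an HPSS archive by driving
    # the standard hsi and htar client utilities. All paths are hypothetical.
    import subprocess

    def archive_directory(local_dir, hpss_tarfile):
        # Bundle a directory into a tar file written directly into the archive.
        subprocess.run(["htar", "-cvf", hpss_tarfile, local_dir], check=True)

    def store_file(local_path, hpss_path):
        # Copy one file into the archive; hsi syntax is "put local : remote".
        subprocess.run(["hsi", f"put {local_path} : {hpss_path}"], check=True)

    def retrieve_file(local_path, hpss_path):
        # Copy one file back out of the archive ("get local : remote").
        subprocess.run(["hsi", f"get {local_path} : {hpss_path}"], check=True)

    if __name__ == "__main__":
        archive_directory("run42/", "/home/projects/sim/run42.tar")
        store_file("results.nc", "/home/projects/sim/results.nc")
        retrieve_file("results_copy.nc", "/home/projects/sim/results.nc")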

As high-performance computing becomes more network-centric (the Grid, HPSS, cluster interconnects, etc.), the network will become the "glue" that holds everything together. NERSC must become a center of excellence in network engineering; this is the only way we will be able to deliver the full capability of our systems to our users. NERSC will expand its networking and data communication capacity regularly as applications become more bandwidth intensive, and we will take advantage of the latest enhancements in networking systems and protocols to enable NERSC clients to access the system and move data.

COMPREHENSIVE SCIENTIFIC SUPPORT

As described above, NERSC continues to provide early, large-scale production computing and storage capability to the DOE/SC computational science community. The NERSC systems will be of such a scale as to be unique or nearly unique in many aspects, such as computational capability and storage capacity. The goal of NERSC's Comprehensive Scientific Support function is to make it easy for DOE computational scientists to use the NERSC high-end systems effectively by:

  • Providing consistent high-quality service to the entire NERSC client community through the support of the early, production-quality, large-scale capability systems.

  • Aggressively incorporating new technology into the production NERSC facility by working with other organizations, vendors, and contractors to develop, test, install, document, and support new hardware and software.

  • Ensuring that the production systems and services are of the highest quality: stable, secure, and replaceable within the constraints of budget and technology.

  • Participating in other work to understand and address the unique issues of using large-scale systems.

Comprehensive Scientific Support is the heart of the strategy that sets NERSC apart from other sites and greatly enhances the impact of NERSC's High-End Systems. Elements of this support include:

  • System monitoring and operational support on a 24 x 7 x 365 schedule.

  • Advanced consulting support during business hours.

  • Direct collaborative support by staff scientists on major projects.

  • Up-to-date and convenient training and documentation.

  • Account management and allocations support.

  • Efficient system management, including cyber security.

  • System hardware and software improvements, implemented with little or no service disruption.


SUPPORT FOR SCIENTIFIC CHALLENGE TEAMS

The arrival of large, highly parallel supercomputers in the early 1990s fundamentally changed the mode of operation for successful computational scientists. To take full advantage of the capabilities of these parallel platforms, scientists organized themselves into national teams. Called "Grand Challenge Teams," they were a precursor to the "Scientific Challenge Teams" that NERSC anticipates as its leading clients in the next decade. These multidisciplinary and multi-institutional teams engage in research, development, and deployment of scientific codes, mathematical models, and computational methods to maximize the capabilities of terascale computers. NERSC responded by creating the "Red Carpet" plan, which centered on building individual relationships with users and designating a NERSC staff member as a point of contact to expedite the resolution of any problems or concerns.

In March 2000 DOE launched a new initiative called "Scientific Discovery through Advanced Computing" (SciDAC). SciDAC explicitly defines and calls for the establishment of Scientific Challenge Teams, which are characterized by large collaborations, the development of community codes, and the involvement of computer scientists and applied mathematicians. In addition to high-end computing, these teams will increasingly have to deal with issues in data management, data analysis, and data visualization. For some teams, the expected close coupling to scientific experiments, supported by the USE environment described below, will be essential to success. Scientific Challenge Teams represent the only approach that will succeed in solving many of the critical scientific problems in SC's research programs. These teams are the culmination of the process of users moving to ever-higher computing capability, and NERSC's new structure enables that entire process (Figure 11).

NERSC's strategy for the next five years is to build a focused-support infrastructure for the Scientific Challenge Teams consisting of four components:

  • integrated support and collaboration from the NERSC staff

  • deployment of tools developed by the SciDAC Integrated Software Infrastructure Centers (ISICs)

  • deployment of grid and collaboration technologies (USE)

  • building the software engineering infrastructure.
 
Figure 11. NERSC facilitates the transition to high-end capability computing, and enables Scientific Challenge Teams through intensive support.
 


UNIFIED SCIENCE ENVIRONMENT (USE)

A second new component of the NERSC strategy addresses another change in the practice of scientific computing. In recent years, rapid increases in available networking bandwidth, combined with continuing increases in computer performance, have made possible an unprecedented integration of computational simulation with theory and experiment. This change will have a fundamental impact on areas of science that have not yet made much use of high-end computing. By deploying critical parts of a Unified Science Environment (USE), NERSC anticipates playing a role in the emergence of a new paradigm in computational science.

Examples of the potential of—and the necessity for—a unified approach to computing and science may be found in many of DOE's large-scale science projects, such as accelerator-based science, climate analysis, collaboration on very large simulation problems, and observational cosmology. These activities occur in widely distributed environments and under circumstances that are constrained by the timing of the experiments or collaborations, and are essential to advancing those areas of science. The USE will help support this integration and facilitate DOE's large-scale science.

Grids will play an important role in NERSC, and NERSC will play an important role in Grids. Grids provide the middleware for managing and accessing widely distributed resources, and NERSC will add very high-end computing and storage to the Grid as that becomes feasible. Grid middleware gives the user a uniform view of the job- and data-management environment across heterogeneous systems. This environment has a single, consistent security model and strong but unobtrusive security services, and it provides tools to manage complex sequences of tasks. Inclusion of NERSC in the DOE Science Grid will make high-end services available to NERSC computational scientists through the uniform Grid environment (Figure 12). The resulting combination of Grid access to desktop, midrange, and high-end services creates the USE.
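
As a rough sketch of what this uniform view looks like in practice, the example below wraps three standard Globus Toolkit command-line tools from Python: a single sign-on via a proxy credential, job submission through GRAM, and data movement through GridFTP. The host names, resource contact string, and file paths are hypothetical.

    # Illustrative sketch: the same three operations reach any Grid-enabled
    # resource, whether a desktop, a midrange cluster, or a high-end system.
    # Wraps Globus Toolkit command-line tools; endpoints are hypothetical.
    import subprocess

    def create_proxy():
        # Obtain a short-lived proxy credential; one sign-on covers all resources.
        subprocess.run(["grid-proxy-init"], check=True)

    def run_remote_job(contact, executable, *args):
        # Submit a job to a remote resource through the uniform GRAM interface.
        subprocess.run(["globus-job-run", contact, executable, *args], check=True)

    def transfer_file(source_url, dest_url):
        # Move data between sites with GridFTP, independent of local file systems.
        subprocess.run(["globus-url-copy", source_url, dest_url], check=True)

    if __name__ == "__main__":
        create_proxy()
        run_remote_job("hpc.example.gov/jobmanager-pbs", "/bin/hostname")
        transfer_file("file:///tmp/input.dat",
                      "gsiftp://hpc.example.gov/scratch/input.dat")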

 
Figure 12. The role of NERSC as the largest computational resource in the DOE Science Grid.
 


COLLABORATIONS

Finally, NERSC will expand its collaborations with other institutions, especially with the other DOE SC laboratories, to systematically integrate into its offerings the products of their efforts in computational science. With this strategy NERSC will enhance its successful role as a center that bridges the gap between advanced development in computer science and mathematics on one hand, and scientific research in the physical, chemical, biological, and earth sciences on the other. Implementing this strategy will position NERSC to continue to enhance the scientific productivity of the DOE SC community, and to be an indispensable tool for scientific discovery.


CONCURRENCE

The NERSC Strategic Proposal was reviewed anonymously by 15 independent experts in high performance scientific computing. The proposal and the reviewers' comments were analyzed by the DOE Office of Advanced Scientific Computing Research (ASCR) and discussed with representatives of other Office of Science programs. At the conclusion of this review process, DOE accepted the broad outline of the strategic plan and committed to supporting NERSC at Berkeley Lab for the next five years. The ASCR program managers agreed with the four components of the plan and their order of priority, emphasizing High-End Systems and Comprehensive Scientific Support.

In a letter to Berkeley Lab Director Charles V. Shank, dated November 8, 2001, Dr. C. Edward Oliver, Associate Director of Science for ASCR, wrote, "Your proposal presents a sound strategy for providing high-performance scientific computing hardware and services in a manner commensurate with the near-term expectations of the Office of Science." Dr. Oliver described the NERSC staff's commitment to excellence as a "vital attribute" of the center, and concurred with many of the reviewers' observations that NERSC has provided "world-class hardware, timely technology upgrades and services virtually unsurpassed by any other computer center in the world."

 

 