The year 2001 was a landmark year for NERSC. It was our first year in Berkeley Lab's new Oakland Scientific Facility, and it was the year we advanced into the terascale computing era by installing what was at that time the most powerful computer in the world for unclassified research. During the year we also made significant upgrades in our mass storage and networking capabilities, and we continued making our services more responsive to client needs.
On May 24, Oakland Mayor Jerry Brown and Dr. James Decker, Acting Director of the DOE Office of Science, joined Berkeley Lab Director Charles Shank for the formal dedication of the Oakland Scientific Facility (OSF). (NERSC actually began using the building in November 2000.) Decker noted that the computational research done at NERSC will pay substantial dividends to the city of Oakland, industry partners, the DOE, the Laboratory, and the nation. Mayor Brown praised the "boldness of creativity" evident in the new facility and pledged the city's support.
About 75 guests also heard from Peter Ungaro of IBM; James Payne of Qwest Communications; Peter Wang, Chairman and President of Encinal Broadway, the building's owner; and Jud King, provost and senior vice president for the University of California. Bill McCurdy, associate laboratory director for Computing Sciences, led the machine room tour that followed, pointing to the new IBM SP as "a tool that is our future, the way we will understand our world."

The new 16,000-square-foot OSF computer room was designed for flexibility and expandability. It can be expanded to 20,000 square feet by removing a non-load-bearing wall, resulting in a facility able to accommodate a new generation of computing systems. The facility has a flexible power distribution system with a power supply of 5 MW, and a high-capacity chilled-water air conditioning system to maintain the ideal operating environment.

5 Teraflop/s IBM SP Goes Online

In August, after months of rigorous acceptance testing, Phase 2 of the NERSC-3 system was made available to the research community. The 3,328-processor IBM RS/6000 SP system, named "Seaborg" in honor of Berkeley Lab Nobel Laureate Glenn Seaborg, is capable of performing 5 trillion calculations per second (5 teraflop/s). Installed in January, the SP underwent extensive configuration and customization for the NERSC production environment and users. NERSC's Sustained System Performance (SSP) benchmark suite was used to test the system with actual scientific applications in a realistic production mode. By the time the system was opened for general production use, it was running so well that significant scientific results in several fields were produced in just the first few weeks, and users expressed a high level of satisfaction with the system.
Because the NERSC-3 contract was based on specific, firm performance metrics and functional requirements rather than on peak performance or hardware specifications, the Phase 2 IBM SP system has 22% more processing power and 50% more memory than originally planned, and thus will provide 10 million additional processor hours per year to users. Peak performance has grown from the planned 3.8 to 5 teraflop/s, with 3,008 processors available for computation. A unique aspect of NERSC-3 may be its 4.5 terabytes of main memory.

The NERSC-3 contract, defined in 1998-99 and based on IBM's proposal, specified more than 40 performance and 150 functional requirements that stem from the application workload at NERSC. Almost every area of function and performance is covered by overlapping but independently relevant measures. For example, there are two disk I/O measures, one for large parallel files and another for small parallel and serial files. Both metrics are necessary for an accurate estimate of the true performance for scientists. The fixed-price contract guaranteed a minimum configuration of hardware (what was projected to meet all the requirements), and IBM also guaranteed to meet the performance requirements. At the end of acceptance testing on the expanded system, IBM had exceeded the performance and functional requirements of the specifications. NERSC also purchased some additional hardware to make the system even more suitable for certain applications. As a result, the NERSC-3 system is capable of meeting all the maximum performance requirements, and provides significantly enhanced capability and performance for typical application requirements.
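As a rough sanity check on the peak figure, the processor count and the 5 teraflop/s number are consistent. The per-processor rate below is an assumption for illustration (a 375 MHz processor performing 4 floating-point operations per cycle), not a figure stated in this report:

```python
# Back-of-the-envelope check of the Phase 2 system's peak performance.
# Assumption (not from the text): each processor peaks at roughly
# 1.5 gigaflop/s (0.375 GHz x 4 floating-point ops per cycle).
processors = 3328                  # total processors in the Phase 2 system
per_cpu_gflops = 0.375 * 4         # GHz x flops/cycle = 1.5 gigaflop/s

peak_tflops = processors * per_cpu_gflops / 1000
print(f"Estimated peak: {peak_tflops:.2f} teraflop/s")  # ~5 teraflop/s
```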
To stay ahead of the growing needs of our clients and the growing capacity of our computational systems, NERSC works constantly to increase the capacity, bandwidth, and functionality of our HPSS mass storage system. This year NERSC began increasing our disk cache to 20 terabytes by adding more Fibre Channel disks. Internal data transfer rates were improved by replacing our HIPPI internal storage network with Gigabit Ethernet using jumbo frames. Additional high-capacity Fibre tape drives are expanding our archive from 1.3 to 2 petabytes. And AFS was upgraded with a bigger disk cache and faster processors.

Two years ago, NERSC and Oak Ridge National Laboratory established the
Probe wide-area distributed-storage testbed to research storage technologies.
During the past year NERSC used Probe to test a new, faster version of
HSI (the HPSS interface utility) that enables access to multiple HPSS
systems, making distributed mass storage simpler and more convenient.
A Grid-capable FTP daemon was also tested as another way to bring Grid
connectivity to HPSS. NERSC experimented with remote WAN network movers
as a way to bypass conventional file transfer methods and stream data
quickly between sites. And we conducted a beta test of the new Linear
Tape-Open (LTO) Ultrium ultra-high-density tape drives. All of these efforts,
together with the GUPFS project described below, will help bring mass
storage and Grid technologies together.
In a typical high-performance computing (HPC) environment, each large computational system has its own local disk as well as access to additional network-attached storage and archival storage servers. Such an environment prevents the consolidation of storage between systems, limiting the amount of working storage available on each system to its local disk capacity. The result is unnecessary replication of files on multiple systems, an increased workload on users to manage their files, and a burden on the infrastructure to support file transfers between the various systems.

NERSC is using existing and emerging technologies to overcome these inefficiencies. The Global Unified Parallel File System (GUPFS) project aims to provide a scalable, high-performance, high-bandwidth, shared-disk file system for use by all of NERSC's high-performance production computational systems. GUPFS will provide a unified file namespace for these systems and will be integrated with the High Performance Storage System (HPSS). Storage servers, accessing the consolidated storage through the GUPFS shared-disk file systems, will provide hierarchical storage management, backup, and archival services. An additional goal is to distribute GUPFS-based file systems to geographically remote facilities as native file systems over the DOE Science Grid.

This environment will eliminate unnecessary data replication, simplify
the user environment, provide better distribution of storage resources,
and permit the management of storage as a separate entity while minimizing
impacts on the computational systems. The major enabling components of
this envisioned environment are a high-performance shared-disk file system
and a cost-effective, high-performance storage area network (SAN). These
emerging technologies, while evolving rapidly, are not targeted towards
the needs of high-performance scientific computing. The GUPFS project
intends to encourage the development of these technologies to support
HPC needs through both collaborations with other institutions and vendors,
and active development.
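The storage-consolidation argument can be illustrated with a toy calculation. All file sizes and system names below are invented for illustration, not taken from the report:

```python
# Toy illustration of why per-system local disks waste storage:
# the same working files get replicated onto every system a user runs on,
# while a shared file system needs only one copy.
# All numbers and names below are hypothetical.
file_sizes_gb = {"run_input.nc": 40, "restart.chk": 120, "results.h5": 200}
systems = ["system_a", "system_b", "system_c"]   # hypothetical systems

total_data_gb = sum(file_sizes_gb.values())       # 360 GB of working data
replicated_gb = total_data_gb * len(systems)      # one copy per system
shared_gb = total_data_gb                         # one copy under GUPFS
print(f"Per-system copies: {replicated_gb} GB; shared: {shared_gb} GB")
print(f"Storage saved: {replicated_gb - shared_gb} GB")
```

The saved capacity is only part of the benefit; the report's larger point is that users no longer need to track and synchronize the copies themselves.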
To connect our new facilities and systems to the Grid, NERSC has implemented major upgrades to our networking infrastructure. At the end of May, the Oakland Scientific Facility upgraded its ESnet connection to OC-12 (622 Mb/s) to accommodate the growing demands of data-intensive computing (Figure 6). Gigabit Ethernet utilizing jumbo frames (9 KB packets) is the new standard for our internal network connecting the IBM SP and HPSS systems; when our current Cray systems are replaced, older HIPPI and FDDI connections will also be phased out. Further network upgrades are planned to anticipate the ever-increasing bandwidth needs of our users.

As important as the latest networking hardware may be, NERSC's Networking and Security Group is also very concerned with day-to-day performance. They have created a new network statistics Web page (http://www.nersc.gov/nusers/resources/network/) which allows NERSC users to monitor network activity in real time. The networking staff themselves monitor traffic for signs of poor performance, and they proactively work with remote users to improve end-to-end rates. For example, they traced slow data transfers from Brookhaven National Laboratory and the Jet Propulsion Laboratory to a bug in the respective labs' firewalls, then worked with Cisco Systems to fix the bug. This kind of creative problem solving has resulted in data transfer rates 4 to 30 times faster from some user sites. (Users who want to improve their transfer rates should contact NERSC User Services.)

The Networking and Security Group also monitors the network for intrusions and unauthorized use, and responds to security incidents in cooperation with Berkeley Lab's cybersecurity staff. Attempted intrusions have been rising steadily in the last few years (Figure 7), but security research is making progress as well. The NERSC BRO border intrusion detection system can take action to block certain attacks without human intervention. In addition, we have begun working with Juniper Networks, Inc. to test the BRO intrusion detection software at speeds of 2.4 Gb/s (OC-48).
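To put the link speeds cited in this section in perspective (OC-12 at 622 Mb/s, Gigabit Ethernet, and OC-48 at 2.4 Gb/s), here is an idealized transfer-time calculation. The 10 GB file size is illustrative, and protocol overhead is ignored:

```python
# Idealized time to move a 10 GB dataset at the nominal link rates
# mentioned in this section (no protocol overhead or contention).
rates_mbps = {"OC-12": 622, "Gigabit Ethernet": 1000, "OC-48": 2488}
file_gb = 10
file_bits = file_gb * 8e9          # 10 GB expressed in bits

for name, mbps in rates_mbps.items():
    seconds = file_bits / (mbps * 1e6)
    print(f"{name:>16}: {seconds:6.1f} s")
```

Real end-to-end rates are lower, which is why the group's tuning work (such as the firewall fix above) can yield such large speedups.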
Over the years, NERSC has been a pioneer in achieving high utilization on massively parallel systems: in simple terms, keeping as many processors as possible working for as long as possible. High utilization, made possible by customized scheduling and load balancing software, gives researchers more available processor hours and maximizes the value of the computing resource. NERSC's Cray T3E reached 95% utilization (averaged over 30 days) in February 2001; and taking advantage of our experience with the T3E, we were able to achieve 85% utilization on the IBM SP, better than the T3E in its first year.

An occasional side effect of high utilization, and an undesirable one from the user's viewpoint, is a long wait for a job to run if that particular job is not the right size for the system's current load balance. Because response time is an important factor in researchers' productivity, and scientific productivity is the ultimate measure of NERSC's success, we are now working with a user committee to find the optimum balance of utilization and response time. The committee will research solutions and develop guidelines to be implemented on NERSC systems.
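Utilization here is simply delivered processor-hours divided by available processor-hours over the averaging window. A minimal sketch follows; the machine size and busy-hours figures are illustrative, not the T3E's actual numbers:

```python
# Utilization = delivered processor-hours / available processor-hours.
# The processor count and delivered hours below are hypothetical.
processors = 640                 # hypothetical machine size
days = 30                        # 30-day averaging window, as in the text
available_hours = processors * days * 24
delivered_hours = 437_760        # hypothetical busy processor-hours

utilization = delivered_hours / available_hours
print(f"Utilization over {days} days: {utilization:.0%}")  # prints 95%
```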
Eighty-two new servers and significant improvements in the overall computing and networking infrastructure were added this year to the PDSF (Parallel Distributed Systems Facility), a large Linux-based computer cluster currently operated as a partnership between NERSC and the Berkeley Lab Nuclear Science and Physics Divisions. Computing power was expanded to 395 processors, and the number of disk vaults grew from 15 to 40, with a total of 24 terabytes of shared storage. Gigabit Ethernet networking for the high-bandwidth compute nodes enables the PDSF to run MPI jobs that require up to 50 nodes (100 processors). In addition, Gigabit Ethernet for the disk vaults makes it possible to take advantage of the server-side performance improvements of the 2.4 Linux kernel.

The PDSF serves the data-intensive computing needs of international high energy and nuclear physics experiments at the world's most powerful accelerator centers, including the AGS/RHIC complex at Brookhaven National Laboratory (STAR, E895), CERN (NA49, ATLAS, ALICE), and Fermilab (CDF, E871), as well as neutrino detectors in Canada (SNO) and Antarctica (AMANDA), and other experiments that are expanding our knowledge of the universe. The PDSF is the primary computing facility for analysis of data from the STAR experiment, which is generating 300 terabytes of data each year, and was also used to verify several major discoveries at CERN's NA49 experiment. Access to the PDSF greatly sped up publication of initial results from STAR, E895, AMANDA, and SNO. Scientific questions the PDSF is helping to answer range from the mass of solar neutrinos to the characteristics of the quark-gluon plasma that existed briefly after the Big Bang.
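The STAR data volume quoted above implies a substantial sustained ingest rate when averaged over a full year:

```python
# Average data rate implied by 300 terabytes per year of STAR data.
tb_per_year = 300
bytes_per_year = tb_per_year * 1e12
seconds_per_year = 365 * 24 * 3600

mb_per_s = bytes_per_year / seconds_per_year / 1e6
print(f"Average ingest rate: {mb_per_s:.1f} MB/s around the clock")  # ~9.5 MB/s
```

Peak rates during data-taking runs would be higher still, which is why the disk-vault and Gigabit Ethernet upgrades described above matter for this workload.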
NERSC acquires a new capability-focused computational system every three years. The three-year interval is based on the length of time it takes to introduce large systems, the length of time it takes for NERSC clients to become productive on new systems, and the types of funding and financial arrangements NERSC uses. At any given time, NERSC has two generations of computational systems in service, so that each system has a lifetime of five to six years. This overlap gives NERSC clients time to move from one generation to the next, and gives NERSC the ability to fully test, integrate, and evolve the latest generation while maintaining service on the earlier generation.

NERSC uses the "Best Value" process for procuring its major systems. Rather than setting mandatory requirements and using a quantitative rating scheme, the Best Value method requests baseline and value-added characteristics. These characteristics are not meant to design a specific solution, but rather to signify a range of parameters that will produce an excellent and cost-effective solution. Thus, Best Value does not limit a site to lowest-common-denominator requirements; it allows NERSC to push the limits of what is possible in order to get the best solution. Vendors indicate they prefer this method as well, because it gives them more flexibility in crafting their best solution.

A request for proposals for the NERSC-4 system was issued in November
2001, with proposals due in February 2002. Like the NERSC-3 contract described
above, the NERSC-4 contract will be based on performance metrics and functional
requirements, especially NERSC's Sustained System Performance (SSP) benchmark
suite and Effective System Performance (ESP) test. One new feature of
this contract is that the SSP value will no longer be based on the NAS
Parallel Benchmarks, but will be based on the NERSC Application Performance
Suite. Delivery of the initial NERSC-4 system is expected in early 2003.

User Survey Provides Valuable Feedback
NERSC's annual user survey provides feedback about every aspect of NERSC's operation, and every year we institute changes based on the survey results, as we did following the FY 2000 survey.
In the FY 2001 survey, there were significant increases in user satisfaction with the available computing hardware, the allocations process, and the PVP cluster. Other areas showing increased satisfaction were T3E and SP batch wait times, SP disk configuration, SP Fortran compilers, and HPSS. Areas with continuing high user satisfaction included HPSS reliability, performance, and uptime; consulting responsiveness, quality of technical advice, and follow-up; the Cray programming environment; PVP uptime; and account support.

When asked what NERSC does well, some respondents pointed to our stable and well managed production environment, while others focused on NERSC's excellent support services. Other areas singled out included well-done documentation, good software and tools, and the mass storage environment. When asked what NERSC should do differently, the most common responses were to provide more hardware resources and to enhance our software offerings. Other areas of concern were visualization services, batch wait times on all platforms, SP interactive services, training services, and SP performance and debugging tools.

Several sample responses give the flavor of the users' comments:

"NERSC makes it possible for our group to do simulations on a scale that would otherwise be unaffordable."

"The availability of the hardware is highly predictable and appears to be managed in an outstanding way."

"Provides computing resources in a manner that makes it easy for the user. NERSC is well run and makes the effort of putting the users first, in stark contrast to many other computer centers."

"Consulting by telephone and e-mail. Listens to users, and tries to set up systems to satisfy users and not some managerial idea of how we should compute."

"The web page, hpcf.nersc.gov [now www.nersc.gov], is well structured and complete. Also, information about scheduled down times is reliable and useful."

Complete survey results can be found at http://www.nersc.gov/news/survey/2001/.