CLIENT SERVICES

In FY 1997, DOE awarded parallel vector processing allocations on NERSC systems to 425 projects, and massively parallel processing allocations to 131 projects. A total of 1378 scientists (not including NERSC staff) made use of NERSC's computing resources. Our success is ultimately measured by the quality of science produced by these clients. Highlights of that science are in this report (see table of contents).

To help ensure that we are meeting our clients' needs, this year we established a set of ten performance goals pertaining to our systems and service. We developed these goals to set expectations for our own performance, then obtained our clients' endorsement of these goals as meaningful and useful.

We now proactively gauge just how well we are doing in meeting these common expectations. We have tried to ensure that the goals reflect our efforts from the client's perspective, as opposed to an internal one. For example, a measurement of system availability needs to reflect the number of hours a machine is available to our clients, not how long it takes us to identify a problem and initiate corrective action. Our performance goals cover the areas discussed in the sections that follow.

To give our clients, our sponsors, and our own staff a better idea of how we are performing, we produced a Client Services Report covering our work from October 1996 through September 1997. A few highlights of that report are summarized below along with other client service topics. We will share results of our self-evaluations with our client community on a regular basis.

Reliable and Timely Service

This performance goal addresses two general areas: the availability and reliability of our systems, and the timeliness of our response to and resolution of client problems.

SYSTEM AVAILABILITY DETAILS
October 1996 Through September 1997
Measured (Goal)

System         Overall Availability   Scheduled Availability   MTBI (Hours)   MTTR (Hours)
Vector         97.7% (95%)            99.3% (96%)              242 (96)       5.3 (4.0)
Parallel       95.2% (85%)            98.0% (90%)              43 (96)        2.7 (4.0)
Storage        98.9% (95%)            99.2% (96%)              85 (96)        0.7 (4.0)
File Servers   99.9% (96%)            99.9% (96%)              1460 (316)     1.0 (8.0)
SAS            99.9% (97%)            99.9% (99%)              973 (490)      1.0 (4.0)

MTBI = mean time between interruptions; MTTR = mean time to restoral

The table "System Availability Details" above shows various aspects of our systems' reliability since NERSC moved from Livermore to Berkeley Lab. Figures in bold represent the measured time, while goals are shown in parentheses and red type. Scheduled availability refers to the amount of time the systems are expected to be available (accounting for scheduled maintenance and upgrades), while gross availability is based on 24 hours a day, seven days a week. MTTR (mean time to restoral) refers to the amount of time between a system failure and the point at which full service is restored to clients. Measured performance exceeded our aggressive goals in most cases, with the exception of time between interruptions for parallel and storage systems and time to restoral for vector systems. We are working to improve our performance in those areas.

NERSC's service goals are to respond to clients' problems within four working hours and to resolve at least 90 percent of those problems within two working days. Spot checks confirm that NERSC meets the goal of responding to problems within four hours. Between July 1, 1996, and May 15, 1997, 75.3 percent of all problems were resolved within two days. As the NERSC staff got up to speed, however, we made significant progress in meeting the 90 percent goal: between March 11 and May 15, 1997, 93.1 percent of all problems were resolved within two days.

Not all problems can be resolved within two days. Reasons for putting a problem on hold include software requests, ongoing coding projects, bugs waiting for a vendor-supplied fix, and a client not responding to a request for input within two days. Problems not resolved within 72 working hours are automatically escalated for more in-depth review to ensure that outstanding problems are addressed.
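As a rough sketch of how this tracking might be automated, the following Python fragment computes the share of resolved problems that met the two-working-day goal and flags open problems that have exceeded 72 working hours for escalation. The ticket fields, the 8-hour working day, and the sample data are illustrative assumptions, not NERSC's actual problem-tracking system.

    # Hypothetical sketch: measuring resolution performance against the
    # two-working-day goal and flagging problems for escalation after 72
    # working hours. Ticket schema and sample data are illustrative only.

    WORK_HOURS_PER_DAY = 8
    RESOLUTION_GOAL = 2 * WORK_HOURS_PER_DAY   # two working days
    ESCALATION_LIMIT = 72                      # working hours

    def resolution_report(tickets):
        """tickets: list of dicts with 'id', 'working_hours_open', and 'resolved'."""
        resolved = [t for t in tickets if t["resolved"]]
        met_goal = [t for t in resolved if t["working_hours_open"] <= RESOLUTION_GOAL]
        # Open problems past the limit are escalated for more in-depth review.
        escalate = [t["id"] for t in tickets
                    if not t["resolved"] and t["working_hours_open"] > ESCALATION_LIMIT]
        pct = 100.0 * len(met_goal) / len(resolved) if resolved else 0.0
        return pct, escalate

    # Example: three resolved problems (two within goal) and one open past 72 hours.
    tickets = [
        {"id": 101, "working_hours_open": 6,  "resolved": True},
        {"id": 102, "working_hours_open": 14, "resolved": True},
        {"id": 103, "working_hours_open": 30, "resolved": True},
        {"id": 104, "working_hours_open": 80, "resolved": False},
    ]
    print(resolution_report(tickets))   # roughly (66.7, [104])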

NERSC staff periodically review problems and client requests to identify areas that need attention and to address them before they disrupt service.

Innovative Assistance

NERSC aims to provide its clients with new ideas, new techniques, and new solutions to their scientific computing issues. For example, when DOE's Office of Energy Research announced that NERSC would be a partner in helping multidisciplinary teams from around the country solve eight of ER's twelve Grand Challenges, our computer scientists rolled out a virtual red carpet to participating researchers.

Through the Red Carpet Program, led by members of NERSC's Scientific Computing Group, NERSC staff are building individual working relationships with clients at other national labs and universities tackling such issues as cleaning up nuclear waste, supporting international research in magnetic fusion energy, designing particle accelerators, and understanding the structure of the smallest building blocks of matter.

NERSC staff are holding site meetings with each Grand Challenge team to determine what services and support are needed. Services so far include providing training for new clients, integrating separate physics software packages into a cohesive program, developing new algorithms, and offering programming tips.

NERSC has also created a specialized group to support a particular field of science. The High Energy and Nuclear Physics (HENP) Systems Group works with physicists around the globe to help develop solutions to the formidable computing challenges faced by the next generation of HENP experiments. The group provides access to and assistance with a combination of production systems such as the PDSF, advanced prototype storage systems such as HPSS and DPSS, and research and development projects such as the HENP Grand Challenge.

To address forefront scientific issues in high-energy and nuclear physics, large collaborations are carrying out complex experiments that detect and analyze ever larger numbers of final-state particles and events. The HENP Systems Group, led by Craig Tull, is helping scientists get the science out of the massive amounts of data generated by experiments such as STAR (Solenoidal Tracker at RHIC). The challenge for the coming years is to provide cost-effective, high-performance computing capabilities and unprecedented data access, allowing widely distributed collaborations to process and analyze hundreds of terabytes of data per year.

Some of our client service experiments have worked well, while others sent us back to the drawing board. For example, to reduce travel time and costs, we tried using ISDN-based videoconferencing as a training tool. On our end, there were problems in learning how to present material via video, and clients had difficulty scheduling facilities, especially as we tried to scale up the sessions to reach more sites. As a result, we have decided to rely on Web-based technologies and are developing ways to provide reliable Web-based video. This will allow clients to tap our expertise at their desktop and on their own schedule.

Client Training

NERSC has presented a variety of training sessions to proactively help clients adapt to the newest computing technologies.

Technology Transfer

Many organizations look to NERSC to provide expertise in high-performance computing. As one of the world's first computing centers to put a Cray T3E into a demanding production environment, we are regularly contacted by others considering such a move. The U.S. Army Corps of Engineers, the National Computational Science Alliance, Georgia Tech, and computing centers in France and Korea have all tapped our experience as part of their planning process. NERSC routinely hosts visitors from educational and research institutions around the world.

NERSC staff members participate in various organizations that set the pace for new technology development. For example, the head of our Systems Group is a member of Silicon Graphics' customer advisory board; members of our Mass Storage Group serve on the HPSS executive and technical committees; and our Future Technologies Group leader serves on the MPI standards committee.

NERSC staff also share their expertise through software releases, articles in technical journals, tutorials and presentations at professional conferences and workshops, and invited talks at universities, laboratories, and high-tech industries.

ERSUG

The Energy Research Scientific Computing Users Group (ERSUG) is composed of user representatives from the NERSC customer laboratories and university sites. This group provides guidance to both NERSC and the Office of Energy Research about current services offered by NERSC and the direction of future development. When appropriate, ERSUG appoints task forces and working groups to address specific issues related to NERSC services. ERSUG meetings for 1997 were held on January 28-29 at NERSC and June 5-6 at Princeton Plasma Physics Laboratory. Recent ERSUG activities include development of a guidance document for NERSC's next major system procurement.

