FY 2003 User Survey Results
Many thanks to the 326 users who responded to this year's User Survey -- this
represents the highest response level yet in the six years we have conducted the
survey. The respondents represent
all five DOE Science Offices and a variety of home institutions:
see Respondent Demographics.
The survey responses provide feedback about every aspect of
NERSC's operation, help us judge the quality of our services, give DOE
information on how well NERSC is doing, and point us to areas we
can improve. The survey results are listed below.
You can see the FY 2003 User Survey text, in which
users rated us on a 7-point satisfaction scale.
Some areas were also rated on a 3-point importance scale or a 3-point
usefulness scale.
Satisfaction Score | Meaning |
7 | Very Satisfied |
6 | Mostly Satisfied |
5 | Somewhat Satisfied |
4 | Neutral |
3 | Somewhat Dissatisfied |
2 | Mostly Dissatisfied |
1 | Very Dissatisfied |
Importance Score | Meaning |
3 | Very Important |
2 | Somewhat Important |
1 | Not Important |
Usefulness Score | Meaning |
3 | Very Useful |
2 | Somewhat Useful |
1 | Not at All Useful |
The average satisfaction scores from this year's survey
ranged from a high of 6.61 (very satisfied) to a low of 4.67 (somewhat
satisfied). See All Satisfaction Questions.
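As an illustration of how these averages are computed from the 7-point scale, here is a minimal sketch; the response counts below are made up for illustration, since the survey report gives only the averages:

```python
# Illustrative only: hypothetical response counts on the 7-point
# satisfaction scale (7 = Very Satisfied ... 1 = Very Dissatisfied).
responses = {7: 60, 6: 40, 5: 15, 4: 6, 3: 3, 2: 1, 1: 1}

# Average score = sum of (score x count) divided by total responses.
total = sum(responses.values())
avg = sum(score * n for score, n in responses.items()) / total
print(f"{total} responses, average satisfaction {avg:.2f}")
# 126 responses, average satisfaction 6.12
```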
Areas with the highest user satisfaction were:
Topic | Avg Score | No. of Responses |
HPSS reliability | 6.61 | 126 |
Consulting - timely response | 6.55 | 207 |
Consulting - technical advice | 6.54 | 200 |
HPSS uptime | 6.54 | 126 |
Local Area Network | 6.54 | 114 |
Areas with the lowest user satisfaction were:
Topic | Avg Score | No. of Responses |
Access Grid classes | 4.67 | 27 |
Escher visualization software | 4.75 | 8 |
Visualization services | 4.81 | 97 |
NERSC training classes | 4.88 | 24 |
Training | 5.04 | 94 |
The largest increases in satisfaction over last year's survey
came from the IBM SP (Seaborg), HPSS uptime, network
connectivity, and available hardware:
Topic | Avg Score | Increase from 2002 | No. of Responses |
SP Applications | 6.00 | 0.30 | 94 |
SP Libraries | 6.27 | 0.18 | 131 |
SP Disk Configuration and I/O Performance | 6.15 | 0.18 | 156 |
HPSS Uptime | 6.54 | 0.17 | 126 |
Network Connectivity | 6.23 | 0.16 | 241 |
Available Hardware | 6.13 | 0.16 | 255 |
The areas rated significantly lower this year were:
Topic | Avg Score | Decrease from 2002 | No. of Responses |
PDSF Fortran Compilers | 6.03 | -0.42 | 29 |
PDSF Ability to Run Interactively | 5.77 | -0.41 | 64 |
PDSF Applications | 5.87 | -0.34 | 39 |
SP Queue Structure | 5.69 | -0.23 | 177 |
SP Uptime | 6.42 | -0.14 | 191 |
Survey Results Lead to Changes at NERSC
Every year we institute changes based on the survey.
NERSC took a number of actions in response to suggestions
from the 2002 user survey.
SP resource scheduling:
- Could longer run time limits be implemented across the board?
NERSC response:
In March 2003 limits were extended from 8 to 48 hours
for jobs running on 32 or more nodes, and from 8 to 12 hours for jobs
run on 31 or fewer nodes. The "regular long" class, which provides a 24
hour limit for jobs run on 31 or fewer nodes, was preserved but with
restrictions on the number of jobs that can run simultaneously.
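The March 2003 run-time limits described above can be sketched as a small lookup; the function and class names here are illustrative, not a NERSC interface:

```python
# Sketch of the March 2003 Seaborg run-time limit policy described above.
# The helper and the class name "regular_long" are illustrative only.
def wall_clock_limit_hours(nodes: int, job_class: str = "regular") -> int:
    """Maximum run time in hours under the March 2003 policy."""
    if nodes >= 32:
        return 48          # large jobs: extended from 8 to 48 hours
    if job_class == "regular_long":
        return 24          # preserved, with limits on simultaneous jobs
    return 12              # small jobs: extended from 8 to 12 hours

print(wall_clock_limit_hours(64))                  # 48
print(wall_clock_limit_hours(16))                  # 12
print(wall_clock_limit_hours(16, "regular_long"))  # 24
```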
- Could more services be devoted to interactive jobs?
NERSC response:
In March 2003 interactive jobs were given an additional
system priority boost (placing them ahead of debug jobs).
- Could there be a serial queue?
NERSC response:
Two new classes to facilitate pre- and post-processing of data
and data transfers to HPSS were introduced in November 2003.
Jobs run in these classes are charged for one processor's wall clock time.
- Could more resources be devoted to the "small node-long runtime" class (more nodes,
a longer run time, better throughput)?
NERSC response:
Resources were not increased for "regular long" types of jobs;
rather the priority has
been to increase resources for jobs running on more than 32 nodes.
This is in line with the DOE Office of Science's goal that 1/4 of all batch
resources be applied to jobs that use 1/8 of the available processors. For FY
2004 this goal has been increased to target 1/2 of the batch resources.
Perhaps because of this resource prioritization, satisfaction with the SP queue
structure dropped by 0.2 points.
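The 1/8-of-processors threshold behind this goal can be illustrated with a small check; the total processor count below is an assumption for illustration, not Seaborg's actual configuration:

```python
# Sketch: checking whether a job is large enough to count toward the
# DOE large-scale computing goal. The machine size is hypothetical.
TOTAL_PROCESSORS = 6144            # assumed total compute processors
THRESHOLD = TOTAL_PROCESSORS // 8  # goal applies to jobs using >= 1/8

def counts_toward_goal(processors_used: int) -> bool:
    """True if the job uses at least 1/8 of the machine."""
    return processors_used >= THRESHOLD

print(THRESHOLD)                 # 768
print(counts_toward_goal(1024))  # True
print(counts_toward_goal(512))   # False
```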
SP software enhancements:
- Could the Unix environment be more user-friendly (e.g. more editors
and shells in the default path)?
NERSC response:
The most recent versions of vim, nano, nedit, gvim, pico, and xemacs are now
in all users' paths by default, as are the compression
utilities zip and bunzip2. Two new utilities make the batch environment
easier to use: llhist shows recently completed jobs, and
ll_check_script gives warnings and advice on crafting batch scripts.
This year's rating for SP applications went up by 0.3
points.
- Could there be more data analysis software, including Matlab?
NERSC response:
Matlab and Mathematica are available on the math server, newton. Matlab is not
available on the IBM SP because big Matlab jobs can severely affect
other users on the interactive nodes. The IDL (Interactive Data
Language) package is
available on Seaborg for interactive data analysis and
visualization of data.
Computing resources:
- NERSC needs more computational power overall.
Could a vector resource be provided?
Could mid-range computing or cluster resources be provided?
NERSC response:
All the above are excellent suggestions and
we certainly understand the desire for more
computational resources. The FY 2004 Seaborg allocation requests were
for 2.4 times the amount available to allocate. The reality is that
there is no budget for additional hardware acquisitions. Last
year we were able to double the number of nodes on Seaborg and this year's
rating for available computing hardware increased by 0.2 points.
Documentation:
- Provide better searching, navigation, organization of the information.
NERSC response:
The NERSC user web site (http://hpcf.nersc.gov)
has been
restructured with new navigation links that should
make finding information faster and easier. Related information has
been consolidated. Printer-friendly links have been added that
combine multi-page documents into a single printable page. The final
phase of the update will be to encode descriptions
for each page to increase the effectiveness of the search
engine.
- Enhance SP documentation.
NERSC response:
We have made an effort to keep documentation up to date on a wide range of
SP topics: IBM compilers, the LoadLeveler batch system, IBM SP-specific APIs,
and links to IBM redbooks. In addition, the presentation of SP information has
been streamlined, so information should now be easier to find. In August 2003
we received positive comments from ScicomP 8 attendees in regard
to how we present IBM documentation.
Training:
- Provide more training on performance analysis, optimization and
debugging.
NERSC response:
Since last year's survey NERSC has emphasized these topics in our
training classes, for example:
CPU performance analysis on
Seaborg, Scaling I/O and Communication,
Debugging Parallel Programs with Totalview.
See http://www.nersc.gov/nusers/services/training/.
- Provide more information in the New Users Guide.
NERSC response:
More information on initial account setup was added to
the New User Guide, which was also reformatted for ease of use. See
http://hpcf.nersc.gov/help/new_user/.
This year's survey included several new questions:
- How useful were the DOE and NERSC scaling initiatives?
[Read the Scaling Initiatives Response Page]
In FY 2003 NERSC implemented initiatives aimed at promoting highly scalable
applications as part of the DOE emphasis on large scale computing.
For the
first time, DOE had in FY 2003 an explicit goal that "25% of the usage will be accounted
for by computations that require at least 1/8 of the total [compute] resource."
(Note: for FY 2004 this goal is for 50% of the usage, rather than 25%.)
The 24 respondents who had participated in the Large Scale Jobs Reimbursement
Program and the 32 respondents who had worked on scaling their codes with the
NERSC consultants rated these initiatives as "very useful" on average. poe+,
used to measure code performance characteristics,
had been used by 104 respondents and was also rated "very useful" on
average. The
115 respondents who rated Seaborg's new batch class structure, designed to give
preference to high concurrency jobs, gave it an average rating of "somewhat
useful".
20 users wrote comments in support of the scaling initiatives, for example:
Please push this project as much as you can. This type of consulting is very
important if one goes to the limit of a system in terms of #processors and
sustained performance.
11 users stated why they thought these initiatives are
misguided. The general theme of these comments was that it is science
output that matters, not scaling per se. Some representative comments:
I believe that they are totally misguided. The emphasis should be on maximizing
the SCIENTIFIC output from NERSC. If the best way to do this is for the user to
run 100 1-node jobs at a time rather than 1 100-node job, every effort should
be made to accommodate him/her. ... In the final analysis, it should be up to
the users to decide how they use their allocations. Most, if not all of us,
will choose a usage pattern which maximizes our scientific output. Remember
that most of us are in computational science, not in computer science. We are
interested in advancing our own fields of research, not in obtaining Gordon
Bell awards.
Don't freeze out the small-to-moderate user --- the science/CPU hour is often
higher for the moderate user.
There is always a tension between massive users and those who want to run
smaller jobs. While many researchers use a single node (16 processors), I think
it would not be cost effective for DOE to pay them to run on their own
machines.
- Why do you compute at NERSC?
(What are the reasons NERSC is important to you?)
[Read All 229 Responses]
Many of the answers were along the lines of "to run my codes in order to get my
science done". Users pointed out that they need powerful compute resources that
they can't get elsewhere. Many users specifically mentioned large numbers of
processors or parallel computing as a reason to compute at NERSC. Turnaround
time (getting results fast) is very important. Data analysis, especially in the
context of PDSF computing, is also a common theme. One user even pointed out
that the time is "free".
- Has security gotten in the way of your work at NERSC?
Ninety percent of the respondents (217 users) answered no to this
question.
- If security has gotten in the way of your work at NERSC, how?
[Read All 25 Responses]
25 users answered this question:
- 10 pointed to difficulties accessing NERSC (the change to ssh
version 2, FTP retirement, difficulties with tunneling and ports)
- 6 reported password or login attempt problems
- 3 encountered difficulties accessing HPSS
- 3 had grid/distributed computing concerns
- 3 said "it's inconvenient"
- How do you compare NERSC openness and access to your home site and others?
[Read All 146 Responses]
- 49% stated that NERSC has similar
or greater openness than other sites they access
- 28% said that NERSC's openness or security measures are good
(without making a comparison)
- 9% said that NERSC is less open or too secure
Users were also invited to provide overall comments about NERSC:
- 119 users answered the question "What does NERSC do well?"
69 respondents pointed specifically to NERSC's good hardware management
practices, which provide users with excellent access to HPC resources;
62 mentioned user support and NERSC's responsive staff; 17 highlighted
documentation; and 13 cited job scheduling and batch throughput.
Some representative comments are:
Powerful and well maintained machines, great mass storage facility, and helpful
and responsive staff. What more could you want?
As Apple would put it .... "it just works". I get my work done and done fast.
Seaborg is up and working nearly all the time. Network, storage, it's all there
when I need it. That is what matters most and NERSC delivers.
NERSC simply is the best run centralized computer center on the planet. I have
interacted with many central computer centers and none are as responsive, have
people with the technical knowledge available to answer questions and have the
system/software as well configured as does NERSC.
- 75 users responded to "What should NERSC do differently?"
The area of greatest concern was job scheduling: 14 users objected to
favoring large jobs at the expense of smaller ones, and six wanted more
resources devoted to interactive computing and debugging. Next was the
need for more hardware: more compute power overall, different
architectures, mid-range computing support, and vector architectures.
Eight users pointed out the need for better documentation and six wanted
more training.
Some of the comments from this section are:
NERSC's new emphasis favoring large (1024+ processor) jobs runs contrary to its
good record of catering to the scientific community. It needs to remember the
community it is serving --- the customer is always right. The queue
configuration should be returned to a state where it no longer favours jobs
using large numbers of processors.
I'm not in favor of giving highest priority to the extremely large jobs on all
nodes of seaborg. I think that NERSC must accommodate capacity computing for
energy research that cannot be performed anywhere else, in addition to
providing capability computing for the largest simulations.
NERSC should move more aggressively to upgrade its high end computing
facilities. It might do well to offer a wider variety of architectures. For
example, the large Pentium 4 clusters about to become operational at NCSA
provide a highly cost effective resources for some problems, but not for
others. If NERSC had a greater variety of machines, it might be able to better
serve all its users. However, the most important improvement would be to simply
increase the total computing power available to users.
It would be great if NERSC could again acquire a supercomputer with excellent
vector-processing capability, like the CRAY systems which existed for many
years. The success of the Japanese "Earth Simulator" will hopefully cause a
re-examination of hardware purchase decisions. Strong vector processors make
scientific programming easier and more productive.
Measure success on science output and not on size of budgets or quantity of
hardware.
The overhead on account managers still seems a bit much for what we're getting.
I still find the ERCAP process onerous (i.e., more information requested than
should be necessary). Also, most of the codes we are using are changing much
more from year to year in a scientific sense than a computational sense, it
becomes repetitious to have to keep evaluating them computationally each year.
You need to keep in mind that most of us are being funded to do science rather
than computational research.
- 65 users answered the question
"How does NERSC compare to other centers you have used?"
63% of the respondents stated that NERSC was a good center (no comparison made)
or was better than other centers they used.
Reasons given for preferring NERSC include good hardware, networking and
software management, good user support, and better job throughput.
11% of the respondents said that NERSC was not as good as another center they
used. The most common reason given for dissatisfaction with NERSC was job
scheduling.
Here are the survey results:
- Respondent Demographics
- Overall Satisfaction and Importance;
Why do you use NERSC?; Security and Flexible Work Option
- All Satisfaction Questions and Changes from Previous Years
- DOE and NERSC Scaling Initiatives
- Web, NIM, and Communications
- Hardware
- Software
- Training
- User Services
- Comments about NERSC