NERSC Announcements Message Archive
Subject: [bassi-users] Important memory configuration changes on Bassi following downtime Thursday, Dec. 11
Author: Richard Gerber <ragerber_at_lbl.gov>
Date: 2008-12-03 16:04:08
Dear Bassi users,
Bassi will be down for maintenance on Thursday, Dec. 11 from 10:00 to 16:00 PST.
During the maintenance we will make a change to the batch system that may
impact your jobs.
If you are currently setting the ConsumableMemory resource in your batch
scripts you will have to modify that value after the system returns
on Dec. 11, 2008. If you are using fewer than 8 tasks per node, please read
this message. If your jobs use 8 tasks per node and you are running successfully
now without explicitly manipulating the ConsumableMemory setting, you can
disregard the rest of this e-mail.
SUMMARY
The meaning of the value specified for ConsumableMemory will change.
Instead of referring to a per-task total memory value, ConsumableMemory
will refer to AIX small-page (4K pages) memory only. The appropriate new
values for your job - based on the number of tasks per node you use -
can be found at
http://www.nersc.gov/nusers/systems/bassi/running_jobs/memory.php
For jobs already queued when the system is taken down for maintenance, we
will manually adjust the setting and you do not need to take any action.
New jobs submitted with the old settings on or after Dec. 11, 10:00 PST may
either fail to run or be killed prematurely.
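To locate the scripts that need updating, a small helper along the following lines may be useful. This is only a sketch: it assumes the setting appears in the usual LoadLeveler keyword form, for example "# @ resources = ConsumableMemory(3 gb)", and the function name find_consumable_memory is hypothetical. It reports where the value is set; it does not know the correct new value, which should be taken from the memory page linked above.

```python
import re
from typing import List, Tuple

# Assumption: the setting is written as ConsumableMemory(<value>), as in a
# LoadLeveler "# @ resources = ..." line. Adjust the pattern if your
# scripts use a different form.
PATTERN = re.compile(r"ConsumableMemory\s*\(\s*([^)]*?)\s*\)", re.IGNORECASE)


def find_consumable_memory(script_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, value) for each ConsumableMemory setting found."""
    hits = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        match = PATTERN.search(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical batch script used only to demonstrate the helper.
    example = (
        "#!/bin/ksh\n"
        "# @ job_type = parallel\n"
        "# @ resources = ConsumableMemory(3 gb)\n"
        "# @ queue\n"
    )
    for lineno, value in find_consumable_memory(example):
        print(f"line {lineno}: ConsumableMemory is set to '{value}'")
```

A script with no match does not set ConsumableMemory explicitly and, per the note above, is unaffected if it already runs successfully with 8 tasks per node.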
BACKGROUND
Why are we making this change? We believe the new configuration will
result in a more robust system. Bassi's nodes are prone to failure when
a job quickly attempts to use memory far above the physical memory on a
node (32 GB, of which 26.4 GB is available to applications). IBM provides
software that will kill user tasks when they exceed the ConsumableMemory
value, thus protecting the node from failure. A node that fails will
be completely out of service for an indeterminate time and may
lead to system-wide problems.
Nodes fail when an application uses more memory than is physically
available in DRAM and then uncontrollably starts to "page" memory to
disk. On Bassi, only the 12 GB of memory that is configured as 4 KB
"small pages" can be swapped to disk. The remaining 20 GB, configured as
16 MB "large pages", is "pinned" and cannot be swapped. (Scientific
applications generally perform significantly better when using
large-page memory, but AIX and the program stack require swappable 4K
memory. The page configuration is set at boot time only and NERSC made
the 20/12 GB split based on benchmark results, consultation with IBM, and
past experience on Bassi.) The current configuration, in which
ConsumableMemory refers to the combined total of large and small pages,
does not adequately protect the system. Some codes attempt to allocate
large amounts of memory on the stack, which causes excessive small-page
paging and kills the node while the task is still well under the
total memory limit that would trigger a job kill signal.
In the new configuration, tasks will be killed if they exceed the
ConsumableMemory limit in small-page memory usage only.
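The difference between the two rules can be illustrated with a short model. This is a sketch for illustration only, not the IBM or NERSC enforcement code; the function names and the per-task usage figures and limits in the example are hypothetical. Only the 12 GB / 20 GB page split comes from this message.

```python
# Page pools on a Bassi node, as described in this message.
SMALL_PAGE_POOL_GB = 12.0   # 4 KB pages, can be swapped to disk
LARGE_PAGE_POOL_GB = 20.0   # 16 MB pages, pinned


def killed_old(small_gb: float, large_gb: float, limit_gb: float) -> bool:
    """Old rule: the limit applies to small-page plus large-page memory."""
    return small_gb + large_gb > limit_gb


def killed_new(small_gb: float, large_gb: float, limit_gb: float) -> bool:
    """New rule: the limit applies to small-page memory only."""
    return small_gb > limit_gb


if __name__ == "__main__":
    # Hypothetical task with a very large stack: 14 GB of small pages
    # (more than the 12 GB pool, so it pages to disk) and 1 GB of large pages.
    small, large = 14.0, 1.0
    print("old rule, 24 GB total limit:", killed_old(small, large, 24.0))
    print("new rule, 10 GB small-page limit:", killed_new(small, large, 10.0))

    # Hypothetical task that needs far more large-page memory than its peers.
    small, large = 1.0, 18.0
    print("old rule, 3 GB total limit:", killed_old(small, large, 3.0))
    print("new rule, 1.5 GB small-page limit:", killed_new(small, large, 1.5))
```

In the first case the old rule lets the task keep running while it pages, which is the failure mode described above; the new rule stops it. In the second case the old rule kills a task solely because of its large-page use, while the new rule does not.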
ADVANTAGES
* Fewer node failures and reboots
* No more artificial per-task large-page memory limitations. (Under the
old configuration a single task was limited to an amount of memory equal
to the ConsumableMemory setting. This did not allow one task to use significantly
more memory than the other tasks on a node.)
DISADVANTAGES
* Existing batch scripts need to be modified to change the value for
ConsumableMemory.
--
Richard Gerber, Ph.D. ragerber@lbl.gov
NERSC phone: 510-486-6820
Lawrence Berkeley National Lab fax: 510-486-4316
Berkeley, CA 94720
_______________________________________________
bassi-users mailing list
bassi-users@nersc.gov