National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
 

Memory Considerations on Bassi

Each Bassi node has 32 GB of physical memory, only some of which will be available to your application. Using too much memory will cause the node to page, leading to a severe performance penalty and possible node failures.

Please refer to Managing Memory Usage for a detailed discussion of memory limits and availability.

"Large Page" memory

Each Bassi node has 32 GB of shared memory. On the compute nodes 20 GB of the 32 GB are configured as "large pages." By default, the parallel (mp*) and threaded (*_r) compilers will build large-page enabled applications. The login nodes have minimal large page memory.

Large pages improve the performance of many applications by 10 to 30 percent. With large pages, the processors' automatic hardware prefetch mechanism is used more effectively, and most costly TLB (translation lookaside buffer) misses are eliminated.

Some applications, especially shell scripts and shorter-running applications that perform several fork() and exec() operations, are not suited to large pages because assigning large pages to a job is more costly than assigning small pages. Most scientific applications that run for more than a minute do benefit from large pages.

If large pages are exhausted, enabled applications silently fail over to use small pages. Applications that use only small pages cannot access large-page memory, and are thus limited to about 2-3 GB per node on the Bassi compute nodes.

Setting your code to use (or not) large pages

By default, all IBM compilers beginning with mp and those ending with _r will build code with large page support turned on. All other compilers will not enable large pages automatically.

To enable large page memory usage, link with the following option to the IBM compilers:

	-blpdata

If you set the environment variable LDR_CNTRL to have the value LARGE_PAGE_DATA=Y before running a code, it will have the same effect as if all executables you run had been linked with -blpdata. As of September 11, 2006, this environment variable is set for all batch jobs in the NERSC-supplied shell initialization files.
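As an illustration of the environment-variable route, the following minimal Python sketch launches a program with LDR_CNTRL set to LARGE_PAGE_DATA=Y. The helper names large_page_env and run_with_large_pages are hypothetical and are not NERSC or IBM tools. The sketch also assumes it is acceptable to overwrite any LDR_CNTRL value already present in the environment.

```python
import os
import subprocess


def large_page_env(base=None):
    """Return a copy of the environment with large-page data requested.

    Assumption: any existing LDR_CNTRL setting is simply overwritten.
    """
    env = dict(os.environ if base is None else base)
    env["LDR_CNTRL"] = "LARGE_PAGE_DATA=Y"
    return env


def run_with_large_pages(command):
    """Run `command` (a list of strings) with LDR_CNTRL set as above."""
    return subprocess.run(command, env=large_page_env(), check=True)
```

A call such as run_with_large_pages(["./a.out"]) would then start a.out as though it had been linked with -blpdata, provided the node has large pages available.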

Disable large page memory usage by linking with the following option:

	-bnolpdata

There is a "large page flag" in the binary executable that tells the OS whether or not to use large pages. You may modify an existing executable with the ldedit command. To enable large pages for an existing binary executable, use

% ldedit -blpdata executable_name

Checking if your code uses (or not) large pages

A utility called checklp is available on Bassi to check whether or not your code is large page enabled.

> checklp mpihello1
 
The executable mpihello1 is large page enabled.
 
> checklp mpihello2
 
The executable mpihello2 is not large page enabled.

64-bit Object Mode

The default NERSC configuration on Bassi will produce executables that use a 64-bit address space when you build a code. NERSC recommends that codes use 64-bit mode unless you have a specific need to use 32-bit. In 64-bit mode your program's memory usage will not be limited by the operating system. If you have 32-bit IBM build scripts, you should remove the -bmaxdata and -bmaxstack compile options.

If you override the NERSC environment, you can use the -q64 compile and link option to run in 64-bit mode. To change your default to 32-bit addressing, set the environment variable OBJECT_MODE to have the value 32. If you work in 32-bit mode, ensure that all the machine's memory is made available to your programs - see below.

32-bit Object Mode

In 32-bit mode the memory available to your program is limited. By default, both the C and Fortran compilers impose the following limits when compiling and linking with -q32 or OBJECT_MODE=32, or OBJECT_MODE unset.

datasize        131072 kbytes
stacksize       32768 kbytes

The default memory limits in 32-bit mode can be increased by using appropriate compiler command-line options. These options are described below.

Background

From an application programmer's point of view, program memory is divided into two regions: the data region, which holds static data and allocated arrays, and the stack region, which holds automatic data.

Some examples should make this terminology a little clearer:

  • a "regular" Fortran 77 array is in the data region when compiling with xlf77 (and variants thereof, e.g. mpxlf77). When compiled with xlf90 (and variants thereof, e.g. mpxlf90) the array is in the stack region unless the array appears in a SAVE statement, a COMMON block, or the -qsave option is used.
  • a Fortran 90 allocatable array is in the data region.
  • a Fortran 90 automatic array is in the stack region.
  • static C data is in the data region.
  • automatic C data is in the stack region.
  • C data allocated through malloc is in the data region.

These limits are soft limits and the program can exceed them with appropriate loader flags (see later). You can see that, by default, the stack region is limited to only 32MB, and the data region to 128MB. Using the default execution model on AIX, the maximum size for the stack region is 256MB, and the maximum size for data region is 2GB (this value decreases if you use MPI, LAPI etc.).

If you compile and run a program which exceeds the stack limit the program will crash with a segmentation violation. If you exceed the data limit the message:

exec(): 
0509-036 Cannot load program a.out because of the following errors:
0509-026 System error: There is not enough memory available now.

is printed.

These limits can be manipulated by using the -bmaxstack and -bmaxdata flags to the C and Fortran compilers. The -bmaxstack option works similarly to resetting the stack limit. For example, -bmaxstack:0x10000000 (the value is a byte count written in hexadecimal) sets the stack to have a maximum size of 256MB.

The -bmaxdata option specifies that additional memory segments, separate from that used for the stack, be used to provide the storage required. When this option is used, the data region does not collide with stack data. For example, -bmaxdata:0x70000000 sets the size of the data region to have a maximum size of 1792MB.
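The hexadecimal byte counts passed to -bmaxstack and -bmaxdata can be checked with a little arithmetic. The following is a minimal Python sketch of that conversion. The helper name hex_bytes_to_mb is hypothetical, and 1 MB is taken to be 2**20 bytes.

```python
def hex_bytes_to_mb(value: str) -> float:
    """Convert a hexadecimal byte count such as '0x10000000' to MB (2**20 bytes)."""
    return int(value, 16) / 2**20


# The two example values used in the text above.
for flag, value in [("-bmaxstack", "0x10000000"), ("-bmaxdata", "0x70000000")]:
    print(f"{flag}:{value} -> {hex_bytes_to_mb(value):.0f} MB")
```

Each increment of 0x10000000 corresponds to 256MB, so 0x70000000 is seven such increments, or 1792MB.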

The maximum value for maxdata (see below for physical memory constraints) depends on several factors:

Program uses          Largest maxdata
-                     0x80000000
MPI                   0x80000000
LAPI                  0x80000000
MPI and LAPI          0x60000000
MPI and totalview     0x70000000

Shared Memory (shmget)

Shared memory segments are allocated by default using 4K (small) memory pages, which are limited. This default mode is adequate only if you know your shared-memory use will be small (less than 2GB per node). You can use 16MB large pages by specifying the SHM_LGPAGE and SHM_PIN flags in the shmget() call. Shared-memory segments can then use all the memory available on the node. If you exhaust small-page memory your code will likely die (and perhaps take the node with it).

If you make use of Unix shared-memory routines (for example, shmget or mmap) in a 32-bit program, decrease maxdata by an additional 0x10000000 for each 256MB (or part thereof) used for shared-memory segments.
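The maxdata limits and the shared-memory reduction can be combined in a short calculation. The following Python sketch is only an illustration: the function name max_maxdata and the lookup table layout are hypothetical, the base values are copied from the list of largest maxdata values in this section, and the reduction per 256MB of shared memory is taken to be 0x10000000 bytes (256MB in hexadecimal).

```python
import math

SEGMENT = 0x10000000  # 256MB, the reduction applied per shared-memory chunk

# Largest maxdata values, copied from the list in this section.
LARGEST_MAXDATA = {
    frozenset(): 0x80000000,
    frozenset({"MPI"}): 0x80000000,
    frozenset({"LAPI"}): 0x80000000,
    frozenset({"MPI", "LAPI"}): 0x60000000,
    frozenset({"MPI", "totalview"}): 0x70000000,
}


def max_maxdata(uses, shm_mb=0):
    """Return the largest maxdata for a 32-bit program.

    uses   -- set of features the program uses, e.g. {"MPI"}
    shm_mb -- shared-memory usage in MB; each 256MB (or part thereof)
              lowers the limit by one SEGMENT
    """
    base = LARGEST_MAXDATA[frozenset(uses)]
    chunks = math.ceil(shm_mb / 256)
    return max(base - chunks * SEGMENT, 0)


# An MPI program using 300MB of shared memory needs two 256MB chunks.
print(hex(max_maxdata({"MPI"}, shm_mb=300)))  # prints 0x60000000
```

The resulting value would be passed to the linker as, for example, -bmaxdata:0x60000000.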

Stack Memory

All stack memory is allocated from the small-page memory pool, which is limited to about 7-8 GB per node.


Page last modified: Wed, 11 Feb 2009 22:19:44 GMT
Page URL: http://www.nersc.gov/nusers/systems/bassi/memory.php
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov