NERSC: Powering Scientific Discovery Since 1974

Compiling Codes

NERSC and NPB benchmarks

Overview

There are three compiler suites available on Carver:  Portland Group (PGI), Intel, and GCC.  The PGI compilers are the default, to provide compatibility with other NERSC platforms.  Because Carver uses Intel processors, some benchmarks have shown better performance when compiled with the Intel compilers.  The GCC compilers are available primarily to facilitate building open-source tools, although they can also be used for scientific applications.

MPI

The only supported MPI implementation on Carver is Open MPI, which is descended from LAM.  In particular, note that Open MPI is not part of the MPICH family of MPI implementations.

For each supported compiler suite, NERSC provides a version of Open MPI that is compatible with that compiler.  The default is PGI.  In order to use other compilers, it is necessary to swap both the compiler module and the MPI module.  For example, to use the Intel compilers with Open MPI: 

carver% module swap pgi intel
carver% module swap openmpi openmpi-intel

To use the GCC compilers with Open MPI:

carver% module swap pgi gcc
carver% module swap openmpi openmpi-gcc

The above swap commands may be required in your batch scripts as well, if you plan to submit calculations in future sessions based on executables compiled with intel or gcc.
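
As a minimal sketch of how the swap commands above could be produced for a batch script, the helper below prints the two "module swap" lines for a given compiler suite. The function name swap_commands is hypothetical (it is not a NERSC-provided tool), and the function only prints the commands; it does not run "module" itself. The module names are the ones shown above.

```shell
# Hypothetical helper: print the module commands that switch from the
# default PGI suite to another suite. It only prints; it does not run "module".
swap_commands() {
    case "$1" in
        pgi)
            # PGI is the default, so no swap is needed.
            ;;
        intel|gcc)
            printf 'module swap pgi %s\n' "$1"
            printf 'module swap openmpi openmpi-%s\n' "$1"
            ;;
        *)
            echo "unknown compiler suite: $1" >&2
            return 1
            ;;
    esac
}

# Print the lines to paste near the top of a batch script for Intel builds.
swap_commands intel
```

The printed lines can be pasted into a batch script before the line that launches the executable, so that the run-time environment matches the one used at compile time.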

Compiler Names

Compiler "wrappers" provided by Open MPI supply the correct compiler and linker flags for MPI applications.  When compiling non-MPI programs (that is, either serial or shared-memory parallel applications), the "native" compilers may be used directly.

Language   Open MPI                 Native PGI            Native Intel   Native GCC
Fortran    mpif77, mpif90           pgf77, pgf90, pghpf   ifort          gfortran
C          mpicc                    pgcc                  icc            gcc
C++        mpiCC, mpic++, mpicxx    pgCC                  icpc           g++, c++
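
The table above can be encoded as a small lookup for use in build scripts. This is only a sketch: the function name compiler_for is hypothetical, and where the table lists several names for one entry (for example mpif77 and mpif90), the sketch arbitrarily picks one of them.

```shell
# Hypothetical helper: look up a compiler command from the table above.
# Usage: compiler_for <fortran|c|c++> <mpi|pgi|intel|gcc>
compiler_for() {
    case "$1:$2" in
        fortran:mpi)   echo mpif90 ;;
        fortran:pgi)   echo pgf90 ;;
        fortran:intel) echo ifort ;;
        fortran:gcc)   echo gfortran ;;
        c:mpi)         echo mpicc ;;
        c:pgi)         echo pgcc ;;
        c:intel)       echo icc ;;
        c:gcc)         echo gcc ;;
        c++:mpi)       echo mpicxx ;;
        c++:pgi)       echo pgCC ;;
        c++:intel)     echo icpc ;;
        c++:gcc)       echo g++ ;;
        *)
            echo "no entry for $1/$2" >&2
            return 1
            ;;
    esac
}

# MPI builds use the wrapper; serial or OpenMP builds use a native compiler.
compiler_for fortran mpi
compiler_for fortran intel
```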

Basic Examples

MPI

carver% mpif90 -fast -o example.x example.f90

OpenMP

carver% pgf90 -fast -mp -o example.x example.f90

Compiler Options

Open MPI defines a single option for the compiler wrappers:

carver% mpif90 -showme ...

The "showme" option shows the command line that would be executed, without actually invoking the underlying compiler.
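
In a build script it can be useful to check that a wrapper is on the path before asking it for its command line. The sketch below does that; the function name show_wrapper_command is hypothetical, and the only Open MPI feature it relies on is the -showme option described above.

```shell
# Hypothetical helper: run "<wrapper> -showme <args>" if the wrapper exists,
# otherwise report that no Open MPI module appears to be loaded.
show_wrapper_command() {
    wrapper="$1"
    shift
    if command -v "$wrapper" >/dev/null 2>&1; then
        # Prints the underlying compiler command without compiling anything.
        "$wrapper" -showme "$@"
    else
        echo "$wrapper not found; is an Open MPI module loaded?" >&2
        return 127
    fi
}
```

For example, "show_wrapper_command mpif90 -o example.x example.f90" would print the native compiler command that mpif90 would run for that build.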

All remaining compiler options depend on the underlying native compiler; complete descriptions are available via the "man" command.  Some common options are summarized below.

PGI               Intel                 GCC                 Explanation
-fast             -O3                   -O3                 Produce a high level of optimization
-mp               -openmp               -fopenmp            Activate OpenMP directives and pragmas in the code
-byteswapio       -convert big_endian   -fconvert=swap      Read and write Fortran unformatted data files as big-endian
-Mfixed           -fixed                -ffixed-form        Process Fortran source using fixed-form specifications
-Mfree            -free                 -ffree-form         Process Fortran source using free-form specifications
-V                -V                    --version           Show the version number of the compiler
not implemented   -zero                 -finit-local-zero   Zero-fill all uninitialized variables
-mcmodel=medium   -mcmodel=medium       -mcmodel=medium     Allow data sections greater than 2 GB

Based on vendor recommendations and our own experiences with these compilers, we recommend these options to generate fast executables:

PGI:  -fast

Intel:  the compiler's default options (that is, no explicit optimization options), which give a very high level of optimization

GCC:  -O3 -ffast-math
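
The recommendations above, together with the OpenMP flags from the options table, can be collected in one place so that a build script does not hard-code them. This is a sketch only; the function name build_flags and its "openmp" argument are hypothetical.

```shell
# Hypothetical helper: print the recommended optimization flags for a suite,
# plus that suite's OpenMP flag when the second argument is "openmp".
# Usage: build_flags <pgi|intel|gcc> [openmp]
build_flags() {
    suite="$1"
    mode="${2:-}"
    case "$suite" in
        pgi)   opt="-fast";           omp="-mp" ;;
        intel) opt="";                omp="-openmp" ;;   # defaults are recommended
        gcc)   opt="-O3 -ffast-math"; omp="-fopenmp" ;;
        *)
            echo "unknown compiler suite: $suite" >&2
            return 1
            ;;
    esac
    if [ "$mode" = "openmp" ]; then
        # Add a separating space only when there are optimization flags.
        echo "${opt:+$opt }$omp"
    else
        echo "$opt"
    fi
}

build_flags pgi openmp
build_flags gcc
```

With the PGI suite, for instance, the flags printed by "build_flags pgi openmp" are the same ones used in the OpenMP example under Basic Examples.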

Actual benchmark results will be shown in the next section.

Compiler Comparisons

There are three compilers available to users on Carver:  PGI (the default), Intel, and the GNU family of compilers.  The fact that the PGI compiler is the default is not a recommendation of that compiler.  As we show below, this compiler actually produces slower code on average than the other two compilers.

For MPI codes, use the compiler wrappers mpif90, mpicc, and mpiCC instead of the native compiler names, so that the MPI header files and libraries are included in the compile.  If you use the Intel or GNU compilers, always swap the pgi module for the corresponding compiler module (and the openmpi module for the matching Open MPI module), as shown above, so that you get the proper version of the compiler.

We ran several benchmarks to determine the best optimization arguments for each compiler and the best compiler for each benchmark.  These benchmarks are described at

Intel Compiler Option Comparisons

The following Intel optimization options will be compared:

default (no optimization flags) - By default the Intel compiler has a high level of optimization.  It is comparable to the -O2 optimization level.

-O2 - This "enables  optimizations  for speed", and is the recommended option for codes in the online man page.

-O3 - This performs all of the -O2 options as well as additional more aggressive loop transformations.

-O3 -unroll-aggressive -opt-prefetch - This was recommended to us by benchmarkers as being a good supplement to the -O3 optimizations.

-fast - This "maximizes speed across the entire program".  It is a very high level of optimization, much more aggressive than that provided by the PGI "-fast" option, and includes interprocedural optimizations across files.  It increases compilation time significantly, and compiles that succeed with the other options occasionally fail with this one, probably due to its greater processor and memory requirements.

GNU Compiler Option Comparisons

The following GNU optimization options will be compared:

-O3 - This compiles with a high level of optimization.

-O3 -ffast-math - This performs optimizations at the expense of an exact implementation of IEEE or ISO rules/specifications for math functions.

-O3 -funroll-loops - This unrolls loops whose number of iterations can be determined at compile time or upon entry to the loop. It also turns on complete loop peeling (i.e. complete removal of loops with a small constant number of iterations).  This option makes code larger, and may or may not make it run faster.

-O3 -ffast-math -funroll-loops

PGI Compiler Option Comparisons

The following PGI optimization options will be compared:

-fast - A level of optimization which chooses generally optimal flags for the target platform.

-fast -Mipa=fast - Enables interprocedural analysis and chooses generally optimal interprocedural options for the target platform.

-fast -Mfprelaxed - Generates relaxed precision code for those floating point operations that generate a significant performance improvement, depending on the target processor.

-fast -Mipa=fast -Mfprelaxed

Compiler Comparisons

In this section, for each benchmark, the best results for each compiler with the NERSC recommended optimization arguments are compared against each other.

The results are normalized against the PGI compiler.
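
The source does not spell out the normalization, so the sketch below assumes that "normalized" means a compiler's run time divided by the PGI run time for the same benchmark, so that PGI is always 1.00 and smaller values are faster. The function name normalize is hypothetical, and the timings in the example are made up for illustration; they are not measured results.

```shell
# Assumption: normalized result = run time / PGI run time (PGI = 1.00).
# Usage: normalize <time> <pgi_time>
normalize() {
    awk -v t="$1" -v ref="$2" 'BEGIN {
        if (ref <= 0) { exit 1 }      # a non-positive reference time is invalid
        printf "%.2f\n", t / ref
    }'
}

# Made-up timings, for illustration only (not measured results).
normalize 90 100     # prints 0.90
normalize 100 100    # prints 1.00
```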

For 2 out of the 11 benchmarks PGI produced the fastest times, for 8 out of the 11 Intel produced the fastest times, and for 2 out of the 11 the GNU compiler produced the fastest times.

On average, the Intel-compiled codes run over 10% faster than those compiled with the other compilers.