NASA Center for Computational Sciences


halem Software and Tools

Parallel Programming Model
+ OpenMP
+ MPI
+ MPI-IO
+ SHMEM
+ SMS

Performance Analysis and Debugging
+ Compiler listing
+ Timing the program
+ Prof
+ Pixie
+ VAMPIR
+ Ladebug
+ Totalview

Other Software
+ Autoconf
+ Automake
+ BLACS
+ FFTW
+ GrADs
+ HDF
+ HDF5
+ JPEG
+ NCAR Graphics
+ NCO
+ NEdit
+ NetCDF
+ Numerical Python
+ Fortran for Python

+ Python
+ ScaLAPACK
+ TAU
+ Udunits
+ Zlib

OpenMP

OpenMP is a standard Application Program Interface (API) for shared-memory parallel programming in Fortran, C, and C++. Directives must be added to source code to parallelize loops and specify certain properties of variables.
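
For illustration, a minimal sketch of such a program is shown below (the file name, array size, and loop body are hypothetical); the !$OMP directive lines are treated as comments unless the code is compiled with -omp:

program myprog
   implicit none
   integer, parameter :: n = 1000000
   integer :: i
   real    :: a(n), total

   total = 0.0
!$OMP PARALLEL DO REDUCTION(+:total)
   do i = 1, n
      a(i)  = real(i)
      total = total + a(i)
   end do
!$OMP END PARALLEL DO

   print *, 'sum = ', total
end program myprog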

Fortran with OpenMP directives is compiled as:

% f90 -fast -omp myprog.f -o myprog.exe

C code with OpenMP directives is compiled as:

% cc -fast -omp myprog.c -o myprog.exe

You can run OpenMP programs in two ways: in a batch interactive session or as a regular batch job.

Batch Interactive. A batch interactive session gives you an interactive batch shell for development and debugging. At the interactive prompt, you can simply run your OpenMP executable as:

% ./myprog.exe

By default, the system will run your OpenMP jobs using four threads. If you want to run with fewer than four threads, set the OMP_NUM_THREADS environment variable accordingly. For example, if you want to run using two threads, specify the following:

% env OMP_NUM_THREADS=2 ./myprog.exe

Since there are only four processors per node, we strongly advise that you limit the number of threads to four.

Batch. The second way to run OpenMP is to submit it as a batch job. You can either submit the program to run directly, or you can wrap it in a script and submit the script. To submit the program to run on four processors with the default four threads, specify the following:

% bsub -q general -n 4 -Pt000 ./myprog.exe

In the above example and subsequent examples, don't forget to replace t000 with your actual Computational Project. You may use the command getsponsor to retrieve your Computational Project.

If you want to run two threads, specify the following:

% bsub -q general -n 4 -Pt000 env OMP_NUM_THREADS=2 ./myprog.exe

Please note that the minimum number of processors you can ask for is four; that is, one node, even though you may be running fewer than four threads.

Alternatively, you can put the commands in a batch script and submit that script. An example of a script that achieves the same effect as the example above is:

#-----script begins-------
#!/bin/csh
#BSUB -n 4
#BSUB -q general
#BSUB -P t000
#BSUB -o myprog.out
#BSUB -e myprog.err

env OMP_NUM_THREADS=2 ./myprog.exe
#-----script ends-------

Assuming you name this script myscript, you can then submit your job by issuing the following command:

bsub < myscript

Please note that the "<" character is necessary.

Timing issues

Many timers accumulate time from child processes and threads, which can make -omp appear to slow the code down or give no improvement; the UNIX "time" command and the Fortran 95 CPU_TIME intrinsic both behave this way.

Common problems

One of the most common problems encountered after parallelizing a code is the generation of floating point exceptions or segmentation violations that did not occur before. In OpenMP codes, these errors may be caused by uninitialized variables. The reason uninitialized variables sometimes don't show up in sequential code is that, on halem, non-OpenMP codes are compiled by default with the -static compiler option, which causes all local variables to be statically allocated and, as a side effect, typically starts them at zero. For OpenMP codes, however, the default is -automatic: local variables are placed on the run-time stack, and no assumption is made about their values. Because the compiler flag -warn uninitialized is on by default on halem, the compiler should warn about variables that are used before values are assigned to them. Even if you are not using OpenMP, it is good practice not to rely on the default -static behavior but rather to make sure explicitly that variables are assigned values before they are used, even if those values are zero.
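
As a hypothetical illustration of the kind of code that triggers this, the subroutine below reads a variable that is never assigned; under the default -static it happens to start at zero, but with -omp and -automatic each run may see arbitrary stack contents:

subroutine scale(a, n)
   implicit none
   integer, intent(in) :: n
   real, intent(inout) :: a(n)
   integer :: i
   real    :: factor            ! never assigned: a latent bug

!$OMP PARALLEL DO PRIVATE(i)
   do i = 1, n
      a(i) = a(i) * factor      ! reads factor before any value is set
   end do
!$OMP END PARALLEL DO
end subroutine scale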



MPI

MPI is a standard specification for message-passing libraries, created by the Message Passing Interface Forum (MPIF). A set of hands-on tutorials is available to help you familiarize yourself with the use of MPI.

The MPI version on halem is fully compliant with the MPI-1 standard and it incorporates some of the capabilities of MPI-2.

Compiling MPI programs on halem

To compile MPI programs, you must link to the MPI library:

For FORTRAN: f90 -o mpicode mpicode.f -lmpi
For C and C++: cc -o mpicode mpicode.c -lmpi

Your MPI program should always include the appropriate header file:

For FORTRAN: include "mpif.h"
For C and C++: include "mpi.h"

NOTE: You need not and should not have a copy of mpif.h or mpi.h in your build or run directory, as this may cause errors in your program. Your program will automatically access the correct halem-specific copy of the mpi header file provided by the system.

Running MPI programs on halem

MPI jobs are run in batch mode on halem. The command to launch an MPI job on halem is prun, which should be included in your batch script. The prun command provides you with more options to control the processor and node allocation of your job, as illustrated in Example 3 below. Refer to the prun man page for more information.

It is important to note that the batch submission command bsub has an argument (-n) for the number of requested processors, while the prun command uses its own -n argument to specify the number of processes to be launched. They are not the same thing, and both should be specified.

The number of processors requested from LSF in your bsub command should be a multiple of 4, as you will always be assigned full nodes and all their memory. The number of parallel processes launched by prun can be any number, but it must be less than or equal to the number of processors requested from LSF in the bsub command. See Examples 1 and 2 below for illustration.

Example 1: Your job "mpicode" requires 8 MPI processes to run on 8 processors. Your script to run "mpicode" can include the following two lines:

#BSUB -n8
.
.
prun -n8 ./mpicode

Example 2: Your job "mpicode" requires 10 MPI processes to run. You will need to request 3 full nodes, or 12 processors, from LSF to run your job:

#BSUB -n 12
.
.
prun -n10 ./mpicode

Example 3: Your job "mpi_OMP_code" is actually a "hybrid" MPI/OpenMP code, and each of the 8 MPI processes will launch 4 OpenMP threads. You will likely want to request 32 processors from LSF (#BSUB -n 32) and have prun launch 8 MPI processes (-n 8) allocated on 8 different nodes (-N 8) with 4 CPUs on each node (-C 4):

#BSUB -n32
.
.
prun -n8 -N8 -C4 ./mpi_OMP_code
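
A minimal sketch of what such a hybrid code might look like (the file name and printed message are hypothetical; compiled with, e.g., f90 -omp mpi_OMP_code.f90 -o mpi_OMP_code -lmpi): each MPI process opens an OpenMP parallel region, so the prun line above would produce 8 x 4 = 32 lines of output.

program mpi_omp_code
   implicit none
   include 'mpif.h'
   integer :: ierr, rank, nprocs, tid
   integer, external :: omp_get_thread_num

   call MPI_INIT(ierr)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

!$OMP PARALLEL PRIVATE(tid)
   tid = omp_get_thread_num()
   print *, 'MPI rank ', rank, ' of ', nprocs, ', thread ', tid
!$OMP END PARALLEL

   call MPI_FINALIZE(ierr)
end program mpi_omp_code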

While debugging small MPI jobs, you might want to use the interactive batch queue.

Some simple examples

Fortran example

The following FORTRAN program uses message passing software:

C -----------------------------------------------------------------
C senddata --- example MPI Fortran program
C
C This program will demonstrate point-to-point communication
C between 2 processors. Processor 0 will send an arbitrary integer
C value to processor 1. In this example, the value of 10 will be sent.
C Processor 1 will receive this message and write
C to standard output the contents of the message. In addition,
C the contents of the status array will be output.
C
C
C -----------------------------------------------------------------

      Program senddata

      include 'mpif.h'

      integer myproc, size
      integer din, dout
      integer next, prev, tag
      integer req
      integer i, errcode
      integer status(MPI_STATUS_SIZE)

      print *, 'Alive'
c
c Initialize MPI environment.
c
      call MPI_INIT(errcode)
c
c Determine my processor id.
c
      call MPI_COMM_RANK(MPI_COMM_WORLD, myproc, errcode)
c
c Determine number of processors allocated.
c
      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, errcode)
      print *, '[',myproc,'] Total # of processors used in this example
     * = ',size

      if (size .ne. 2) go to 20
c
c Set temporary value for dout on each processor.
c
      dout = 0
c
c Set destination id. This will be the number of the processor where I
c will send data.
c
      next = 1
c
c Set source id. This will be the number of the processor from which
c I will receive data.
c
      prev = 0
c
      tag = 0

c
c Specify new value for dout on processor 0.
c
      if (myproc .eq. 0) dout = 10
      print *, '[',myproc,'] Initial val = ', dout

      if (myproc .eq. 0) then
c
c Send contents of dout to next processor. The variable dout has
c been set to the value 10. The 'next' processor in this
c example will be processor 1, as calculated above.
c
         call MPI_SEND(dout, 1, MPI_INTEGER, next, tag,
     +        MPI_COMM_WORLD, errcode)
         if (errcode .ne. MPI_SUCCESS) then
            write(*,*) 'There was a problem with the send.'
         endif
      else
c
c Receive message into din from prev processor. The 'prev' processor
c in this example will be processor 0.
c
         call MPI_RECV(din, 1, MPI_INTEGER, prev, tag,
     +        MPI_COMM_WORLD, status, errcode)
         if (errcode .ne. MPI_SUCCESS) then
            write(*,*) 'There was a problem with the receive.'
         endif
         print *, '[',myproc,'] Received ', 'val = ',din
         print *, '[',myproc,'] Source of message = ',
     +        status(MPI_SOURCE)
         print *, '[',myproc,'] Tag = ',status(MPI_TAG)

      end if

   20 print *, '[',myproc,'] Complete'
c
c Terminate MPI.
c
      call MPI_FINALIZE(errcode)
      end

C example

/* greetings.c -- greetings program
*
* Send a message from all processes with rank != 0 to process 0.
* Process 0 prints the messages received.
*
* Input: none.
* Output: contents of messages received by process 0.
*
* See Chapter 3, pp. 41 & ff in PPMPI.
*/
#include "stdio.h"
#include "string.h"
#include "mpi.h"

main(int argc, char* argv[]) {
    int         my_rank;       /* rank of process            */
    int         np;            /* number of processes        */
    int         source;        /* rank of sender             */
    int         dest;          /* rank of receiver           */
    int         tag = 0;       /* tag for messages           */
    char        message[100];  /* storage for message        */
    MPI_Status  status;        /* return status for receive  */

    /* Start up MPI */
    MPI_Init(&argc, &argv);

    /* Find out process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Find out number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    if (my_rank != 0) {
        /* Create message */
        sprintf(message, "Greetings from process %d!", my_rank);
        dest = 0;
        /* Use strlen+1 so that '\0' gets transmitted */
        MPI_Send(message, strlen(message)+1, MPI_CHAR,
                 dest, tag, MPI_COMM_WORLD);
    } else { /* my_rank == 0 */
        for (source = 1; source != np; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag,
                     MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }

    /* Shut down MPI */
    MPI_Finalize();
} /* main */

Commands to run the example programs

To run the above program on halem, use the following commands:

# Compile and load program
f90 program.f -lmpi (Fortran)
cc program.c -lmpi (C)
# Run executable (in interactive mode)
bsub -P [sponsor_code] -I -q [batch-queue] -n 12 prun -n 10 ./a.out

Additional information

For more information on MPI, please see:

  • The man pages on MPI and MPI routines on halem. It should be noted that the man pages expect precise capitalization. For example, MPI_Comm_rank works, but mpi_comm_rank and other variations do not.
  • The Message Passing Interface (MPI) standard. This page links to multiple sources of message passing and parallel programming information.



MPI-IO

The best starting point for learning MPI-IO is the Official MPI-2 Standard documents. MPI-IO hints that are specifically supported in the HP AlphaServer SC PFS file system are in the Quadrics HP AlphaServer SC User Guide (Click MPI-IO Support under Appendix C on this page).

To write an MPI-IO Fortran program, insert the following MPI header file in the source file:

INCLUDE "mpif.h"

and then compile the code with MPI and MPI-IO libraries:

% f90 -o mpi_iocode mpi_iocode.f90 -lmpi -lmpio

The file mpif.h has definitions for Fortran MPI-IO programs. A short sample MPI-IO test program can be downloaded and tested on halem.
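
As a hedged sketch of the basic calls (the file name 'testfile' and the write pattern are illustrative, not the downloadable sample mentioned above), the program below has every MPI process write its rank into a shared file at a rank-dependent byte offset; it is compiled with the -lmpi -lmpio line shown above and launched with prun like any other MPI program:

program mpi_iocode
   implicit none
   include 'mpif.h'
   integer :: ierr, rank, fh
   integer :: status(MPI_STATUS_SIZE)
   integer(kind=MPI_OFFSET_KIND) :: offset

   call MPI_INIT(ierr)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

   ! All processes open the same file for writing.
   call MPI_FILE_OPEN(MPI_COMM_WORLD, 'testfile', &
        MPI_MODE_WRONLY + MPI_MODE_CREATE, MPI_INFO_NULL, fh, ierr)

   ! Each process writes one 4-byte integer at an offset based on its rank.
   offset = rank * 4
   call MPI_FILE_WRITE_AT(fh, offset, rank, 1, MPI_INTEGER, status, ierr)

   call MPI_FILE_CLOSE(fh, ierr)
   call MPI_FINALIZE(ierr)
end program mpi_iocode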



SHMEM

To build Fortran SHMEM programs, insert the SHMEM header file in your source files:

INCLUDE 'shmemf.h'

and compile the program with f90 and link with the SHMEM library:

% f90 -o shmemcode shmemcode.f90 -lshmem



SMS

SMS (Scalable Modeling System, sms-2.8.0.r4i4 and sms-2.8.0.r8i8) is a directive-based parallelization tool that translates Fortran code into a parallel version that runs efficiently on both shared and distributed memory systems including the IBM SP2, Cray T3E, SGI Origin, Sun Clusters, Alpha Linux clusters and Intel Clusters. The r4i4 package option was installed to run code compiled with the "-r4 -i4" option. The r8i8 option was installed to run code compiled with the "-r8 -i8" option. A tutorial is available at www-ad.fsl.noaa.gov/ac/SMS_UsersGuide_v2.8.pdf.

To use Scalable Modeling System (SMS) libraries, first set the environment variable SMS to the installation directory. On halem, this directory is /usr/ulocal/sms-2.7.0.r8i8 for code that is to be compiled with the real*8, integer*8 option. The libraries for the real*8/integer*4 and real*4/integer*4 cases are in /usr/ulocal/sms-2.7.0.r8i4 and /usr/ulocal/sms-2.7.0.r4i4, respectively. Next, add $SMS/bin to your path. The bin directory includes the scripts mpirun, mpif90, and mpicc, which SMS scripts may call to run MPI executables and compile MPI codes. If the SMS script uses the Bourne shell, set the above in the file .profile; if it uses the C shell, set it in .cshrc.

The file .profile in your home directory would include:

SMS="/usr/ulocal/sms-2.7.0.r8i8"
export SMS

echo $PATH | /bin/grep -q "$HOME/bin" ||
{
PATH=$SMS/bin:$HOME/bin:${PATH:-/usr/bin:.}
export PATH
}

The file .cshrc in your home directory would include:

setenv SMS /usr/ulocal/sms-2.7.0.r8i8
setenv PATH ${PATH}:$SMS/bin:.

SMS is an "unsupported" application, meaning that NCCS staff will only install the software and test it with sample code for validation. All SMS-related questions should be directed to the NOAA Forecast Systems Laboratory.



Compiler listing

To obtain a compiler listing, compile the code using the command:

% f90 -V annotations all code.f90

The "-V annotations all" options cause the compiler to create a listing file with annotations describing what optimizations the compiler has made.

The other common UNIX profiling mechanisms are the -p and -pg command line options, which instrument the code:

% f90 -p -g3 code.f90
% f90 -pg -g3 code.f90
% a.out



Timing the program

Program performance can be measured with the time command. However, other CPU-intensive processes running at the same time might inflate the values displayed by the time command. The time command provides the following information:

  • The elapsed, real, or "wall clock" time, which will be greater than the total charged actual CPU time.
  • Charged actual CPU time, shown for both system and user execution. The total actual CPU time is the sum of the actual user CPU time and actual system CPU time.

Using the command

The sample Fortran program program.f90 can be used to explore different performance analysis tools, including the time command.

To measure the elapsed time for part of a Fortran code, use SECNDS( ). Preferable to SECNDS( ) is the Fortran 90 intrinsic SYSTEM_CLOCK, which uses a real-time clock to report wall time. The Fortran 95 intrinsic CPU_TIME( ) is also available and lets you distinguish CPU time from wall time.
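
A short hedged sketch of how these intrinsics can be used together (the work loop is only a placeholder): SYSTEM_CLOCK gives wall time, CPU_TIME gives processor time, and the two diverge for multi-threaded runs.

program timing_demo
   implicit none
   integer :: count0, count1, rate, i
   real    :: cpu0, cpu1, x

   call system_clock(count0, rate)     ! wall-clock start
   call cpu_time(cpu0)                 ! CPU-time start

   x = 0.0
   do i = 1, 10000000                  ! placeholder work
      x = x + sin(real(i))
   end do

   call cpu_time(cpu1)                 ! CPU-time end
   call system_clock(count1)           ! wall-clock end

   print *, 'wall time (s) =', real(count1 - count0) / real(rate)
   print *, 'CPU time  (s) =', cpu1 - cpu0
   print *, 'x =', x                   ! keeps the loop from being optimized away
end program timing_demo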

To see the resources used by your program, use the time command. Note that the output format depends on which shell you use. Below is a sample result on C shell:

% time a.out
9.744u 0.010s 0:09.86 98.8% 11+8k 160+5io 144pf+0w

The output shows 9.744 seconds of user time, 9.86 seconds of elapsed time, 98.8% CPU utilization, and 144 page faults.

For parallel jobs, the flag -s can be used with the prun command to get job statistics as the job exits. The result would look like the following:

Elapsed time 6.11 secs      Allocated time 24.42 secs
User time   21.59 secs      System time     0.05 secs
Status    running           Cpus used       4

Note that the allocated time is the elapsed time multiplied by the number of CPUs used (6.11 s x 4 CPUs, approximately 24.42 s, in the sample above).

To determine the MFLOPS (million floating-point operations per second, the number of full word-size fp multiply, add, divide, and probably some compare operations that can be performed per second), issue the following commands:

% pixie -pids a.out
% a.out.pixie
% pixstats -all a.out a.out.Counts.*

for serial code. The result includes the following line:

4812000500 (0.323) flops

To obtain an accurate MFLOPS value, take the "flops" number, i.e., 4812000500, and divide it by the actual elapsed time obtained by any of the methods mentioned earlier.
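
For example, if the elapsed time were the 9.86 seconds reported by the time command in the earlier (purely illustrative) sample, the rate would be roughly 4812000500 / 9.86, or about 488 MFLOPS.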



Prof

The prof command analyzes one or more data files generated by the compiler's execution-profiling system and produces a listing. The prof command can also combine those data files or produce a feedback file that lets the optimizer take into account the program's run-time behavior during a subsequent compilation. Profiling is a three-step process:

  • compile the program
  • execute the program
  • run prof to analyze the data

Prof generates statistics comparing individual parts of a program. It is particularly useful in isolating the most time-consuming subroutines and loops of a program.

Using the command

The sample Fortran program program.f90 can be used to explore the capabilities of prof.

Profiling can be used to find which parts of your code take the longest time to complete, allowing you to concentrate on optimizing these parts first. To profile the code program.f90, execute the following commands:

f90 -g -o program program.f90
program < input
prof program
prof -h program

The profiling data is stored in the file mon.out, and the output of prof shows the most time-consuming routines and/or lines of code.

Some useful options are:

-heavy (list in most used order)
-lines (list in line number order)
-numbers (add line number to procedure name)
-only (list only procedures listed)
-merge filename (the sum of the profile information contained in all the specific profile files)

Prof display

Prof is used to display run output from a number of profilers such as -p, pixie, uprofile, and atom. To learn about these tools, check their man pages (-p is the Fortran compiler option).



Pixie

The instruction-counting profiler pixie can be used for optimization and coverage analysis. Pixie does the following:

  • Profiles procedures, basic blocks, source lines
  • Creates a basic block of a section of code with one entrance and one exit
  • Uses a resolution finer than procedures
  • Reports machine clock cycles
  • Creates .Addrs and .Counts files

Using Pixie

The commands to use pixie are:

f90 -g3 -o program program.f
pixie program
program.pixie < input
prof -pixie program

These profilers can also be used with threaded code.

Pixie and OpenMP

To use OpenMP with Pixie, compile your program with the -g1, -g2, or -g3 option:

pixie -pthread -fork -all a.out
setenv LD_LIBRARY_PATH .
setenv OMP_NUM_THREADS 4
setenv OMP_DYNAMIC FALSE
prun -n1 -c4 a.out.pixie
prof -pixie [options] a.out a.out.Counts.[thread_num]>logfile

Some options are:

-only [routine] (includes only this routine)
-heavy (lists most heavily used lines)
-lines (lists usage per line)
-invocations (lists calls, with caller, to each procedure)
-procedures (lists time and cycles spent on each procedure)
-testcoverage (lists lines that were compiled but never executed)

Pixie and MPI

To use MPI with Pixie, compile your program with the -g1, -g2, or -g3 option:

pixie -pids -all a.out
setenv LD_LIBRARY_PATH .
prun -n64 a.out.pixie
prof -pixie [options] a.out a.out.Counts.[pid]>logfile



VAMPIR (profiling MPI code)

VAMPIR is a powerful interactive visualization tool for analyzing MPI code performance. It interprets and visualizes a tracefile which has to be generated by the application program with the aid of an instrumented MPI library (VAMPIRtrace).

The tutorial is a good starting point for understanding this tool. The full reference manual can be downloaded to your local workstation. To find out how to generate a trace file, read the VT29-userguide.

Before running VAMPIR on halem, you must set up the environment variable:

setenv PAL_LICENSEFILE /usr/local/etc/vamp.dat

and compile the MPI code and link with the following libraries in the following ORDER:

-lfmpi -lVT -lpmpi -lmpi

The order is important because VAMPIR is wrapping the MPI library. You can then run your code as you would any MPI code and at the end of the run, a trace file (.bvt) will be written. If you call

% vampir traceFileName.bvt

VAMPIR provides charts of your message timings.



Ladebug

To debug Compaq Fortran programs on Tru64 UNIX, use Ladebug or dbx. Ladebug is a source-level, symbolic debugger with a choice of command line or graphical user interface that supports Compaq Fortran data types, syntax, and use. It provides extensive support for debugging programs written in C, C++, and Fortran (77 and 90) for Compaq systems.

The following commands create (compile and link) the executable program and invoke the character-cell interface to the Ladebug debugger:

% f90 -g -ladebug -o squares squares.f90
% ladebug squares

The -g(n) options control the amount of information placed in the object file for debugging. To use Ladebug, you should specify the -ladebug option along with the -g (equivalent to -g2) or -g3 option. Descriptions of how to use Ladebug can be found in the Ladebug Debugger Manual.
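
The contents of squares.f90 are not given in this guide; any small program will do. A hypothetical stand-in with enough structure (a loop and a subroutine call) to practice setting breakpoints might look like this:

program squares
   implicit none
   integer :: i
   real    :: value

   do i = 1, 10
      call square_it(real(i), value)   ! a convenient spot for a breakpoint
      print *, i, value
   end do
end program squares

subroutine square_it(x, y)
   implicit none
   real, intent(in)  :: x
   real, intent(out) :: y
   y = x * x
end subroutine square_it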

Ladebug examples

To invoke the debugger on an executable file:

% ladebug executable_file

To invoke the debugger on a core file:

% ladebug executable_file core_file

To invoke the debugger and attach to a running process:

% ladebug -pid process_id executable_file

To invoke the debugger and attach to a running process when you do not know what file it is executing:

% ladebug -pid process_id

To start the Ladebug GUI:

% ladebug -gui



Totalview

Totalview, a well-known debugging tool, is installed on halem for graphical debugging of serial and parallel codes. More detailed information on the use of Totalview is available at the ETNUS Totalview web site.

Note: Before using Totalview, you must make sure that X display works from halem back to your local host. You must also ensure that your code is compiled with the -g option.

Debugging a sequential job

To debug a sequential job, launch Totalview at the prompt as follows:

totalview [ executable [ corefile ] ] [ options ]

where executable is your executable, corefile is the corefile (if any), and options are the arguments passed to
your executable (if any).

Debugging an OpenMP job

To debug an OpenMP job,

  1. Launch an interactive batch shell requesting 4 processors.
  2. Set the OMP_NUM_THREADS environment variable to the number of threads you want to run on. This number should be less than or equal to 4 for halem.
  3. Launch Totalview as for a sequential job (above).

Debugging an MPI job

To debug an MPI job,

  1. Ensure that "passwordless" ssh is enabled from halem to halem by executing the following from the "halem" prompt:

    halem2> ssh-keygen -t dsa

    Hit enter at the prompts until it finishes, then copy the public key into place:

    cp -p $HOME/.ssh/id_dsa.pub $HOME/.ssh/authorized_keys

  2. Ensure that X Display works properly as indicated above.
  3. Set the TVDSVRLAUNCHCMD environment variable to ssh.
  4. Create a directory called .totalview in your home directory (don't forget the leading ".").
  5. Download preferences.tvd and put it in your .totalview directory, or copy it from halem with " cp /opt/totalview/alpha/examples/preferences.tvd $HOME/.totalview ". NOTE: Do NOT edit this file.

Once you have done all of the above, you must launch an interactive batch shell for the number of processors you want to use.

At the batch interactive prompt, launch totalview as follows:

totalview prun -a -n procs [ executable [ corefile ] ] [ options ]

The -a option tells totalview that the arguments that follow are to be passed to prun.



Other Software

All third-party software below is installed in /usr/ulocal/stow and is named to reflect the version installed. Note that all software listed below is open source; therefore, the NCCS User Services Group can assist you with using the software but cannot assume responsibility for fixing bugs.

autoconf-2.59

GNU Autoconf is used to produce shell scripts that automatically configure software source code packages. These scripts can adapt the packages to many kinds of UNIX-like systems without manual user intervention.

automake-1.8

GNU Automake is a tool for automatically generating `Makefile.in' files compliant with the GNU Coding Standards. It requires the use of GNU Autoconf.

BLACS

BLACS (Basic Linear Algebra Communication Subprograms) is a linear algebra-oriented message passing interface that can be implemented efficiently and uniformly across a large range of distributed memory platforms. For example, the ScaLAPACK library is implemented on top of BLACS.

fftw-3.0.1

FFTW ("Fastest Fourier Transform in the West") is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data. The
User manual is available at http://www.fftw.org/fftw3_doc.

grads-1.8sl11

GrADs (Grid Analysis and Display System) is an interactive desktop tool that is used for easy access, manipulation, and visualization of earth science data. A tutorial is available at grads.iges.org/grads/gadoc/tutorial.html

hdf4.2r0

HDF (Hierarchical Data Format) is a software and file format definition for scientific data management. The HDF software includes I/O libraries and tools for analyzing, visualizing, and converting scientific data. The HDF software library provides high-level APIs and a low-level data interface. At its lowest level, HDF is a physical file format for storing scientific data. At its highest level, HDF is a collection of utilities and applications for manipulating, viewing, and analyzing data in HDF files. Documentation is available at http://hdf.ncsa.uiuc.edu/doc.html

hdf5-1.6.1 and hdf5-1.6.1-parallel

Like earlier versions of HDF, HDF5 is a general purpose library and file format for storing scientific data. The HDF5 format is not compatible with that of HDF4, but software is available for converting HDF4 data to HDF5 and vice versa. HDF5 has support for parallel I/O through MPI-IO calls. Installed options: hdf5-1.6.1 is a sequential installation, and its use does not require linking with the MPI library. hdf5-1.6.1-parallel, however, is a parallel installation, and its use requires linking with the MPI and MPI-IO libraries, i.e., -lmpi -lmpio on halem. Documentation is available at hdf.ncsa.uiuc.edu/HDF5/doc, and a tutorial is available at hdf.ncsa.uiuc.edu/tutorial4.html

jpeg-6b

JPEG (Joint Photographic Experts Group) is a free library (libjpeg) for manipulating JPEG images. This library is very widely used by other packages and libraries and is the standard way of manipulating JPEG images. A manual is available at www.ijg.org/files, and sample tutorials are available at stargate.ecn.purdue.edu/~ips/tutorials/jpeg/jpegtut1.html and www.ece.purdue.edu/~ace/jpeg-tut/jpegtut1.html

ncarg-4.3.1

NCAR Graphics is a library containing Fortran/C utilities for drawing contours, maps, vectors, streamlines, weather maps, surfaces, histograms, X/Y plots, annotations, etc. It also contains C and Fortran interpolators and approximators for 1-, 2-, and 3-D data. Other helpful links:
Getting started: ngwww.ucar.edu/ng/getstarted.html
Documentation: ngwww.ucar.edu/ng/documentation.html
Examples: ngwww.ucar.edu/ng/examples.html


nco-2.9.5

NCO (NetCDF Operators) is a suite of programs or operators that perform a set of operations on a netCDF or HDF4 file to produce a new netCDF file as output.

nedit-5.4

NEdit is a multi-purpose text editor for the X Window System.

netcdf-3.5.1

NetCDF (network Common Data Form) is an interface and a library for representing array-oriented scientific data in a machine-independent format.

Numeric-23.1

Numerical Python adds a fast array facility to the Python language.

Pyfort-8.5

Pyfort (Fortran for Python) is a tool for creating extensions to the Python language and Numerical Python using Fortran routines. A tutorial is available at pyfortran.sourceforge.net/pyfort/pyfort_reference.htm

Python-2.3.4

Python is an interpreted object-oriented programming language suitable for distributed application development, scripting, numeric computing and system testing. A tutorial is available at docs.python.org/tut/tut.html.

SCALAPACK

The ScaLAPACK (Scalable LAPACK) library includes a subset of LAPACK routines redesigned for distributed
memory MIMD parallel computers. A Users Guide is available at www.netlib.org/scalapack/slug/index.html.

tau-2.13.5

TAU (Tuning and Analysis Utilities) is a program and performance analysis tool framework for high-performance parallel and distributed computing.

udunits-1.12.1

Udunits (Unidata units library) is a library for manipulating units of physical quantities. It supports conversion of unit specifications between formatted and binary forms, arithmetic manipulation of unit specifications, and conversion of values between compatible scales of measurement.

zlib-1.1.4

Zlib is a general purpose data compression library that provides in-memory compression and decompression functions, including integrity checks of the uncompressed data. This version of the library supports only one compression method (deflation) but other algorithms will be added later and will have the same stream interface. A manual is available at www.gzip.org/zlib/zlib_docs.html, and a tutorial is available at www.php.net/manual/en/ref.zlib.php.



