halem Software and Tools
OpenMP
OpenMP is a standard Application Program Interface
(API) for shared-memory parallel
programming in Fortran, C, and
C++. Directives must be added to
source code to parallelize loops
and specify certain properties
of variables.
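For illustration, here is a minimal sketch (written for this guide, not taken from the halem documentation; the file and variable names are made up) of a free-form Fortran loop parallelized with an OpenMP directive:
! omp_sketch.f90 -- illustrative sketch only
program omp_sketch
implicit none
integer :: i
real :: a(1000)
! The directive below asks the compiler to split the loop iterations
! across the available threads; the loop index i is private to each thread.
!$OMP PARALLEL DO PRIVATE(i) SHARED(a)
do i = 1, 1000
   a(i) = sqrt(real(i))
end do
!$OMP END PARALLEL DO
print *, 'a(1000) = ', a(1000)
end program omp_sketch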
Fortran with OpenMP directives is compiled
as:
% f90 -fast -omp myprog.f -o myprog.exe
C
code with OpenMP directives is
compiled as:
% cc -fast -omp myprog.c -o myprog.exe
You can run OpenMP in two ways:
batch interactive or batch.
Batch Interactive. Batch
interactive gives you an interactive
batch session for development
and debugging.
At the interactive prompt, you
can simply run your OpenMP executable
as:
% ./myprog.exe
By default, the system will run your OpenMP
jobs using four threads. If you
want to run with fewer than four
threads,
set
the OMP_NUM_THREADS environment
variable accordingly. For example,
if you want to run using two threads,
specify the following:
% env OMP_NUM_THREADS=2 ./myprog.exe
Since there are only four processors per node,
we strongly advise that you limit
the number of threads to four.
Batch. The second way to
run OpenMP is to submit it
as a batch job. You can either
submit the program to run or you
can wrap it in a script and submit
the script. To submit the program
to run on four processors running
the default four threads, specify
the following:
% bsub -q general -n 4 -Pt000 ./myprog.exe
In the above example and subsequent examples,
don't forget to replace t000 with
your actual Computational Project.
You may use the command getsponsor
to retrieve your Computational
Project.
If you want to run two threads, specify the
following:
% bsub -q general -n 4 -Pt000 env OMP_NUM_THREADS=2 ./myprog.exe
Please note that the minimum number of processors
you can ask for is four; that is,
one node, even though you may be
running fewer than four threads.
To use a batch script, you must
first write the script and then submit it.
An example of a script that will
achieve the same effect as the
example above is:
#-----script begins-------
#!/bin/csh
#BSUB -n 4
#BSUB -q general
#BSUB -P t000
#BSUB -o myprog.out
#BSUB -e myprog.err
env OMP_NUM_THREADS=2 ./myprog.exe
#-----script ends-------
Assuming you name this script myscript, you
can then submit your job by issuing
the following command:
bsub < myscript
Please note that the "<" character
is necessary.
Timing issues
Many timers accumulate time from daughter
processes and threads, which makes -omp appear
to slow the code down or yield no
improvement; for example, the UNIX "time" command
and the Fortran 95 CPU_TIME intrinsic
both behave this way.
Common problems
One of the most common problems encountered
after parallelizing a code is the
generation of floating point exceptions
or segmentation violations that
did not occur before. In OpenMP,
these errors may be caused by uninitialized
variables. The reason that uninitialized
variables sometimes don't show
up in sequential code is that,
on halem, by default, non-OpenMP
codes are compiled with the -static
compiler option, which potentially
sets all uninitialized variables
initially to zero and causes all
local variables to be statically
allocated. For OpenMP codes, however,
the default is -automatic.
This implies that local variables
are placed on the run stack and
no assumption is made about their
values. Because the compiler flag
-warn uninitialized is on by default
on halem, the compiler should warn
about variables that are being
used before values are assigned
to them. Even if you are not using
OpenMP, it is good practice not
to rely on the default -static
behavior but rather to explicitly
make sure that variables are assigned values
before they are used, even if those
values are zero.
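As an illustrative sketch of the problem (this routine is made up, not part of the halem documentation), the local variable biggest below is read before it is ever assigned. Compiled with the default -static behavior it happens to start at zero and the bug may go unnoticed; compiled with -automatic, as OpenMP codes are, it starts with whatever is on the stack:
subroutine running_max(n, x, xmax)
implicit none
integer, intent(in) :: n
real, intent(in) :: x(n)
real, intent(out) :: xmax
real :: biggest              ! BUG: never given an initial value
integer :: i
do i = 1, n
   if (x(i) > biggest) biggest = x(i)   ! biggest is undefined on the first pass
end do
xmax = biggest
end subroutine running_max
The fix is simply to assign biggest an initial value (for example, biggest = x(1)) before the loop rather than relying on -static zero-initialization.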
MPI
MPI is a standard specification for message-passing
libraries, created by the Message Passing Interface
Forum (MPIF). A set of hands-on tutorials
is available to help you familiarize yourself
with the use of MPI.
The MPI version on halem is fully
compliant with the MPI-1
standard and it incorporates some
of the capabilities
of MPI-2.
Compiling MPI programs
on halem
To compile MPI programs, you must link
to the MPI library:
For FORTRAN: f90 -o mpicode mpicode.f -lmpi
For C and C++: cc -o mpicode mpicode.c -lmpi
Your MPI program should always include the appropriate
header file:
For FORTRAN: include "mpif.h"
For C and C++: include "mpi.h"
NOTE: You need not and
should not have a copy of mpif.h
or mpi.h in your build or run
directory, as this may cause errors
in your program. Your program will
automatically access the correct
halem-specific copy of the
mpi header file provided by the
system.
Running MPI programs on halem
MPI jobs are run in batch mode on halem. The
command to launch an MPI job on
halem is prun, which should be
included in your batch script. The prun command
provides you with more options
to control the processor and node
allocation of your job, as illustrated
in Example 3 below. Refer
to the prun man page for more information.
It is important to note that the
batch submission bsub command
has an argument for the number
of requested processors (-n),
while the prun command has the
same (-n) argument to specify the
number of processes to
be launched---they are not the
same. Both should be specified.
The number of processors requested
from LSF in your bsub command
should be a multiple of 4,
as you will always be assigned
full nodes and all their memory.
The number of parallel processes
launched by prun can be any
number, but it must be less than
or equal to the number of processors
requested from LSF in the bsub
command.
See examples 1 and 2 below
for illustration.
Example 1: Your job "mpicode" requires
8 MPI processes to run on 8
processors. Your script to
run "mpicode" can
include the following two lines:
#BSUB -n8
.
.
prun -n8 ./mpicode
Example 2: Your job "mpicode" requires
10 MPI processes to run. You will
need to request 3 full nodes, or
12 processors, from LSF to run
your job:
#BSUB -n12
.
.
prun -n10 ./mpicode
Example 3: Your job "mpi_OMP_code is
actually a "hybrid" MPI/OpenMP code,
and each of the 8 MPI processes
will launch 4 OpenMP threads. You will likely
want to request 32 processors from
LSF (#BSUB -n 4) and have prun
launch 8 MPI processes (-n 8) to be allocated
on 8 different nodes (-N 8) with 4 cpus on
each node (-C4)
#BSUB -n32
.
.
prun -n8 -N8 -C4 ./mpi_OMP_code
While debugging small MPI jobs, you might
want to use the interactive batch
queue.
Some simple examples
Fortran example
The following FORTRAN program uses message
passing software:
C -----------------------------------------------------------------
C senddata --- example MPI Fortran program
C
C This program will demonstrate point-to-point communication
C between 2 processors. Processor 0 will send an arbitrary integer
C value to processor 1. In this example, the value of 10 will be sent.
C Processor 1 will receive this message and write
C to standard output the contents of the message. In addition,
C the contents of the status array will be output.
C
C
C -----------------------------------------------------------------
Program senddata
include 'mpif.h'
integer myproc, size
integer din, dout
integer next, prev, tag
integer req
integer i, errcode
integer status(MPI_STATUS_SIZE)
print *, 'Alive'
c
c Initialize MPI environment.
c
call MPI_INIT(errcode)
c
c Determine my processor id.
c
call MPI_COMM_RANK(MPI_COMM_WORLD, myproc, errcode)
c
c Determine number of processors allocated.
c
call MPI_COMM_SIZE(MPI_COMM_WORLD, size, errcode)
print *, '[',myproc,'] Total # of processors used in this',
+ ' example = ',size
if (size .ne. 2) go to 20
c
c Set temporary value for dout on each processor.
c
dout = 0
c
c Set destination id. This will be the number of the processor
c where I will send data.
c
next = 1
c
c Set source id. This will be the number of the processor
c from which I will receive data.
c
prev = 0
c
tag = 0
c
c Specify new value for dout on processor 0.
c
if (myproc .eq. 0) dout = 10
print *, '[',myproc,'] Initial val = ', dout
if (myproc .eq. 0) then
c
c Send contents of dout to next processor. The variable dout has
c been set to the value 10. The 'next' processor in this
c example will be processor 1, as calculated above.
c
call MPI_SEND(dout, 1, MPI_INTEGER, next, tag,
+ MPI_COMM_WORLD, errcode)
if (errcode .ne. MPI_SUCCESS) then
write(*,*) 'There was a problem with the send.'
endif
else
c
c Receive message into din from prev processor. The 'prev'
c processor in this example will be processor 0.
c
call MPI_RECV(din, 1, MPI_INTEGER, prev, tag,
+ MPI_COMM_WORLD, status, errcode)
if (errcode .ne. MPI_SUCCESS) then
write(*,*) 'There was a problem with the receive.'
endif
print *, '[',myproc,'] Received ', 'val = ',din
print *, '[',myproc,'] Source of message = ',status(MPI_SOURCE)
print *, '[',myproc,'] Tag = ',status(MPI_TAG)
end if
20 print *, '[',myproc,'] Complete'
c
c Terminate MPI.
c
call MPI_FINALIZE(errcode)
end
C example
/* greetings.c -- greetings program
*
* Send a message from all processes with rank != 0 to process 0.
* Process 0 prints the messages received.
*
* Input: none.
* Output: contents of messages received by process 0.
*
* See Chapter 3, pp. 41 & ff in PPMPI.
*/
#include "stdio.h"
#include "string.h"
#include "mpi.h"
int main(int argc, char* argv[]) {
int my_rank; /* rank of process */
int np; /* number of processes */
int source; /* rank of sender */
int dest; /* rank of receiver */
int tag = 0; /* tag for messages */
char message[100]; /* storage for message */
MPI_Status status; /* return status for */
/* receive */
/* Start up MPI */
MPI_Init(&argc, &argv);
/* Find out process rank */
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
/* Find out number of processes */
MPI_Comm_size(MPI_COMM_WORLD, &np);
if (my_rank != 0) {
/* Create message */
sprintf(message, "Greetings from process %d!",
my_rank);
dest = 0;
/* Use strlen+1 so that '\0' gets transmitted */
MPI_Send(message, strlen(message)+1, MPI_CHAR,
dest, tag, MPI_COMM_WORLD);
} else { /* my_rank == 0 */
for (source = 1; source != np; source++) {
MPI_Recv(message, 100, MPI_CHAR, source, tag,
MPI_COMM_WORLD, &status);
printf("%s\n", message);
}
}
/* Shut down MPI */
MPI_Finalize();
return 0;
} /* main */
Commands to run the example programs
To run the above program on halem, use the
following commands:
# Compile and load program
f90 program.f -lmpi (Fortran)
cc program.c -lmpi (C)
# Run executable (in interactive mode)
bsub -P [sponsor_code] -I -q [batch-queue] -n 12 prun -n 10 ./a.out
Additional information
For more information on MPI, please see:
- The man pages on MPI and MPI routines on
halem. It should be noted that
the man pages expect precise capitalization.
For example, MPI_Comm_rank works, but mpi_comm_rank
and other variations do not.
- The
Message Passing Interface (MPI) standard. This
page links to multiple sources
of message passing and parallel
programming information.
MPI-IO
The best starting point for learning MPI-IO
is the Official
MPI-2 Standard documents. MPI-IO hints
that are specifically supported
in the HP AlphaServer SC PFS file
system are documented in the Quadrics
HP AlphaServer SC User Guide (see "MPI-IO
Support" under Appendix C of that guide).
To write an MPI-IO Fortran program,
insert the following MPI header
file in the source file:
INCLUDE "mpif.h"
and then compile the code with
MPI and MPI-IO libraries:
% f90 -o mpi_iocode mpi_iocode.f90
-lmpi -lmpio
The file mpif.h has definitions
for Fortran MPI-IO programs.
A short sample
MPI-IO test program can be downloaded
and tested on halem.
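The following is a minimal sketch of such a program (written for this guide, not the downloadable sample mentioned above; the file name out.dat and the assumption of 4-byte default integers are illustrative). Each process writes one integer into a shared file at an offset determined by its rank:
program mpi_io_sketch
implicit none
include 'mpif.h'
integer :: ierr, myproc, fh, mydata
integer :: status(MPI_STATUS_SIZE)
integer(kind=MPI_OFFSET_KIND) :: offset

call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, myproc, ierr)
mydata = myproc
! All processes open the same file collectively.
call MPI_FILE_OPEN(MPI_COMM_WORLD, 'out.dat', MPI_MODE_WRONLY + MPI_MODE_CREATE, MPI_INFO_NULL, fh, ierr)
! Each process writes its rank at a rank-dependent byte offset.
offset = myproc * 4          ! assumes 4-byte default integers
call MPI_FILE_WRITE_AT(fh, offset, mydata, 1, MPI_INTEGER, status, ierr)
call MPI_FILE_CLOSE(fh, ierr)
call MPI_FINALIZE(ierr)
end program mpi_io_sketch
Compile it with the -lmpi -lmpio libraries as shown above, and launch it with prun inside a batch job like any other MPI program.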
SHMEM
To build Fortran SHMEM programs, insert the
SHMEM header file in your source
files:
INCLUDE 'shmemf.h'
and compile the program with f90 and link
with the SHMEM library:
% f90 -o shmemcode shmemcode.f90 -lshmem
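As a minimal sketch (written for this guide, not taken from the halem documentation; check the SHMEM man pages on halem for the exact routine names available there), the program below has PE 0 deposit a value directly into a SAVEd, and therefore symmetric, variable on the last PE:
program shmem_sketch
include 'shmemf.h'
integer, save :: flag        ! SAVE makes flag symmetric (remotely accessible)
integer mype, npes, src

call START_PES(0)            ! initialize SHMEM
mype = MY_PE()
npes = NUM_PES()
flag = -1
src  = mype
call SHMEM_BARRIER_ALL()
! PE 0 writes its own rank into flag on the last PE.
if (mype .eq. 0) call SHMEM_INTEGER_PUT(flag, src, 1, npes-1)
call SHMEM_BARRIER_ALL()
if (mype .eq. npes-1) print *, 'PE', mype, 'received flag =', flag
end program shmem_sketch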
SMS
SMS (Scalable
Modeling System, sms-2.8.0.r4i4
and sms-2.8.0.r8i8) is a directive-based
parallelization tool that translates
Fortran code into a parallel version
that runs efficiently on both shared and distributed
memory systems including the IBM
SP2, Cray T3E, SGI Origin, Sun Clusters, Alpha
Linux clusters and Intel Clusters. The r4i4
package option was installed to run code compiled
with the "-r4 -i4" option.
The r8i8 option was installed to
run code compiled with the "-r8 -i8" option.
A tutorial is available at www-ad.fsl.noaa.gov/ac/SMS_UsersGuide_v2.8.pdf.
To use Scalable Modeling
System (SMS) libraries,
first set the environment variable
SMS to the installation directory.
On halem, this directory is
/usr/ulocal/sms-2.7.0.r8i8 for the code
that is to be compiled with real*8, integer*8
option. Other libraries for real*8/integer*4,
and real*4/integer*4 cases
are in /usr/ulocal/sms-2.7.0.r8i4,
and /usr/ulocal/sms-2.7.0.r4i4,
respectively. Next add $SMS/bin
to your path. The bin directory
includes the scripts mpirun,
mpif90, and mpicc that SMS
scripts might call to run mpi executables
and compile mpi codes. If the
SMS script uses the Bourne shell,
the above should be set in
the file .profile. If the SMS script
uses C shell, set the above
in .cshrc.
The file .profile in your home directory
would include:
SMS="/usr/ulocal/sms-2.7.0.r8i8"
export SMS
echo $PATH | /bin/grep -q "$HOME/bin" ||
{
PATH=$SMS/bin:$HOME/bin:${PATH:-/usr/bin:.}
export PATH
}
The file .cshrc in your home directory
would include:
setenv SMS /usr/ulocal/sms-2.7.0.r8i8
setenv PATH ${PATH}:$SMS/bin:.
SMS
is an "unsupported" application,
meaning that NCCS staff will
only install the software
and test it with sample code
for validation. All SMS-related
questions should be directed
to the NOAA Forecast Systems
Laboratory.
Compiler listing
To obtain a compiler listing, compile the
code using the command:
% f90 -V -annotations all code.f90
The "-V
-annotations all" options
cause the compiler to create a
listing file with annotations describing what
optimizations the compiler has made.
The other common UNIX profilers are the command
line options -p and -pg. These
command line options instrument the code.
% f90 -p -g3 code.f90
% f90 -pg -g3 code.f90
% a.out
Timing the program
Program performance can be measured with the
time command. However, one or more
CPU-intensive processes might affect
the time displayed by the time command. The
time command provides the following information:
- The
elapsed, real, or "wall clock" time,
which will be greater than
the total charged actual
CPU time.
- Charged actual CPU
time, shown for both system
and user execution. The total
actual CPU time is the sum of
the actual user CPU time and
actual system CPU time.
Using the command
The sample
Fortran program program.f90 can
be used to explore different performance
analyzing tools, including the time
command.
To measure the
elapsed time for part of a
Fortran code, use SECNDS(
).
Preferable to SECNDS( ) is the
Fortran 90 intrinsic SYSTEM_CLOCK, which
uses a real-time clock to measure
wall time. The Fortran 95
intrinsic CPU_TIME( ) is also
available and returns processor time,
letting you distinguish CPU time from wall time.
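For example (a sketch written for this guide, not from the halem documentation), the following program brackets a loop with both intrinsics so that wall time and CPU time can be compared:
program timing_sketch
implicit none
integer :: count0, count1, count_rate, i
real :: cpu0, cpu1, x

call SYSTEM_CLOCK(count0, count_rate)   ! wall-clock start
call CPU_TIME(cpu0)                     ! CPU-time start
x = 0.0
do i = 1, 10000000
   x = x + sqrt(real(i))
end do
call CPU_TIME(cpu1)
call SYSTEM_CLOCK(count1)
print *, 'wall time (s) =', real(count1 - count0) / real(count_rate)
print *, 'CPU time  (s) =', cpu1 - cpu0
print *, 'checksum      =', x           ! keeps the loop from being optimized away
end program timing_sketch
For an OpenMP run, the CPU time reported this way is summed over all threads, which is exactly why the wall-clock measurement from SYSTEM_CLOCK is the one to trust (see the timing note in the OpenMP section above).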
To see the resources used by your program,
use the time command. Note that
the output format depends on which
shell you use. Below is a sample
result in the C shell:
% time a.out
9.744u 0.010s 0:09.86 98.8% 11+8k 160+5io 144pf+0w
The output
shows 9.744 seconds of user time,
9.86 seconds of elapsed time,
98.8% CPU utilization, and 144
page faults.
For parallel jobs,
the flag -s can be used with
the prun command to get job statistics
as the job exits. The result would
look like the following:
Elapsed time 6.11 secs    Allocated time 24.42 secs
User time 21.59 secs      System time 0.05 secs
Status running
Cpus used 4
Note
that the allocated time is
elapsed time multiplied by CPUs
used.
To determine the MFLOPS (million floating-point
operations per second, the number
of full word-size fp multiply,
add, divide, and probably some
compare operations that can be
performed per second), issue the
following commands:
% pixie -pids a.out
% a.out.pixie
% pixstats -all a.out a.out.Counts.*
for
serial code. The result includes
the following line:
4812000500 (0.323) flops
To obtain an accurate MFLOPS value, take
the "flops" number, i.e., 4812000500,
and divide it by the actual elapsed
time obtained by any of the methods
mentioned earlier.
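As a purely illustrative calculation, if the elapsed time for the run that produced the flops count above had been 9.86 seconds (the elapsed time shown in the time example earlier on this page, reused here only for illustration), the rate would be 4812000500 / 9.86, which is about 4.88e8 floating-point operations per second, or roughly 488 MFLOPS.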
Prof
The prof command analyzes one or more data
files generated by the compiler's
execution-profiling system and
produces a listing. The prof command
can also combine those data files
or produce a feedback file that lets the optimizer
take into account the program's run-time behavior
during a subsequent compilation.
Profiling is a three-step process:
- compile the
program
- execute the program
- run prof to analyze
the data
Prof generates statistics
comparing individual parts
of a program. It is particularly
useful in isolating the most
time-consuming subroutines
and loops of a program.
Using the command
The sample
Fortran program program.f90 can be used to explore the capabilities
of prof.
Profiling can be used to find which parts
of your code take the longest time
to complete, allowing you to
concentrate on optimizing these
parts first. To profile the code
program.f90, execute the following
commands:
f90 -p -g -o program program.f90
program < input
prof program
prof -h program
The profiling data is stored in the file mon.out,
and the output of prof shows the
most time-consuming routines and/or lines of
code.
Some useful options are:
-heavy (list in most used order)
-lines (list in line number order)
-numbers (add line number to
procedure name)
-only (list only procedures listed)
-merge filename (merge the
profile information contained in
all the specified profile files)
Prof display
Prof is used to display run output from a
number of profilers such as -p,
pixie, uprofile, and atom. To learn
about these tools, check their
man pages (-p is the Fortran compiler
option).
Pixie
The instruction-counting profiler pixie can
be used for optimization and coverage
analysis. Pixie does the following:
- Profiles
procedures, basic blocks,
source lines
- Creates a basic block of a section
of code with one entrance and
one exit
- Uses a resolution finer than
procedures
- Reports machine clock
cycles
- Creates .Addrs and .Counts files
Using Pixie
The commands to use pixie are:
f90 -g3 -o program program.f
pixie program
program.pixie < input
prof -pixie program
These profilers can also be used
with threaded code.
Pixie and OpenMP
To use OpenMP with Pixie, compile your program
with the -g1, -g2, or -g3 option:
pixie -pthread -fork -all a.out
setenv LD_LIBRARY_PATH .
setenv OMP_NUM_THREADS 4
setenv OMP_DYNAMIC FALSE
prun -n1 -c4 a.out.pixie
prof -pixie [options] a.out a.out.Counts.[thread_num]>logfile
Some options are:
-only [routine] (includes
only this routine)
-heavy (lists most heavily used
lines)
-lines (lists usage per line)
-invocations (lists calls, with
caller, to each procedure)
-procedures (lists time and cycles
spent on each procedure)
-test coverage (lists all lines
that were compiled but not executed)
Pixie and MPI
To use MPI with Pixie, compile your program
with the -g1, -g2, or -g3 option:
pixie -pids -all a.out
setenv LD_LIBRARY_PATH .
prun -n64 a.out.pixie
prof -pixie [options] a.out a.out.Counts.[pid]>logfile
VAMPIR
(profiling MPI code)
VAMPIR is a powerful interactive visualization
tool for analyzing MPI code performance.
It interprets and visualizes
a tracefile which has to be generated
by the application program with
the aid of an instrumented MPI
library (VAMPIRtrace).
The Tutorial is a good
starting point for understanding
this tool. The full reference manual can be downloaded
to your local workstation.
To find out how to get a trace file,
read the VT29-userguide.
Before running
VAMPIR on halem, you must
set up the environment
variable:
setenv PAL_LICENSEFILE /usr/local/etc/vamp.dat
and
compile the MPI code and link
with the following libraries in the following
ORDER:
-lfmpi -lVT -lpmpi -lmpi
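For example, for a hypothetical MPI source file mpicode.f90, the complete compile line might look like:
% f90 -o mpicode mpicode.f90 -lfmpi -lVT -lpmpi -lmpi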
The order is important
because VAMPIR is wrapping the
MPI library. You can then run your code as
you would any MPI code and at the end of the
run, a trace file (.bvt) will be written. If
you then call
% vampir traceFileName.bvt
VAMPIR provides
charts of your message timings.
Ladebug
To debug Compaq Fortran programs on Tru64
UNIX, use Ladebug or dbx. Ladebug
is a source-level, symbolic debugger
with a choice of command line or
graphical user interface that supports
Compaq Fortran data types, syntax, and use.
It provides extensive support for debugging
programs written in C, C++, and
Fortran (77 and 90) for Compaq
systems.
The
following commands create (compile
and link) the executable program and invoke
the character-cell interface to the Ladebug
debugger:
% f90 -g -ladebug -o squares squares.f90
% ladebug squares
The -g(n) options control the amount of information
placed in the object file for debugging. To use
Ladebug, you should specify the
-ladebug option along with the
-g (= -g2) or -g3 options. Descriptions
on how to use Ladebug can be found
in the Ladebug
Debugger Manual.
Ladebug examples
To invoke the debugger on an executable file:
% ladebug executable_file
To invoke the debugger
on a core file:
% ladebug executable_file
core_file
To invoke the debugger
and attach to a running process:
% ladebug -pid process_id
executable_file
To invoke the debugger
and attach to a running
process when you do not
know what file it is executing:
% ladebug
-pid process_id
To start
the Ladebug GUI:
% ladebug -gui
Totalview
Totalview, a well-known debugging tool, is
installed on halem for graphical
debugging of serial and parallel
codes. More detailed information
on the use of Totalview is available at the ETNUS
Totalview web site.
Note: Before using Totalview, you must make
sure that X display works from
halem back to your local host. You must also ensure that
your code is compiled with the
-g option.
Debugging a sequential job
To debug a sequential job, launch Totalview
at the prompt as follows:
totalview [ executable [ corefile ] ] [ options
]
where executable is your executable, corefile
is the
corefile (if any), and options
are the arguments passed to
your executable (if any).
Debugging an OpenMP job
To debug an OpenMP job,
- Launch an interactive batch shell requesting
4 processors.
- Set
the OMP_NUM_THREADS environment
variable to the number of threads
you want to run on. This number
should be less than or equal
to 4 for halem.
- Launch Totalview
as for a sequential job (above).
Debugging an MPI job
To debug an MPI job,
- Ensure that "passwordless" ssh
is enabled from halem to halem
by executing the following from
the "halem" prompt:
halem2> ssh-keygen
-t dsa
Hit enter for the prompts
until it finishes.
cp -p $HOME/.ssh/id_dsa.pub
$HOME/.ssh/authorized_keys
- Ensure
that X Display works properly
as indicated above.
- Set the
TVDSVRLAUNCHCMD environment
variable to ssh.
- Create a
directory called .totalview
in your home directory (don't
forget the prefix ".").
- Download
preferences.tvd and put it
in your .totalview directory, or copy it
from halem: " cp /opt/totalview/alpha/examples/preferences.tvd . "
NOTE: Do NOT edit this file.
Once you have done all of the above, you must
launch an interactive batch shell for the number
of processors you want to use.
At the batch interactive prompt, launch totalview
as follows:
totalview prun -a
-n procs [ executable [
corefile ] ] [ options ]
The -a option tells totalview that the arguments
that follow are to be passed to
prun.
Third-party software
All third-party software
below is installed in /usr/ulocal/stow
and is named to reflect the
version installed. Note that all
software listed below is open source;
therefore the NCCS User Services
Group can assist you with using
the software but cannot assume
responsibility for fixing bugs.
autoconf-2.59
GNU
Autoconf is used to produce shell scripts
that automatically configure
software source code packages.
These scripts can adapt the packages
to many kinds of UNIX-like systems
without manual user intervention.
automake-1.8
GNU
Automake is
a tool for automatically generating
`Makefile.in' files compliant with
the GNU Coding Standards. It requires
the use of GNU Autoconf.
BLACS
BLACS (Basic
Linear Algebra Communication Subprograms) is
a linear algebra-oriented message passing
interface that can be implemented
efficiently and uniformly across
a large range of distributed memory
platforms. For example, the ScaLAPACK
library is implemented on top of
BLACS.
fftw-3.0.1
FFTW ("Fastest
Fourier Transform in the West") is
a C subroutine library for computing
the discrete Fourier transform
(DFT) in one or more dimensions,
of arbitrary input size, and of
both real and complex data. The
User manual is available at http://www.fftw.org/fftw3_doc.
grads-1.8sl11
GrADS (Grid
Analysis and Display System) is
an interactive desktop tool that
is used for easy access, manipulation,
and visualization of earth science
data. A tutorial is available at grads.iges.org/grads/gadoc/tutorial.html
hdf4.2r0
HDF (Hierarchical
Data Format) is a software and
file format definition for scientific
data management. The HDF software
includes I/O libraries and tools
for analyzing, visualizing, and
converting scientific data. The
HDF software library provides high-level
APIs and a low-level data interface.
At its lowest level, HDF is a physical
file format for storing scientific
data. At its highest level, HDF
is a collection of utilities and
applications for manipulating,
viewing, and analyzing data in
HDF files.
Documentation is available at http://hdf.ncsa.uiuc.edu/doc.html
hdf5-1.6.1 and hdf5-1.6.1-parallel
Like earlier
versions of HDF, HDF5 is
a general purpose library and
file format for storing scientific
data. The HDF5 format is not
compatible with that of HDF4,
but software is available for
converting HDF4 data to HDF5
and vice versa. HDF5 has support
for parallel I/O through MPI-IO
calls. Installed Options: hdf5-1.6.1
is a sequential installation,
and its use does not require
linking with the MPI library. hdf5-1.6.1-parallel,
however, is a parallel installation
and its use requires linking
with the MPI-IO library; i.e.,
-lmpi -lmpio on halem. Documentation is available
at hdf.ncsa.uiuc.edu/HDF5/doc, and a tutorial
is available at hdf.ncsa.uiuc.edu/tutorial4.html
jpeg-6b
JPEG (Joint
Photographic Experts Group)
is a free library (libjpeg) for
manipulating JPEG images. This
library is very widely used by
other packages and libraries and
is the standard way of manipulating
JPEG images. A manual is available
at www.ijg.org/files, and sample
tutorials are available at stargate.ecn.purdue.edu/~ips/tutorials/jpeg/jpegtut1.html and
www.ece.purdue.edu/~ace/jpeg-tut/jpegtut1.html
ncarg-4.3.1
NCAR
Graphics is a library
containing Fortran/C utilities
for drawing contours, maps, vectors,
streamlines, weather maps, surfaces,
histograms, X/Y plots, annotations,
etc. It also contains C and Fortran
interpolators and approximators
for 1-, 2-, and 3-D data. Other helpful links:
Getting started: ngwww.ucar.edu/ng/getstarted.html
Documentation: ngwww.ucar.edu/ng/documentation.html
Examples: ngwww.ucar.edu/ng/examples.html
nco-2.9.5
NCO (NetCDF
Operators) is a suite of programs
or operators that perform a set
of operations on a netCDF or HDF4
file to produce a new netCDF file
as output.
nedit-5.4
NEdit is
a multi-purpose text editor for
the X Window
System.
netcdf-3.5.1
NetCDF (network
Common Data Form)
is an interface and a library for
representing
array-oriented scientific data
in a machine-independent
format.
Numeric-23.1
Numerical
Python adds a fast array
facility to the Python language.
Pyfort-8.5
Pyfort (Fortran
for Python) is
a tool for creating extensions
to the Python language and Numerical
Python using Fortran routines.
A tutorial is available at pyfortran.sourceforge.net/pyfort/pyfort_reference.htm
Python-2.3.4
Python is
an interpreted object-oriented
programming language suitable for
distributed application development,
scripting, numeric computing and
system testing.
A tutorial is available at docs.python.org/tut/tut.html.
SCALAPACK
The ScaLAPACK (Scalable LAPACK) library
includes a
subset of LAPACK routines redesigned
for distributed
memory MIMD parallel computers. A
Users Guide is available at www.netlib.org/scalapack/slug/index.html.
tau-2.13.5
TAU (Tuning
and Analysis Utilities) is
a program and performance analysis
tool framework for high-performance
parallel and distributed computing.
udunits-1.12.1
Udunits (Unidata
units library) is
a library for manipulating units
of physical quantities. It supports
conversion of unit specifications
between formatted and binary forms,
arithmetic manipulation of unit
specifications, and conversion
of values between compatible scales
of measurement.
zlib-1.1.4
Zlib is
a general purpose data compression
library that provides in-memory
compression and decompression functions,
including integrity checks of the
uncompressed data. This version
of the library supports only one
compression method (deflation)
but other algorithms will be added
later and will have the same stream
interface. A manual is available
at www.gzip.org/zlib/zlib_docs.html,
and a tutorial is available at www.php.net/manual/en/ref.zlib.php.