Totalview Guide
This document describes how to get started using
the Totalview Debugger. This is an object-oriented program that
allows you to explore the model code with full access to the
constants and variables within it as it runs.
Altering the runscript
There is one essential edit to make in the runscript: wrap
the run command in a totalview execution
and exit immediately afterwards, rather than
redirecting the executable's output to a file and then doing
post-processing. Replace:
mpirun -np $npes $executable:t > fms.out
with:
totalview mpirun -a -np $npes $executable:t
exit
Also, make sure that you are running with MPI as opposed to SHMEM.
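The edit above can also be applied mechanically. What follows is a minimal sketch, assuming the runscript contains the mpirun line exactly as shown; the function name wrap_in_totalview and its two file arguments are hypothetical, and the sketch writes a modified copy rather than touching the original runscript.

```shell
#!/bin/sh
# Sketch: write a copy of a runscript with the mpirun line wrapped in totalview.
# wrap_in_totalview and its arguments are illustrative names, not part of the model scripts.
wrap_in_totalview() {
    src=$1   # original runscript (left unchanged)
    dst=$2   # debug copy to create
    # Match the exact run line, drop the redirect to fms.out,
    # and add an "exit" line directly after it (same indentation).
    sed -e 's|^\([[:space:]]*\)mpirun -np \$npes \$executable:t > fms\.out[[:space:]]*$|\1totalview mpirun -a -np $npes $executable:t\
\1exit|' "$src" > "$dst"
}
```

It could be called as wrap_in_totalview runscript runscript.totalview (both file names are placeholders). If the run line in your runscript differs from the one quoted above, the pattern will not match and the copy will be identical to the original, so it is worth comparing the two files afterwards.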
Re-compiling the executable
The executable will have to be recompiled with the debugging
options turned on and optimization turned off (note: this will
slow down the code by a factor of 5). The easiest way to
do this is to edit the Makefile (which resides in the
exec directory) so that it points to an mkmf template
that allows command line options. This can be done by changing
the line:
include /home/fms/bin/mkmf.template.sgi
to:
include /home/jpd/jakarta/mkmf.template.sgi
When recompiling the executable, make sure of two things. First, all of the intermediate ".o" files must be removed so that the Makefile re-creates them with the correct compilation options. Second, all of the Fortran code must be copied into the directory in which the Makefile resides so that totalview can find the source code. Both are achieved by changing to the Makefile directory and running the commands:
make clean
make localize
gmake DEBUG=1
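The change to the include line described earlier in this section can be scripted before running the commands above. This is only a sketch, assuming the Makefile contains the include line exactly as quoted; use_debug_template is a hypothetical helper name, and the ".orig" backup suffix is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: point a Makefile at the template that accepts command line options.
# use_debug_template is an illustrative name; the two paths are the ones quoted in this guide.
use_debug_template() {
    makefile=$1
    # Keep a backup so the original include line can be restored later.
    cp "$makefile" "$makefile.orig" || return 1
    # Rewrite only the exact include line; every other line passes through unchanged.
    sed -e 's|^include /home/fms/bin/mkmf\.template\.sgi[[:space:]]*$|include /home/jpd/jakarta/mkmf.template.sgi|' \
        "$makefile.orig" > "$makefile"
}
```

Run from the exec directory, this would be called as use_debug_template Makefile, followed by the three commands listed above.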
Running totalview
To start totalview, follow these steps:
-
Log in for an interactive session on the AC.
-
If they exist, remove and the initialized file in
the model output directory.
-
Change to the directory with the runscript.
-
Execute the runscript. This will start the totalview program.
-
Make the larger, main program window the active one, then begin
execution either by pressing "G" on the keyboard or by holding down the
middle mouse button and choosing "Go/Halt/Next/Step/Hold" --> "Go Group".
Totalview will then begin executing the runscript. When it reaches the command that runs the executable (which requires multiple processors), it will ask whether you wish to halt the process.
-
Clicking "NO" lets the program run until it either crashes (in which
case it stops and offers a traceback through the hierarchy of calls in the model
source code, along with the constants and variables active at that point) or
runs to completion (in which case it exits automatically).
-
The traceback hierarchy is in the upper left part of the main
window. It lets you move through the hierarchy of
subroutine calls within the code.
-
The currently active constants and variables are listed in the
upper right part of the main window. You can click on these
parameters either in this list or directly in the code.
-
Clicking "YES" halts the program and lets you open source
files and place "STOP" commands in the code before running the
code with "G". Add a stop by clicking the left mouse button on the
line number at the left of the screen. Remove a stop by clicking on it again.
-
Sometimes the code will halt before it reaches a stop. Frankly,
I haven't figured out why this happens, beyond guessing that the program is
timing out as some processors jump ahead of others. When this
happens, just resume execution with "G".