Biowulf at the NIH
Tree-Puzzle on Biowulf

TREE-PUZZLE is a computer program that reconstructs phylogenetic trees from molecular sequence data by maximum likelihood. It implements a fast tree search algorithm, quartet puzzling, that allows analysis of large data sets and automatically assigns estimates of support to each internal branch. TREE-PUZZLE also computes pairwise maximum likelihood distances as well as branch lengths for user-specified trees. Branch lengths can also be calculated under the clock assumption. In addition, TREE-PUZZLE offers likelihood mapping, a method to investigate the support of a hypothesized internal branch without computing an overall tree and to visualize the phylogenetic content of a sequence alignment. TREE-PUZZLE also conducts a number of statistical tests on the data set (chi-square test for homogeneity of base composition, likelihood ratio test of the clock hypothesis, Kishino-Hasegawa test).

The substitution models provided by TREE-PUZZLE are TN, HKY, F84, and SH for nucleotides; Dayhoff, JTT, mtREV24, BLOSUM 62, VT, and WAG for amino acids; and F81 for two-state data. Rate heterogeneity is modeled by a discrete Gamma distribution and by allowing invariable sites. The corresponding parameters can be inferred from the data set.

Tree-Puzzle Documentation
Tree-Puzzle website

Tree-Puzzle on Biowulf has been built with MPI for parallel runs. To submit a job on Biowulf, create a command file similar to the following:

-------------------Sample command file for Tree-Puzzle-----------------------
#!/bin/csh
#PBS -N Ppuzzle
#PBS -m be
#PBS -k oe

set path = (/usr/local/mpich/bin $path)

cd /data/username/tree/

mpirun -machinefile $PBS_NODEFILE -np $np /usr/local/bin/ppuzzle << EOF
primates.b
y
EOF

-----------------------------------------------------------------------------
where primates.b is the input file for puzzle, and the y on the following line confirms the menu settings and starts the run. See the Tree-Puzzle documentation for a list of all available parameters.
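
The lines between << EOF and EOF in the command file are a here-document: the shell sends them to the program's standard input, as if they had been typed at the interactive prompts. The sketch below shows the mechanism only. It uses cat as a stand-in for ppuzzle (an assumption for illustration) and is written in sh/bash syntax rather than csh so that it can be tried on any machine.

```shell
# cat stands in for ppuzzle here (illustration only): it simply
# passes along whatever arrives on its standard input.
answers=$(cat << EOF
primates.b
y
EOF
)

# Show what the program would have received: one answer per line,
# in the order the prompts appear.
printf '%s\n' "$answers"
```

Each line of the here-document answers one prompt, so the order of the lines matters.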

Submit this job using the qsub command, for example:

qsub -v np=4 -l nodes=2 command-file
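where 'command-file' is the file you created above. The -v np=4 option sets the $np variable that the mpirun line uses as its process count, and -l nodes=2 requests two nodes.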
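
The process count and the node count have to be chosen together. The following sketch builds the qsub line from a node count, assuming two processors per node. That figure is only inferred from the np=4, nodes=2 example above and should be checked against the nodes you actually request. The variable names are hypothetical, and the command is printed rather than run. The sketch uses sh/bash syntax, not csh.

```shell
# Assumption: 2 processors per node, inferred from the np=4 / nodes=2 example.
nodes=2
procs_per_node=2

# One MPI process per processor.
np=$((nodes * procs_per_node))

# Build the submission command; echo it for review instead of running it.
submit_cmd="qsub -v np=$np -l nodes=$nodes command-file"
echo "$submit_cmd"
```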

Tree-Puzzle Options

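Tree-Puzzle has many options. The interactive menu, shown here as it appears for the primates example, summarizes them. The letter at the start of each line is the key that selects that option:
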
GENERAL OPTIONS
 b                   Type of analysis?    Tree reconstruction
 k                Tree search procedure?  Quartet puzzling
 v       Approximate quartet likelihood?  No
 u             List unresolved quartets?  No
 n             Number of puzzling steps?  1000
 j             List puzzling step trees?  No
 o                  Display as outgroup?  Gibbon
 z     Compute clocklike branch lengths?  No
 e                  Parameter estimates?  Approximate (faster)
 x            Parameter estimation uses?  Neighbor-joining tree
SUBSTITUTION PROCESS
 d          Type of sequence input data?  Nucleotides
 m                Model of substitution?  HKY (Hasegawa et al. 1985)
 t    Transition/transversion parameter?  Estimate from data set
 f               Nucleotide frequencies?  Estimate from data set
RATE HETEROGENEITY
 w          Model of rate heterogeneity?  Uniform rate

Details about all options are available in the Tree-Puzzle documentation. Options are set in the command file by entering the interactive menu keys, and their values where needed, in the order they would be typed at the prompts. For example, to change the number of puzzling steps in your run to 8000, the command file would look like this:
--------------------------------------------------------
#!/bin/csh
#PBS -N Ppuzzle
#PBS -m be
#PBS -k oe

set path = (/usr/local/mpich/bin $path)

cd /data/username/tree/

mpirun -machinefile $PBS_NODEFILE -np $np /usr/local/bin/ppuzzle << EOF
primates.b
n
8000
y
EOF
----------------------------------------------------
It is often simplest to determine the parameters by running puzzle (the non-parallel version, not ppuzzle) interactively on the biobos command line, selecting the options you need and noting the order in which they are entered, and then entering the same sequence into the command file.
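
One way to keep track of a keystroke sequence worked out interactively is to save it in a small file, one answer per line, and review it before copying it into the command file. The sketch below does this for the 8000-step example above. The file name puzzle.answers is hypothetical, and the final redirect into puzzle assumes that the serial program reads its answers from standard input in the same way the ppuzzle here-document does. It is left commented out because it needs puzzle and the input file to be present. The sketch uses sh/bash syntax, not csh.

```shell
# Keystrokes in the order the interactive menu expects them
# (same sequence as the 8000-step example above).
answers_file=puzzle.answers   # hypothetical file name
printf '%s\n' primates.b n 8000 y > "$answers_file"

# Review the sequence before using it.
cat "$answers_file"

# A serial test run on biobos might then look like this (assumption:
# puzzle accepts redirected input just as ppuzzle does in the command file):
# puzzle < "$answers_file"
```

The same lines can then be pasted between << EOF and EOF in the command file.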