Biowulf at the NIH
MrBayes on Biowulf

MrBayes is a program for the Bayesian estimation of phylogeny. Bayesian inference of phylogeny is based upon a quantity called the posterior probability distribution of trees, which is the probability of a tree conditioned on the observations. The conditioning is accomplished using Bayes's theorem. The posterior probability distribution of trees is impossible to calculate analytically; instead, MrBayes uses a simulation technique called Markov chain Monte Carlo (or MCMC) to approximate the posterior probabilities of trees.

The program takes as input a character matrix in NEXUS file format. The output is several files containing the parameters that were sampled by the MCMC algorithm. MrBayes can summarize the information in these files for the user.
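
For batch use, the NEXUS file usually carries its own MrBayes commands in a 'mrbayes' block, so that the run needs no keyboard input. The sketch below writes a small, made-up input file from the shell. The file name toy.nex, the four taxa, their sequences, and all run settings are illustrative assumptions, not values recommended by this page.

```shell
#!/bin/bash
# Write a toy NEXUS file with an embedded MrBayes block.
# Everything here (file name, taxa, sequences, settings) is hypothetical.
outdir=$(mktemp -d)
nexfile="$outdir/toy.nex"

cat > "$nexfile" <<'EOF'
#NEXUS
begin data;
  dimensions ntax=4 nchar=8;
  format datatype=dna missing=? gap=-;
  matrix
    taxonA ACGTACGT
    taxonB ACGTACGA
    taxonC ACGAACGT
    taxonD ACG-ACGT
  ;
end;

begin mrbayes;
  set autoclose=yes nowarn=yes;   [run without interactive prompts]
  lset nst=6 rates=gamma;         [GTR + gamma substitution model]
  mcmc ngen=10000 samplefreq=100 nruns=2 nchains=4;
  sump burnin=25;                 [summarize sampled parameters]
  sumt burnin=25;                 [summarize sampled trees]
end;
EOF

echo "wrote $nexfile"
```

With nruns=2 and nchains=4 this toy analysis has 8 chains in total, which matters when choosing the number of processors for a parallel run.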

The most recent version of the program, 3.1, is the first non-beta release of MrBayes 3. Compared to previous beta versions, it includes numerous bug fixes as well as several new features such as topology constraints, inference of ancestral states and site rates, and simultaneous independent runs with convergence diagnostics calculated on the fly.

MrBayes is developed by a group of researchers at several institutions.

Submitting a single MrBayes batch job

1. Create a script file containing the MrBayes commands, as below:

---------- /data/user/mrbayes/test --------------

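#!/bin/bash
#
# PBS directives: job name, mail at begin/end, keep stdout/stderr files
#PBS -N MrBayes
#PBS -m be
#PBS -k oe

# Put the MPICH tools (mpirun) on the PATH
PATH=/usr/local/mpich/bin:$PATH; export PATH

# Directory containing the NEXUS input file
cd /usr/local/bench/mrbayes

# $np is passed in at submission time with 'qsub -v np=...'
mpirun -machinefile $PBS_NODEFILE -np $np /usr/local/bin/mrbayes arch107_L1000.nex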

2. Now submit the script using the 'qsub' command, e.g.

qsub -v np=8 -l nodes=4:o2800 /data/user/mrbayes/test
Where
np is the desired number of processors (2x the number of nodes, 4x for dual-core nodes)
nodes is the desired number of nodes (in this case, 4)
o2800 is the desired type of processor
"test" is the name of the script file.
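
Because np and nodes must agree, it can help to derive one from the other instead of typing both by hand. The following is a minimal sketch of that arithmetic. The helper name mrbayes_np is hypothetical (it is not a Biowulf command), and the script only prints the qsub line so that it can be checked before submission.

```shell
#!/bin/bash
# Compute the processor count for a given node count, following the rule
# above: 2 processors per node, or 4 per dual-core node.
# mrbayes_np is a hypothetical helper name, not a Biowulf command.
mrbayes_np() {
    local nodes=$1 procs_per_node=$2
    echo $(( nodes * procs_per_node ))
}

nodes=4
np=$(mrbayes_np "$nodes" 2)    # 4 nodes x 2 processors each

# Print the submission command instead of running it.
echo "qsub -v np=$np -l nodes=$nodes:o2800 /data/user/mrbayes/test"
```

For dual-core nodes, the second argument would be 4 instead of 2.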

Parallelization

MrBayes is parallelized and uses MPI to distribute heated and cold chains among the available processors. When run in parallel, each chain is handled by a single processor, so MrBayes cannot use more processors than there are chains. If you submit your MrBayes job to more processors than you have chains, you will see the error message:

    The number of chains must be at least as great
    as the number of processors (#)

You can increase the number of chains (nchains) or the number of independent runs (nruns) and then submit to more processors. Increasing the 'nruns' parameter and running on more processors will not speed up the calculation, since each independent run still takes the same amount of time to compute. It will, however, let more independent runs be evaluated at the same time, so you get a better result in the same wall-clock time.
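
A quick check before submission can catch this error before the job reaches the queue. The sketch below assumes that the total number of chains is nruns x nchains, in line with the description above of raising either parameter to use more processors. The helper name check_np is hypothetical.

```shell
#!/bin/bash
# Refuse a processor count that exceeds the total number of chains.
# Assumption: total chains = nruns * nchains. check_np is a hypothetical helper.
check_np() {
    local np=$1 nruns=$2 nchains=$3
    local total=$(( nruns * nchains ))
    if (( np > total )); then
        echo "np=$np exceeds $total chains (nruns=$nruns x nchains=$nchains)" >&2
        return 1
    fi
    echo "ok: $np processors for $total chains"
}

check_np 8 2 4
check_np 16 2 4 || echo "reduce np or raise nruns/nchains"
```

The first call passes, since 8 processors match 2 x 4 chains. The second is rejected, since 16 processors would leave half of them without a chain.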

Submitting a swarm of MrBayes jobs

To run a large number of MrBayes jobs, and have each job use multiple processors, follow this procedure. Set up a swarm command file along the following lines:

# --- this file is swarm.cmd ------
setenv PATH /usr/local/mpich/bin:$PATH ; cd /data/user/myjob/a1 ; mpirun -machinefile $PBS_NODEFILE -np 4 /usr/local/bin/mrbayes test1.nex
setenv PATH /usr/local/mpich/bin:$PATH ; cd /data/user/myjob/a2 ; mpirun -machinefile $PBS_NODEFILE -np 4 /usr/local/bin/mrbayes test2.nex
setenv PATH /usr/local/mpich/bin:$PATH ; cd /data/user/myjob/a3 ; mpirun -machinefile $PBS_NODEFILE -np 4 /usr/local/bin/mrbayes test3.nex

In the example above, each run uses its own directory for convenience. Each MrBayes run is set up to use 4 processors ('-np 4'), so the swarm command must also be set up to allocate each MrBayes run a single node with 4 processors.

Submit this swarm with the command:

swarm -f swarm.cmd -n 1 -l nodes=1:dc 
The '-n 1' flag ensures that only 1 MrBayes job will run on each node. Since the MrBayes commands within the swarm file use '-np 4' (i.e. run on 4 processors), the swarm jobs must run on nodes with 4 processors, i.e. the dual-core (dc) nodes.
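
For more than a handful of runs, the command file can be generated instead of written by hand. This is a sketch only: make_swarm_lines is a hypothetical helper, and it assumes the same a1, a2, ... directory and test1.nex, test2.nex, ... file layout as the example swarm.cmd above.

```shell
#!/bin/bash
# Print one swarm command line per run directory.
# make_swarm_lines is a hypothetical helper; the a<i>/test<i>.nex layout
# mirrors the example swarm.cmd above.
make_swarm_lines() {
    local base=$1 count=$2 np=$3 i
    for (( i = 1; i <= count; i++ )); do
        # \$PATH and \$PBS_NODEFILE are escaped so they expand when the job runs
        echo "setenv PATH /usr/local/mpich/bin:\$PATH ; cd $base/a$i ; mpirun -machinefile \$PBS_NODEFILE -np $np /usr/local/bin/mrbayes test$i.nex"
    done
}

make_swarm_lines /data/user/myjob 3 4
```

Redirecting the output, for example 'make_swarm_lines /data/user/myjob 3 4 > swarm.cmd', reproduces the three-line command file shown earlier.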

Run MrBayes interactively

If you have a small MrBayes job, it is probably easiest to run it on Helix. Occasionally, for debugging purposes, an interactive job may be run on Biowulf by allocating an interactive node. Please remember to exit from the node when done.

<biowulf>% qsub -I -l nodes=1
qsub: waiting for job 593807.biobos to start
qsub: job 593807.biobos ready

<p2>% cd /usr/local/bin
<p2>%mrbayes

   MrBayes v3.1.2
   (Bayesian Analysis of Phylogeny)
   (Parallel version)
   (1 processors available)

   by
   John P. Huelsenbeck and Fredrik Ronquist

   Section of Ecology, Behavior and Evolution
   Division of Biological Sciences
   University of California, San Diego
   johnh@biomail.ucsd.edu

   School of Computational Science
   Florida State University
   ronquist@csit.fsu.edu

   Distributed under the GNU General Public License

   Type "help" or "help <command>" for information
   on the commands that are available.

MrBayes > execute /usr/local/bench/mrbayes/arch107_L1000.nex

   Executing file "/usr/local/bench/mrbayes/arch107_L1000.nex"
   UNIX line termination
   Longest line length = 1011
   Parsing file
   Expecting NEXUS formatted file
   Reading data block
      Allocated matrix
      Matrix has 107 taxa and 1000 characters
      Missing data coded as ?
      Gaps coded as -
      Data is Dna
      Setting default partition (does not divide up characters).
      Taxon 1 -> Har.maris2
      Taxon 2 -> Har.maris1
      Taxon 3 -> Har.mukoht
      Taxon 4 -> Ntm.pharao
      Taxon 5 -> AB012057
      Taxon 6 -> AB012052
      Taxon 7 -> AB012054
      Taxon 8 -> Hc.salifo2
      Taxon 9 -> Hb.cutirub
      Taxon 10 -> AF071880
      [....]
      Taxon 101 -> U81774
      Taxon 102 -> AB019720
      Taxon 103 -> AF068822
      Taxon 104 -> AB019721
      Taxon 105 -> AB019719
      Taxon 106 -> AB019715
      Taxon 107 -> AB019717
      Setting output file names to "/usr/local/bench/mrbayes/arch107_L1000.nex.run<i>.<p/t>"
      Successfully read matrix
   Exiting data block
   Reached end of file

MrBayes >quit
   Deleting matrix
   Quitting program

<p2>%exit
<biowulf>%

Documentation
The MrBayes website
The MrBayes manual (PDF)
The MrBayes wiki, which contains the manual and a FAQ.