MrBayes is a program for the Bayesian estimation of phylogeny. Bayesian inference of phylogeny is based upon a quantity called the posterior probability distribution of trees, which is the probability of a tree conditioned on the observations. The conditioning is accomplished using Bayes's theorem. The posterior probability distribution of trees is impossible to calculate analytically; instead, MrBayes uses a simulation technique called Markov chain Monte Carlo (or MCMC) to approximate the posterior probabilities of trees.
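For reference, the conditioning described above can be written out with Bayes's theorem. The notation here is an assumption made for illustration and is not taken from the MrBayes documentation: $X$ is the character matrix, $\tau_i$ is the $i$-th candidate tree, and $B(s)$ is the number of possible trees for $s$ taxa.

```latex
P(\tau_i \mid X) \;=\;
  \frac{P(X \mid \tau_i)\, P(\tau_i)}
       {\sum_{j=1}^{B(s)} P(X \mid \tau_j)\, P(\tau_j)}
```

The sum in the denominator runs over every possible tree, which is why the posterior cannot be computed directly for realistic numbers of taxa and is approximated by MCMC sampling instead.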
The program takes as input a character matrix in NEXUS file format. Its output is several files containing the parameters sampled by the MCMC algorithm, and MrBayes can summarize the information in these files for the user. The program's features include:
- Extensive help available via the command line;
- Ability to analyze nucleotide, amino acid, restriction site, and morphological data;
- Mixing of data types, such as molecular and morphological characters, in a single analysis;
- A general method for assigning parameters across data partitions;
- An abundance of evolutionary models, including 4 X 4, doublet, and codon models for nucleotide data and many of the standard rate matrices for amino acid data;
- Estimation of positively selected sites in a fully hierarchical Bayes framework;
- Distributed computing using MPI.
The most recent version of the program, 3.1, is the first non-beta release of MrBayes 3. Compared to previous beta versions, it includes numerous bug fixes as well as several new features such as topology constraints, inference of ancestral states and site rates, and simultaneous independent runs with convergence diagnostics calculated on the fly.
MrBayes is developed by a group of researchers at several institutions.
1. Create a script file which contains the MrBayes commands as below:
    ---------- /data/user/mrbayes/test --------------
    #!/bin/bash
    #
    #PBS -N MrBayes
    #PBS -m be
    #PBS -k oe

    PATH=/usr/local/mpich/bin:$PATH; export PATH
    cd /usr/local/bench/mrbayes
    mpirun -machinefile $PBS_NODEFILE -np $np /usr/local/bin/mrbayes arch107_L1000.nex
2. Now submit the script using the 'qsub' command, e.g.
    qsub -v np=8 -l nodes=4:o2800 /data/user/mrbayes/test
where
- np is the desired number of processors (2x the number of nodes, 4x for dual-core nodes);
- nodes is the desired number of nodes (in this case, 4);
- o2800 is the desired type of processor;
- "test" is the name of the script file.
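The relationship between np and nodes can be sketched as a small helper that derives np from the node count and prints the matching qsub line. This is only an illustration of the rule stated above (2 processors per node, 4 per dual-core node); `compute_np` and `build_qsub` are hypothetical names, not tools installed on the cluster.

```shell
#!/bin/bash
# Sketch only: compute_np and build_qsub are hypothetical helper names.

# np = 2 x nodes, or 4 x nodes when the nodes are dual-core.
compute_np() {
    local nodes=$1 dual_core=${2:-no}
    if [ "$dual_core" = "yes" ]; then
        echo $(( nodes * 4 ))
    else
        echo $(( nodes * 2 ))
    fi
}

# Assemble the qsub command line from nodes, processor type and script path.
build_qsub() {
    local nodes=$1 proc_type=$2 script=$3 dual_core=${4:-no}
    local np
    np=$(compute_np "$nodes" "$dual_core")
    echo "qsub -v np=${np} -l nodes=${nodes}:${proc_type} ${script}"
}

# Print (rather than run) the command so it can be inspected first.
build_qsub 4 o2800 /data/user/mrbayes/test
```

For 4 nodes of type o2800 this prints the same command shown in step 2.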
MrBayes is parallelized and uses MPI to distribute heated and cold chains among the available processors. When run in parallel, each chain is handled by a single processor, so MrBayes cannot use more processors than there are chains. If you submit your MrBayes job to more processors than you have chains, you will see the error message:
> "The number of chains must be at least as great
> as the number of processors (#)"

It is possible to increase the number of chains (nchains) or the number of independent runs (nruns), and then submit to more processors. Increasing the 'nruns' parameter and running on more processors will not speed up the calculation, since each independent run still takes the same amount of time to compute. However, it allows more independent runs to be evaluated at the same time, which can give a better result for the same wall-clock time.
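A quick pre-submission check along these lines can catch the error before the job is queued. The sketch below assumes that the total number of chains available for distribution is nruns multiplied by nchains, which is how the paragraph above is read here; `total_chains` and `check_np` are hypothetical helper names, and the values passed in are examples only.

```shell
#!/bin/bash
# Sketch only: total_chains and check_np are hypothetical helper names.
# Assumption: chains available to MPI = nruns x nchains.

total_chains() {
    local nruns=$1 nchains=$2
    echo $(( nruns * nchains ))
}

# Succeeds when np does not exceed the total number of chains.
check_np() {
    local np=$1 nruns=$2 nchains=$3
    local total
    total=$(total_chains "$nruns" "$nchains")
    if [ "$np" -gt "$total" ]; then
        echo "np=$np exceeds $total chains (nruns=$nruns x nchains=$nchains)" >&2
        return 1
    fi
    echo "np=$np is OK for $total chains"
}

# Example values: 8 processors, 2 runs of 4 chains each.
check_np 8 2 4
```

If the check fails, either lower np in the qsub command or raise nchains or nruns in the MrBayes commands.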
To run a large number of MrBayes jobs, and have each job use multiple processors, follow this procedure. Set up a swarm command file along the following lines:
    # --- this file is swarm.cmd ------
    setenv PATH /usr/local/mpich/bin:$PATH ; cd /data/user/myjob/a1 ; mpirun -machinefile $PBS_NODEFILE -np 4 /usr/local/bin/mrbayes test1.nex
    setenv PATH /usr/local/mpich/bin:$PATH ; cd /data/user/myjob/a2 ; mpirun -machinefile $PBS_NODEFILE -np 4 /usr/local/bin/mrbayes test2.nex
    setenv PATH /usr/local/mpich/bin:$PATH ; cd /data/user/myjob/a3 ; mpirun -machinefile $PBS_NODEFILE -np 4 /usr/local/bin/mrbayes test3.nex
Submit this swarm with the command:
    swarm -f swarm.cmd -n 1 -l nodes=1:dc
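For more than a handful of jobs, the swarm command file can be generated rather than typed by hand. The sketch below assumes the same layout as the example file: directories a1, a2, ... under /data/user/myjob, each holding a matching testN.nex, and 4 processors per job. `make_swarm` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: make_swarm is a hypothetical helper name.
# Assumes job directories a1..aN under /data/user/myjob, each with testN.nex.

make_swarm() {
    local njobs=$1 outfile=$2
    local i
    : > "$outfile"    # start from an empty file
    for i in $(seq 1 "$njobs"); do
        # Single quotes keep $PATH and $PBS_NODEFILE literal, so they are
        # expanded when the job runs, not when this script runs.
        printf 'setenv PATH /usr/local/mpich/bin:$PATH ; cd /data/user/myjob/a%d ; mpirun -machinefile $PBS_NODEFILE -np 4 /usr/local/bin/mrbayes test%d.nex\n' "$i" "$i" >> "$outfile"
    done
}

# Write three command lines, one per job, into swarm.cmd.
make_swarm 3 swarm.cmd
```

Each generated line is a complete command, since swarm treats every line of the file as one job.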
If you have a small MrBayes job, it is probably easiest to run on Helix. Occasionally, for debugging purposes, an interactive job may be run on Biowulf by allocating an interactive node. Please remember to exit from the node when done.
    <biowulf>% qsub -I -l nodes=1
    qsub: waiting for job 593807.biobos to start
    qsub: job 593807.biobos ready

    <p2>% cd /usr/local/bin
    <p2>%mrbayes

    MrBayes v3.1.2

    (Bayesian Analysis of Phylogeny)

    (Parallel version)
    (1 processors available)

    by

    John P. Huelsenbeck and Fredrik Ronquist

    Section of Ecology, Behavior and Evolution
    Division of Biological Sciences
    University of California, San Diego
    johnh@biomail.ucsd.edu

    School of Computational Science
    Florida State University
    ronquist@csit.fsu.edu

    Distributed under the GNU General Public License

    Type "help" or "help <command>" for information
    on the commands that are available.

    MrBayes > execute /usr/local/bench/mrbayes/arch107_L1000.nex

    Executing file "/usr/local/bench/mrbayes/arch107_L1000.nex"
    UNIX line termination
    Longest line length = 1011
    Parsing file
    Expecting NEXUS formatted file
    Reading data block
    Allocated matrix
    Matrix has 107 taxa and 1000 characters
    Missing data coded as ?
    Gaps coded as -
    Data is Dna
    Setting default partition (does not divide up characters).
    Taxon 1 -> Har.maris2
    Taxon 2 -> Har.maris1
    Taxon 3 -> Har.mukoht
    Taxon 4 -> Ntm.pharao
    Taxon 5 -> AB012057
    Taxon 6 -> AB012052
    Taxon 7 -> AB012054
    Taxon 8 -> Hc.salifo2
    Taxon 9 -> Hb.cutirub
    Taxon 10 -> AF071880
    [....]
    Taxon 101 -> U81774
    Taxon 102 -> AB019720
    Taxon 103 -> AF068822
    Taxon 104 -> AB019721
    Taxon 105 -> AB019719
    Taxon 106 -> AB019715
    Taxon 107 -> AB019717
    Setting output file names to "/usr/local/bench/mrbayes/arch107_L1000.nex.run<i>.<p/t>"
    Successfully read matrix
    Exiting data block
    Reached end of file

    MrBayes >quit
    Deleting matrix
    Quitting program

    <p2>%exit
    <biowulf>%
- The MrBayes website
- The MrBayes manual (PDF)
- The MrBayes wiki, which contains the manual and a FAQ.