Biowulf at the NIH
X!Tandem on Biowulf
X!Tandem matches tandem mass spectra against peptide sequences, a process that has come to be known as protein identification. The program takes an XML file of instructions on its command line and writes its results to an XML output file whose path is specified in that input file.

X!Tandem was developed by researchers as part of the Global Proteome Machine Organization.

Parallel Tandem has also been installed on the Biowulf cluster.

There are two situations in which it is advantageous to run X!Tandem jobs on Biowulf.

  1. If you have large numbers (100s, 1000s, or 10000s) of independent searches, since the independent jobs can run simultaneously on different Biowulf nodes. Such jobs are most easily submitted via the swarm program.

  2. If you use Parallel Tandem to divide a file of several thousand MS/MS spectra into several smaller files, each containing an equal number of spectra, and search them with X!Tandem in parallel.

X!Tandem via swarm

Set up an X!Tandem input file for each run (you will probably want to write a script to generate these input files). Note that X!Tandem by default writes output files into its installation area, /usr/local/xtandem/, where users do not have write permission, so it is important to use full pathnames in your input files.

-----sample input file---------------------------------
<?xml version="1.0"?>
<bioml>
   <note type="input" 
     label="list path, default parameters">/data/user/myproj/default_input.xml</note>
   <note type="input" 
     label="list path, taxonomy information">/usr/local/xtandem/bin/taxonomy.xml</note>

   <note type="input" label="protein, taxon">other mammals</note>

   <note type="input" label="spectrum, path">/data/user/myproj/spectrum_1.pkl</note>

   <note type="input" label="output, path">/data/user/myproj/output_1.xml</note>
</bioml>
----------------------end of sample file-------------------
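A short shell loop can generate one such input file per spectrum. The following is a sketch only: PROJ and NRUNS are placeholder values, and on Biowulf you would point PROJ at a directory under /data where you have write permission (e.g. /data/$USER/myproj).

```shell
#!/bin/sh
# Sketch: generate one X!Tandem input file per spectrum file.
# PROJ and NRUNS are placeholders -- adjust to your own project.
PROJ="${PROJ:-./myproj}"
NRUNS=3

mkdir -p "$PROJ"
i=1
while [ "$i" -le "$NRUNS" ]; do
    # The heredoc is unquoted so $PROJ and $i expand into the XML.
    cat > "$PROJ/input_$i.xml" <<EOF
<?xml version="1.0"?>
<bioml>
   <note type="input"
     label="list path, default parameters">$PROJ/default_input.xml</note>
   <note type="input"
     label="list path, taxonomy information">/usr/local/xtandem/bin/taxonomy.xml</note>
   <note type="input" label="protein, taxon">other mammals</note>
   <note type="input" label="spectrum, path">$PROJ/spectrum_$i.pkl</note>
   <note type="input" label="output, path">$PROJ/output_$i.xml</note>
</bioml>
EOF
    i=$((i + 1))
done
```

Each generated file follows the sample above, with the spectrum and output paths varying per run.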

Large numbers of single-threaded jobs like this are submitted using the swarm utility. Set up a swarm command file containing one line for each of your X!Tandem runs. Here is a sample swarm command file:

------------------file sample.com--------------------
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input1.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input2.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input3.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input4.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input5.xml

----------------end of file -------------------------
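The command file itself need not be typed by hand; a short loop can generate it. A sketch, assuming input files named input1.xml through input5.xml in your project directory (DIR is a placeholder for your own /data directory):

```shell
# Sketch: write one tandem.exe command line per input file into sample.com.
# DIR is a placeholder; on Biowulf it would be a directory under /data.
DIR="${DIR:-/data/user/myproj}"
for i in 1 2 3 4 5; do
    echo "/usr/local/xtandem/bin/tandem.exe $DIR/input$i.xml"
done > sample.com
```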
Submit this file with
swarm -f sample.com

Bundling jobs

If you have over 1000 X!Tandem searches to run, they should be bundled with the '-b' flag to swarm. The 'bundle number' is calculated by:
bundle number = number of commands / (2 * number of jobs)
(The factor of 2 reflects swarm running two commands per dual-processor node.) Thus, if you have 5000 searches and want them packaged into 100 batch jobs, the bundle number would be 5000/200 = 25. The swarm of jobs would be submitted with:
swarm -b 25 -f sample.com
As you can see, bundling greatly decreases the number of individual batch jobs and therefore the scheduling overhead for such large numbers of small jobs. (More information about swarm options)
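The bundle arithmetic above can be checked directly in the shell:

```shell
# Bundle-number calculation from the example above:
# 5000 searches packaged into 100 batch jobs.
NCOMMANDS=5000
NJOBS=100
BUNDLE=$((NCOMMANDS / (2 * NJOBS)))
echo "$BUNDLE"    # prints 25
```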

Parallel Tandem

Parallel Tandem is most commonly used to search several potential modifications against an entire database, in order to build a sub-database of proteins for subsequent searches. Once this sub-database has been assembled, the resultant smaller database of candidate proteins can be searched again with refinement (searching for missed cleavages and partial cleavages with potential modifications), either sequentially or again in parallel. The results of running a search without refinement against the entire database agree very closely whether the search is run on one machine or in parallel across many machines. The results of a subsequent refined search against the sub-database also agree closely between the single-machine and parallel cases, but show a very small difference (~2%) in the number of proteins identified. This is due to peptide expectation values that are borderline with respect to the confidence level set in the parallel input.xml parameters.

Parallel Tandem was developed in the lab of Andrew Link at the Vanderbilt University School of Medicine. Parallel Tandem webpage

The package includes several programs which are installed in /usr/local/xtandem/bin on Biowulf.

Documentation and details on running Parallel Tandem

Note that the autotandem_mpi script will not run correctly in the Biowulf environment. Instead, submit your jobs with a batch script like the following:

#!/bin/csh
#  this file is myxtandem.bat
#PBS -j oe

setenv PATH /usr/local/xtandem/bin:/usr/local/mpich/bin:$PATH
cd ${PBS_O_WORKDIR}

date
pwd
mpirun -machinefile $PBS_NODEFILE -np $np \
    /usr/local/xtandem/bin/tandem_mpi firstinput.xml secondinput.xml
date
Submit this job with
qsub -v np=20 -l nodes=10 myxtandem.bat

More information about X!Tandem can be found at the X!Tandem website.