Biowulf at the NIH
X!Tandem on Biowulf
X!Tandem matches tandem mass spectra to peptide sequences, a process that has come to be known as protein identification. The program takes an XML file of instructions on its command line and writes its results to an XML file whose path is specified in that input file.

X!Tandem was developed by researchers as part of the Global Proteome Machine Organization.

Small numbers of X!Tandem jobs should be performed on a local desktop machine. Running X!Tandem on the Biowulf cluster is useful only if you have large numbers (100s, 1000s, or 10000s) of jobs, since the independent jobs can run simultaneously on different Biowulf nodes.

How to run X!Tandem on Biowulf

Set up an X!Tandem input file for each run (you will probably want to write a script to set up these input files). Note that X!Tandem will by default write output files into its installation area /usr/local/xtandem/ where users do not have write permission, so it is important to use full pathnames in your input file.
-----sample input file---------------------------------
<?xml version="1.0"?>
<bioml>
   <note type="input" 
     label="list path, default parameters">/data/user/myproj/default_input.xml</note>
   <note type="input" 
     label="list path, taxonomy information">/usr/local/xtandem/bin/taxonomy.xml</note>

   <note type="input" label="protein, taxon">other mammals</note>

   <note type="input" label="spectrum, path">/data/user/myproj/spectrum_1.pkl</note>

   <note type="input" label="output, path">/data/user/myproj/output_1.xml</note>
</bioml>
----------------------end of sample file-------------------
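The per-run input files can be generated with a short script. The following is a minimal sketch in Python, not a Biowulf-provided tool: the function names `render_input` and `write_inputs` are hypothetical, and the file naming (`spectrum_N.pkl`, `output_N.xml`, `inputN.xml`) is an assumption taken from the samples on this page. It refuses relative paths, since those would send output to the installation area.

```python
from pathlib import Path, PurePosixPath
from xml.sax.saxutils import escape

# Same <note> entries as the sample input file on this page.
TEMPLATE = """<?xml version="1.0"?>
<bioml>
   <note type="input" label="list path, default parameters">{defaults}</note>
   <note type="input" label="list path, taxonomy information">{taxonomy}</note>
   <note type="input" label="protein, taxon">{taxon}</note>
   <note type="input" label="spectrum, path">{spectrum}</note>
   <note type="input" label="output, path">{output}</note>
</bioml>
"""


def render_input(spectrum, output, defaults,
                 taxonomy="/usr/local/xtandem/bin/taxonomy.xml",
                 taxon="other mammals"):
    """Return the text of one X!Tandem input file (hypothetical helper)."""
    for path in (spectrum, output, defaults, taxonomy):
        # Full pathnames only, as this page recommends.
        if not PurePosixPath(str(path)).is_absolute():
            raise ValueError(f"not a full pathname: {path}")
    return TEMPLATE.format(
        defaults=escape(str(defaults)),
        taxonomy=escape(str(taxonomy)),
        taxon=escape(taxon),
        spectrum=escape(str(spectrum)),
        output=escape(str(output)),
    )


def write_inputs(project_dir, n_runs):
    """Write input1.xml .. inputN.xml into project_dir; return their paths."""
    project = Path(project_dir)
    written = []
    for i in range(1, n_runs + 1):
        text = render_input(
            spectrum=project / f"spectrum_{i}.pkl",   # assumed naming scheme
            output=project / f"output_{i}.xml",       # assumed naming scheme
            defaults=project / "default_input.xml",
        )
        path = project / f"input{i}.xml"
        path.write_text(text)
        written.append(path)
    return written
```

For example, `write_inputs("/data/user/myproj", 5)` would write `input1.xml` through `input5.xml` into that directory, each pointing at its own spectrum and output file.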

Large numbers of single-threaded jobs like this are submitted using the swarm utility. Set up a swarm command file containing one line for each of your X!Tandem runs. Here is a sample swarm command file:

------------------file sample.com--------------------
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input1.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input2.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input3.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input4.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input5.xml

----------------end of file -------------------------
Submit this file with
swarm -f sample.com
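With hundreds or thousands of runs, the swarm command file is also easier to generate than to type. A minimal sketch, assuming the input files already exist under full pathnames; the helper names `swarm_lines` and `write_swarm_file` are hypothetical, while the `tandem.exe` path is the one used in the sample above.

```python
from pathlib import Path

# Executable path as shown in the sample swarm command file.
TANDEM = "/usr/local/xtandem/bin/tandem.exe"


def swarm_lines(input_files, tandem=TANDEM):
    """Return one command line per X!Tandem input file."""
    return [f"{tandem} {path}" for path in input_files]


def write_swarm_file(input_files, command_file):
    """Write the swarm command file and return the number of commands."""
    lines = swarm_lines(input_files)
    Path(command_file).write_text("\n".join(lines) + "\n")
    return len(lines)
```

For example, `write_swarm_file([f"/data/user/myproj/input{i}.xml" for i in range(1, 6)], "sample.com")` would reproduce the five-line sample file shown above.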

Bundling jobs

If you have over 1000 X!Tandem searches to run, they should be bundled with the '-b' flag to swarm. The 'bundle number' is calculated by:
bundle number = number of commands / (2* number of jobs)
Thus, if you have 5000 searches and want them packaged into 100 batch jobs total, the bundle number would be 5000/200 = 25. The swarm of jobs would be submitted with:
swarm -b 25 -f sample.com
Bundling greatly reduces the number of individual batch jobs, and therefore the scheduling overhead incurred by large numbers of small jobs. (More information about swarm options)
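The bundle-number arithmetic can be sketched as a small helper. This is an illustration only: the function names are hypothetical, the factor of 2 is taken directly from the formula above without further interpretation, and rounding up when the division is not exact is an assumption this page does not state.

```python
import math

# The factor of 2 comes from the formula on this page.
FORMULA_FACTOR = 2


def bundle_number(n_commands, n_jobs):
    """bundle number = number of commands / (2 * number of jobs)."""
    if n_commands <= 0 or n_jobs <= 0:
        raise ValueError("n_commands and n_jobs must be positive")
    # Rounding up is an assumption, so that no commands are left over.
    return math.ceil(n_commands / (FORMULA_FACTOR * n_jobs))


def swarm_command(n_commands, n_jobs, command_file):
    """Build the swarm submission line using the -b and -f flags shown above."""
    return f"swarm -b {bundle_number(n_commands, n_jobs)} -f {command_file}"
```

With 5000 searches and a target of 100 batch jobs, `swarm_command(5000, 100, "sample.com")` gives `swarm -b 25 -f sample.com`, matching the worked example.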

Monitoring your jobs

As always, jobs can be monitored using the Biowulf cluster monitors. Click on 'List status of running jobs only', then on your username or job number on the resulting page to view only your own jobs, as in the image on the right.

More information about X!Tandem can be found at the X!Tandem website.