X!Tandem was developed by researchers as part of the Global Proteome Machine Organization.
Parallel Tandem has also been installed on the Biowulf cluster.
There are two situations in which it is advantageous to run X!Tandem jobs on Biowulf:
- You have large numbers (100s, 1000s, or 10000s) of jobs. The independent jobs can run simultaneously on different Biowulf nodes, and are most easily submitted via the swarm program.
- You want to use Parallel Tandem to divide a file of several thousand ms/ms spectra into several smaller files, each with an equal number of spectra, and search them with X!Tandem in parallel.
Set up an X!Tandem input file for each run (you will probably want to write a script to set up these input files). Note that X!Tandem will by default write output files into its installation area /usr/local/xtandem/ where users do not have write permission, so it is important to use full pathnames in your input file.
-----sample input file---------------------------------
<?xml version="1.0"?>
<bioml>
<note type="input" label="list path, default parameters">/data/user/myproj/default_input.xml</note>
<note type="input" label="list path, taxonomy information">/usr/local/xtandem/bin/taxonomy.xml</note>
<note type="input" label="protein, taxon">other mammals</note>
<note type="input" label="spectrum, path">/data/user/myproj/spectrum_1.pkl</note>
<note type="input" label="output, path">/data/user/myproj/output_1.xml</note>
</bioml>
----------------------end of sample file-------------------
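Since one input file is needed per run, a short script can generate them. The following is a minimal sketch in Python, not an official Biowulf tool: the function name write_input_files and the input1.xml, input2.xml, ... naming scheme are assumptions made for illustration, while the labels, taxon, and paths are copied from the sample input file above. It refuses a relative project directory so that every path written into the input files is a full pathname.

```python
from pathlib import Path
from xml.sax.saxutils import escape

# Template mirroring the sample input file above.
TEMPLATE = """<?xml version="1.0"?>
<bioml>
<note type="input" label="list path, default parameters">{defaults}</note>
<note type="input" label="list path, taxonomy information">{taxonomy}</note>
<note type="input" label="protein, taxon">{taxon}</note>
<note type="input" label="spectrum, path">{spectrum}</note>
<note type="input" label="output, path">{output}</note>
</bioml>
"""


def write_input_files(spectrum_files, project_dir, taxon="other mammals",
                      taxonomy="/usr/local/xtandem/bin/taxonomy.xml"):
    """Write one input file per spectrum file and return the paths written.

    Hypothetical helper: file names (input1.xml, output_1.xml, ...) are an
    assumed convention, not something X!Tandem requires.
    """
    project_dir = Path(project_dir)
    if not project_dir.is_absolute():
        # X!Tandem would otherwise write relative to its installation area.
        raise ValueError("project_dir must be a full pathname")

    written = []
    for i, spectrum in enumerate(spectrum_files, start=1):
        text = TEMPLATE.format(
            defaults=escape(str(project_dir / "default_input.xml")),
            taxonomy=escape(taxonomy),
            taxon=escape(taxon),
            spectrum=escape(str(spectrum)),
            output=escape(str(project_dir / f"output_{i}.xml")),
        )
        path = project_dir / f"input{i}.xml"
        path.write_text(text)
        written.append(path)
    return written
```

For example, write_input_files(sorted(Path("/data/user/myproj").glob("spectrum_*.pkl")), "/data/user/myproj") would write one input file for each .pkl file in the project directory.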
Large numbers of single-threaded jobs like this are submitted using the swarm utility. Set up a swarm command file containing one line for each of your X!Tandem runs. Here is a sample swarm command file:
------------------file sample.com--------------------
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input1.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input2.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input3.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input4.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input5.xml
----------------end of file -------------------------
Submit the swarm with:

swarm -f sample.com
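With many input files, the swarm command file itself is easiest to produce with a script. This is a minimal sketch, assuming the tandem.exe path shown in the sample command file; the function name write_swarm_file is hypothetical.

```python
from pathlib import Path

# Executable path as it appears in the sample swarm command file.
TANDEM = "/usr/local/xtandem/bin/tandem.exe"


def write_swarm_file(input_files, command_file):
    """Write one 'tandem.exe <input file>' line per input file.

    Returns the number of command lines written.
    """
    lines = [f"{TANDEM} {path}" for path in input_files]
    Path(command_file).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

For example, write_swarm_file(sorted(Path("/data/user/myproj").glob("input*.xml")), "sample.com") would list every input file in the project directory. The order of the lines does not matter, since the runs are independent.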
Bundling jobs
If you have over 1000 X!Tandem searches to run, they should be bundled with the '-b' flag to swarm. The 'bundle number' is calculated by:

bundle number = number of commands / (2 * number of jobs)

Thus, if you have 5000 searches and want them packaged into 100 batch jobs total, the bundle number would be 5000/200 = 25. The swarm of jobs would be submitted with:
swarm -b 25 -f sample.com

As you see, bundling greatly reduces the number of individual jobs, and therefore the overhead incurred by such large numbers of small jobs. (More information about swarm options)
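The bundle number formula above can be checked with a few lines of Python. This is only an illustrative sketch: the function name bundle_number is hypothetical, and rounding up when the division is not exact is an assumption, since the text only covers the evenly divisible case.

```python
def bundle_number(n_commands, n_jobs):
    """Return number of commands / (2 * number of jobs), rounded up.

    The factor of 2 comes from the formula in the text. Rounding up is an
    assumption, chosen so that the job count does not exceed n_jobs.
    """
    if n_commands < 1 or n_jobs < 1:
        raise ValueError("n_commands and n_jobs must be positive")
    divisor = 2 * n_jobs
    # Integer ceiling division avoids floating-point rounding.
    return (n_commands + divisor - 1) // divisor


# The example from the text: 5000 searches packaged into 100 batch jobs.
print(bundle_number(5000, 100))  # prints 25
```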
Parallel Xtandem is most commonly used to search an entire database for several potential modifications, in order to build a sub-database of candidate proteins for subsequent searches. Once this sub-database has been assembled, it can be searched again with refinement (that is, searched for missed cleavages and partial cleavages with potential modifications), either sequentially or again in parallel.

A search without refinement against the entire database gives results in very close agreement whether it is run on one machine or in parallel on many machines. A subsequent search with refinement against the sub-database also gives results in close agreement between one machine and several machines in parallel, but with a small difference (~2%) in the number of proteins identified. This difference is due to peptide expectation values that are borderline with respect to the confidence level set in the parallel input.xml parameters.
Parallel Xtandem was developed in the lab of Andrew Link at the Vanderbilt University School of Medicine. Parallel Tandem webpage
The package includes several programs which are installed in /usr/local/xtandem/bin on Biowulf.
Documentation and details on running Parallel Xtandem
Note that the autotandem_mpi script will not run correctly in the Biowulf environment. Instead, submit your jobs with a batch script such as the following:
#!/bin/csh
# this file is myxtandem.bat
#PBS -j oe
setenv PATH /usr/local/xtandem/bin:/usr/local/mpich/bin:$PATH
cd ${PBS_O_WORKDIR}
date
pwd
mpirun -machinefile $PBS_NODEFILE -np $np \
    /usr/local/xtandem/bin/tandem_mpi firstinput.xml secondinput.xml
date
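Submit the script with qsub, using '-v np=...' to set the number of MPI processes that the script passes to mpirun (here, 20 processes on 10 nodes):

qsub -v np=20 -l nodes=10 myxtandem.bat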
More information about X!Tandem can be found at the X!Tandem website.