The Open Mass Spectrometry Search Algorithm [OMSSA] is an efficient search engine for identifying MS/MS peptide spectra by searching libraries of known protein sequences. OMSSA scores significant hits with a probability score developed using classical hypothesis testing, the same statistical method used in BLAST.
OMSSA was developed by researchers at the NCBI, National Institutes of Health. [OMSSA website]
Small numbers of OMSSA jobs should be run on the NCBI OMSSA server. OMSSA on Biowulf is intended for running a large number of OMSSA searches, or running OMSSA against a personal database.
Create a swarm command file with one OMSSA command per line, for example:
------------------file sample.com--------------------
/usr/local/omssa/omssacl -d /fdb/blastdb/nr -f file1.dta -ox file1.xml
/usr/local/omssa/omssacl -d /fdb/blastdb/nr -f file2.dta -ox file2.xml
/usr/local/omssa/omssacl -d /fdb/blastdb/nr -f file3.dta -ox file3.xml
/usr/local/omssa/omssacl -d /fdb/blastdb/nr -f file4.dta -ox file4.xml
----------------end of file -------------------------
Submit this file with:
swarm -f sample.com -n 1
Note about multithreading: As of v2.1.0, OMSSA is multithreaded and will attempt to use all available processors on a node. It is therefore critical to use the '-n 1' parameter on the swarm command above (sending only one OMSSA command to each node); otherwise the nodes will be overloaded and performance will suffer.
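For a directory holding many .dta files, the command file can be generated rather than typed by hand. The following is a minimal sketch, assuming every input file ends in .dta and that paths contain no spaces; `make_swarm_file` is a hypothetical helper name, not part of OMSSA or swarm.

```shell
# Write one omssacl command per .dta file in a directory to a swarm command file.
# make_swarm_file is a hypothetical helper, not part of OMSSA or swarm.
make_swarm_file() {
    local dta_dir=$1
    local cmd_file=$2
    local db=${3:-/fdb/blastdb/nr}       # database defaults to nr, as in the sample file
    local f
    : > "$cmd_file"                      # start with an empty command file
    for f in "$dta_dir"/*.dta; do
        [ -e "$f" ] || continue          # skip the unexpanded glob when no .dta files exist
        printf '/usr/local/omssa/omssacl -d %s -f %s -ox %s.xml\n' \
            "$db" "$f" "${f%.dta}" >> "$cmd_file"
    done
}
```

For example, `make_swarm_file /data/user/mydir sample.com` would write the command file, which could then be submitted with `swarm -f sample.com -n 1`.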
These OMSSA commands will produce XML output. You can write your own script to process the XML data. The OMSSA package includes a sample parser: the command to use it is
perl /usr/local/omssa/readOMSSA.pl file1.xml
Thus, it is possible to set up an OMSSA search and parse the results in a single swarm command.
cd /data/user/mydir ; /usr/local/omssa/omssacl -d /fdb/blastdb/nr -f file1.dta -ox file1.xml \
    ; perl /usr/local/omssa/readOMSSA.pl file1.xml >file1.out
cd /data/user/mydir ; /usr/local/omssa/omssacl -d /fdb/blastdb/nr -f file2.dta -ox file2.xml \
    ; perl /usr/local/omssa/readOMSSA.pl file2.xml >file2.out
cd /data/user/mydir ; /usr/local/omssa/omssacl -d /fdb/blastdb/nr -f file3.dta -ox file3.xml \
    ; perl /usr/local/omssa/readOMSSA.pl file3.xml >file3.out
cd /data/user/mydir ; /usr/local/omssa/omssacl -d /fdb/blastdb/nr -f file4.dta -ox file4.xml \
    ; perl /usr/local/omssa/readOMSSA.pl file4.xml >file4.out
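The combined search-and-parse lines follow a fixed pattern, so they can be produced by a small loop. This is a sketch only: `search_and_parse_line` is a hypothetical helper, and it writes each command on a single line instead of using a backslash continuation.

```shell
# Print one swarm command line that runs an OMSSA search and then parses its XML output.
# search_and_parse_line is a hypothetical helper, not part of OMSSA or swarm.
search_and_parse_line() {
    local workdir=$1
    local base=$2                        # input file name without the .dta suffix
    local db=${3:-/fdb/blastdb/nr}
    printf 'cd %s ; /usr/local/omssa/omssacl -d %s -f %s.dta -ox %s.xml ; perl /usr/local/omssa/readOMSSA.pl %s.xml >%s.out\n' \
        "$workdir" "$db" "$base" "$base" "$base" "$base"
}

# Print the four example commands to standard output.
for n in 1 2 3 4; do
    search_and_parse_line /data/user/mydir "file$n"
done
```

Redirecting the loop's output to a file (for example `> sample.com`) gives a command file that can be submitted with swarm as before.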
OMSSA searches Blast-format sequence databases. A large collection of Blast
protein databases is available and updated on the Biowulf cluster, in
/fdb/blastdb/.
Names, location and
status of Blast databases. (OMSSA will search only protein databases)
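Because OMSSA searches only protein databases, it can help to check a database name before queuing a large swarm. The sketch below relies on the BLAST convention that protein databases have a .pin index file, or a .pal alias file for multi-volume sets; `is_protein_blastdb` is a hypothetical helper name.

```shell
# Succeed if a protein BLAST database exists for the given base name.
# The base name is given without any .p* suffix, as omssacl's -d option expects.
# is_protein_blastdb is a hypothetical helper, not part of OMSSA or BLAST.
is_protein_blastdb() {
    local base=$1
    # .pin = protein index file; .pal = protein alias file for multi-volume databases
    [ -e "$base.pin" ] || [ -e "$base.pal" ]
}
```

A call such as `is_protein_blastdb /fdb/blastdb/nr || echo "not a protein database"` could be placed at the top of a submission script.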
If you have over 1000 OMSSA searches to run, they should be bundled with the '-b' flag to swarm, such that there are no more than a few hundred jobs. The 'bundle number' is calculated by:
bundle number = no. of commands / (2 * no. of jobs)
Thus, if you have 5000 OMSSA searches and want them packaged into 100 jobs total, the bundle number is 5000/200 = 25. You would submit these jobs with the command:
swarm -b 25 -f sample.com
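The bundle number formula can be evaluated with shell integer arithmetic. In this sketch, `bundle_number` is a hypothetical helper, and rounding up when the division is not exact is an assumption made here so that the job count does not exceed the target.

```shell
# bundle number = no. of commands / (2 * no. of jobs)
# bundle_number is a hypothetical helper; rounding up is an assumption of this sketch.
bundle_number() {
    local ncmds=$1
    local njobs=$2
    echo $(( (ncmds + 2 * njobs - 1) / (2 * njobs) ))
}

bundle_number 5000 100   # 5000 / 200, prints 25
```

The number of commands can be taken from the command file itself, for example `bundle_number "$(wc -l < sample.com)" 100`.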
As always, jobs can be monitored using the Biowulf cluster monitors. Click on 'List status of running jobs only', and then your username or job number on the resultant page to view your own jobs only, as in the image on the right.
v2.4 (Oct 2008)
USAGE
  omssacl [-h] [-help] [-xmlhelp] [-pm param] [-d blastdb] [-umm]
    [-f infile] [-fx xmlinfile] [-fb dtainfile] [-fp pklinfile]
    [-fm pklinfile] [-foms omsinfile] [-fomx omxinfile] [-fbz2 bz2infile]
    [-fxml omxinfile] [-o textasnoutfile] [-ob binaryasnoutfile]
    [-ox xmloutfile] [-obz2 bz2outfile] [-op pepxmloutfile] [-oc csvfile]
    [-w] [-to pretol] [-te protol] [-tom promass] [-tem premass]
    [-tez prozdep] [-ta autotol] [-tex exact] [-i ions] [-cl cutlo]
    [-ch cuthi] [-ci cutinc] [-cp precursorcull] [-v cleave] [-x taxid]
    [-w1 window1] [-w2 window2] [-h1 hit1] [-h2 hit2] [-hl hitlist]
    [-ht tophitnum] [-hm minhit] [-hs minspectra] [-he evalcut]
    [-mf fixedmod] [-mv variablemod] [-mnm] [-mm maxmod] [-e enzyme]
    [-zh maxcharge] [-zl mincharge] [-zoh maxprodcharge] [-zt chargethresh]
    [-z1 plusone] [-zc calcplusone] [-zcc calccharge] [-pc pseudocount]
    [-sb1 searchb1] [-sct searchcterm] [-sp productnum] [-scorr corrscore]
    [-scorp corrprob] [-no minno] [-nox maxno] [-is subsetthresh]
    [-ir replacethresh] [-ii iterativethresh] [-p prolineruleions] [-il]
    [-el] [-ml] [-mx modinputfile] [-mux usermodinputfile] [-nt numthreads]
    [-ni] [-ns] [-os] [-logfile File_Name] [-conffile File_Name] [-version]
    [-version-full] [-dryrun]

DESCRIPTION
   Search engine for identifying MS/MS peptide spectra

OPTIONAL ARGUMENTS
 -h
   Print USAGE and DESCRIPTION; ignore other arguments
 -help
   Print USAGE, DESCRIPTION and ARGUMENTS description; ignore other arguments
 -xmlhelp
   Print USAGE, DESCRIPTION and ARGUMENTS description in XML format; ignore
   other arguments
 -pm
   search parameter input in xml format (overrides command line)
   Default = `'
 -d
   Blast sequence library to search. Do not include .p* filename suffixes.
   Default = `nr'
 -umm
   use memory mapped sequence libraries
 -f
   single dta file to search
   Default = `'
 -fx
   multiple xml-encapsulated dta files to search
   Default = `'
 -fb
   multiple dta files separated by blank lines to search
   Default = `'
 -fp
   pkl formatted file
   Default = `'
 -fm
   mgf formatted file
   Default = `'
 -foms
   omssa oms file
   Default = `'
 -fomx
   omssa omx file
   Default = `'
 -fbz2
   omssa omx file compressed by bzip2
   Default = `'
 -fxml
   omssa xml search request file
   Default = `'
 -o
   filename for text asn.1 formatted search results
   Default = `'
 -ob
   filename for binary asn.1 formatted search results
   Default = `'
 -ox
   filename for xml formatted search results
   Default = `'
 -obz2
   filename for bzip2 compressed xml formatted search results
   Default = `'
 -op
   filename for pepXML formatted search results
   Default = `'
 -oc
   filename for csv formatted search summary
   Default = `'
 -w
   include spectra and search params in search results
 -to
   product ion m/z tolerance in Da
   Default = `0.8'
 -te
   precursor ion m/z tolerance in Da
   Default = `2.0'
 -tom
   product ion search type (0 = mono, 1 = avg, 2 = N15, 3 = exact)
   Default = `0'
 -tem
   precursor ion search type (0 = mono, 1 = avg, 2 = N15, 3 = exact)
   Default = `0'
 -tez
   charge dependency of precursor mass tolerance (0 = none, 1 = linear)
   Default = `0'
 -ta
   automatic mass tolerance adjustment fraction
   Default = `1.0'
 -tex
   threshold in Da above which the mass of neutron should be added in exact
   mass search
   Default = `1446.94'
 -i
   id numbers of ions to search (comma delimited, no spaces)
   Default = `1,4'
 -cl
   low intensity cutoff as a fraction of max peak
   Default = `0.0'
 -ch
   high intensity cutoff as a fraction of max peak
   Default = `0.2'
 -ci
   intensity cutoff increment as a fraction of max peak
   Default = `0.0005'
 -cp
   eliminate charge reduced precursors in spectra (0=no, 1=yes)
   Default = `0'
 -v
   number of missed cleavages allowed
   Default = `1'
 -x
   comma delimited list of taxids to search (0 = all)
   Default = `0'
 -w1
   single charge window in Da
   Default = `20'
 -w2
   double charge window in Da
   Default = `14'
 -h1
   number of peaks allowed in single charge window
   Default = `2'
 -h2
   number of peaks allowed in double charge window
   Default = `2'
 -hl
   maximum number of hits retained per precursor charge state per spectrum
   Default = `30'
 -ht
   number of m/z values corresponding to the most intense peaks that must
   include one match to the theoretical peptide
   Default = `6'
 -hm
   the minimum number of m/z matches a sequence library peptide must have for
   the hit to the peptide to be recorded
   Default = `2'
 -hs
   the minimum number of m/z values a spectrum must have to be searched
   Default = `4'
 -he
   the maximum evalue allowed in the hit list
   Default = `1'
 -mf
   comma delimited (no spaces) list of id numbers for fixed modifications
   Default = `'
 -mv
   comma delimited (no spaces) list of id numbers for variable modifications
   Default = `'
 -mnm
   n-term methionine should not be cleaved
 -mm
   the maximum number of mass ladders to generate per database peptide
   Default = `128'
 -e
   id number of enzyme to use
   Default = `0'
 -zh
   maximum precursor charge to search when not 1+
   Default = `3'
 -zl
   minimum precursor charge to search when not 1+
   Default = `1'
 -zoh
   maximum product charge to search
   Default = `2'
 -zt
   minimum precursor charge to start considering multiply charged products
   Default = `3'
 -z1
   fraction of peaks below precursor used to determine if spectrum is charge 1
   Default = `0.95'
 -zc
   should charge plus one be determined algorithmically? (1=yes)
   Default = `1'
 -zcc
   how should precursor charges be determined? (1=believe the input file,
   2=use a range)
   Default = `2'
 -pc
   minimum number of precursors that match a spectrum
   Default = `1'
 -sb1
   should first forward (b1) product ions be in search (1=no)
   Default = `1'
 -sct
   should c terminus ions be searched (1=no)
   Default = `0'
 -sp
   max number of ions in each series being searched (0=all)
   Default = `100'
 -scorr
   turn off correlation correction to score (1=off, 0=use correlation)
   Default = `0'
 -scorp
   probability of consecutive ion (used in correlation correction)
   Default = `0.5'
 -no
   minimum size of peptides for no-enzyme and semi-tryptic searches
   Default = `4'
 -nox
   maximum size of peptides for no-enzyme and semi-tryptic searches (0=none)
   Default = `40'
 -is
   evalue threshold to include a sequence in the iterative search, 0 = all
   Default = `0.0'
 -ir
   evalue threshold to replace a hit, 0 = only if better
   Default = `0.0'
 -ii
   evalue threshold to iteratively search a spectrum again, 0 = always
   Default = `0.01'
 -p
   id numbers of ion series to apply no product ions at proline rule at (comma
   delimited, no spaces)
   Default = `'
 -il
   print a list of ions and their corresponding id number
 -el
   print a list of enzymes and their corresponding id number
 -ml
   print a list of modifications and their corresponding id number
 -mx
   file containing modification data
   Default = `mods.xml'
 -mux
   file containing user modification data
   Default = `usermods.xml'
 -nt
   number of search threads to use, 0=autodetect
   Default = `0'
 -ni
   don't print informational messages
 -ns
   depreciated flag
 -os
   use omssa 1.0 scoring
 -logfile
   File to which the program log should be redirected
 -conffile
   Program's configuration (registry) data file
 -version
   Print version number; ignore other arguments
 -version-full
   Print extended version data; ignore other arguments
 -dryrun
   Dry run the application: do nothing, only test all preconditions
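As an illustration of how the options listed above combine, the sketch below prints an omssacl command line with explicit mass tolerances. `omssa_cmd` is a hypothetical helper, and the tolerance values in the example call are arbitrary; the fallback values are the defaults shown in the usage text for -te and -to.

```shell
# Print an omssacl command line with explicit mass tolerances.
# omssa_cmd is a hypothetical helper; it only prints the command, it does not run it.
omssa_cmd() {
    local db=$1
    local infile=$2
    local outfile=$3
    local precursor_tol=${4:-2.0}   # -te: precursor ion m/z tolerance in Da
    local product_tol=${5:-0.8}     # -to: product ion m/z tolerance in Da
    printf '/usr/local/omssa/omssacl -d %s -f %s -ox %s -te %s -to %s\n' \
        "$db" "$infile" "$outfile" "$precursor_tol" "$product_tol"
}

# Example with arbitrary tolerances of 1.5 Da (precursor) and 0.5 Da (product).
omssa_cmd /fdb/blastdb/nr file1.dta file1.xml 1.5 0.5
```

Lines produced this way can be appended to a swarm command file in the same way as the simpler commands earlier on this page.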