Frequently Asked Questions
[1] Errors
Q1.1: Why do I get the error "Server Error. Your request could not be processed due to a problem on our Web server..." when I submit the search on the web?
Q1.2: Why do I get the error "Lo cut exceeds Hi cut value"?
Q1.3: In the command line version, I get a "bad_alloc" exception. Why?
Q1.4: In the command line version, I get the warning "Taxonomically restricted search specified and no matching organisms found in sequence library". Why?
[2] Settings and limitations
Q2.1: What are the restrictions on the web-based search service compared to the executable download?
Q2.2: In the tryptic digest option, do you follow the rule of not cleaving R and K if they come before P?
Q2.3: Do you support non-specific cleavage or N-terminal cleavage (like Asp-N)?
Q2.4: What is the maximum number of missed cleavages?
Q2.5: Do you search on C-terminal peptides, i.e. those at the end of the sequence?
Q2.6: What is the maximum number of spectra I can search when using the command line version?
Q2.7: What is the maximum number of variable modifications that can be selected?
Q2.8: A modification that I would like to search for is missing from OMSSA. How can it be added?
Q2.9: How do I search a taxonomic group, like mammalia?
Q2.10: Can I specify the mass tolerances in ppm?
Q2.11: Should I search using average precursor mass or monoisotopic precursor mass?
[3] Output and scoring
Q3.1: Why do some of the spectra show no search results?
Q3.2: What is an E-value?
Q3.3: How can I improve the E-values of the hits?
Q3.4: What is a reasonable E-value cutoff?
Q3.5: How can I improve the sensitivity of the algorithm?
Q3.6: Why does the initial result page list only one hit?
Q3.7: The csv output (Excel) is missing xxxxx (some part of the search result). Can this be added?
Q3.8: Do you have a search result format that summarizes hits by protein first instead of by peptide first?
Q3.9: Other than installing Perl, what do I need to use the sample XML parser for OMSSA XML?
Q3.10: The Perl parser for OMSSA XML complains that ParserDetails.ini is missing.
Q3.11: Why don't I see a really nice hit that I saw before/from another search algorithm?
Q3.12: Does OMSSA report the positions of fixed modifications?
[4] Using the downloadable version
Q4.1: Can OMSSA run on a compute cluster?
Q4.2: Do you provide the queueing software that you use to run the OMSSA search service at NCBI? We'd like to use it on our cluster.
Q4.3: Can I search my own sequence library?
Q4.4: Does OMSSA read FASTA directly?
Q4.5: How do I specify taxonomic (species) information in FASTA?
Q4.6: What kind of computer will run OMSSA most efficiently?
[5] Building OMSSA yourself
Q5.1: Where is the OMSSA source code?
Q5.2: What is required to build OMSSA?
[1] Errors
Q1.1: Why do I get the error "Server Error. Your request could not be processed due to a problem on our Web server..." when I submit the search on the web?
A: You may have sent a very large file (more than 50 MB) to our search service; this is a limitation of the web server software that we use at NCBI. Please try breaking your spectra file into several smaller files.
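If your spectra are in Mascot Generic Format (MGF), one safe way to split them is to cut only between complete BEGIN IONS ... END IONS blocks. The sketch below assumes MGF input and a 50 MB limit; the function names and the `.partN.mgf` file naming are hypothetical, global header lines outside the spectrum blocks are not copied, and a single spectrum larger than the limit ends up in a file of its own.

```python
from pathlib import Path

MAX_BYTES = 50 * 1024 * 1024  # assumed web upload limit of 50 MB


def mgf_blocks(text):
    """Yield each complete BEGIN IONS ... END IONS block, newlines included."""
    block = []
    for line in text.splitlines(keepends=True):
        if line.strip() == "BEGIN IONS":
            block = [line]  # start a new spectrum
        elif block:
            block.append(line)
            if line.strip() == "END IONS":
                yield "".join(block)
                block = []


def split_mgf_by_size(text, max_bytes=MAX_BYTES):
    """Group whole spectra into chunks of at most max_bytes (UTF-8 encoded).

    A spectrum that is larger than max_bytes on its own gets its own chunk.
    """
    chunks, current, size = [], [], 0
    for block in mgf_blocks(text):
        n = len(block.encode("utf-8"))
        if current and size + n > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(block)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks


def write_parts(src, max_bytes=MAX_BYTES):
    """Write <name>.part1.mgf, <name>.part2.mgf, ... next to src; return the paths."""
    path = Path(src)
    paths = []
    for i, part in enumerate(split_mgf_by_size(path.read_text(), max_bytes), start=1):
        out = path.with_name(f"{path.stem}.part{i}.mgf")
        out.write_text(part)
        paths.append(out)
    return paths
```

Each part can then be submitted as a separate search.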
Q1.2: Why do I get the error "Lo cut exceeds Hi cut value"?
A: You may have set the "Peak intensity cutoff" too high; the maximum is 0.2. You can probably leave this value at 0, because it is only a starting value: OMSSA adjusts it automatically for the best signal-to-noise ratio.
Q1.3: In the command line version, I get a "bad_alloc" exception. Why?
A: You may have run out of memory. Try reducing the number of spectra
being searched.
Q1.4: In the command line version, I get the warning "Taxonomically restricted search specified and no matching organisms found in sequence library". Why?
A: You specified the taxonomy ID of a species on the command line but searched a sequence library that contains no taxonomy IDs. Most likely you are using a BLAST library made from FASTA-formatted sequences, which do not encode machine-readable taxonomic information. Native BLAST libraries do contain taxonomic information.
[2] Settings and limitations
Q2.1: What are the restrictions on the web-based search service compared to the executable download?
A: 1. You must select the organisms that you are searching for; to search all species, please use the downloadable executable.
2. The number of spectra is limited to 2000 per search, or 200 per search with a non-specific enzyme, although there is no limit on the number of searches.
3. The results are kept for only a few days (please see our privacy policy), although you may download the results using a form on the search results page.
4. The maximum number of modification sites searched exhaustively is effectively limited to 6 per peptide (see the question below on the maximum number of variable modifications).
5. You can search only the sequence libraries provided, nr and refseq.
Q2.2: In the tryptic digest option, do you follow the rule of not cleaving R and K if they come before P?
A: Yes. Additionally, these uncleaved K-P and R-P bonds are not counted in the missed cleavage calculation.
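The rule can be sketched as follows: cleave after K or R unless the next residue is P, and count missed cleavages only at bonds that could actually have been cleaved. This is a minimal illustration of the rule, not OMSSA's own code, and the function names are hypothetical.

```python
def trypsin_sites(seq):
    """Return indices i such that the bond after seq[i] is a tryptic cleavage site."""
    return [
        i for i, aa in enumerate(seq[:-1])
        if aa in "KR" and seq[i + 1] != "P"  # K/R before P is never cleaved
    ]


def digest(seq, max_missed=0):
    """Return tryptic peptides with at most max_missed missed cleavages.

    Because K-P and R-P bonds are not sites at all, they never count
    as missed cleavages.
    """
    bounds = [0] + [i + 1 for i in trypsin_sites(seq)] + [len(seq)]
    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(max_missed + 1):
            end = start + missed + 1
            if end >= len(bounds):
                break
            peptides.append(seq[bounds[start]:bounds[end]])
    return peptides
```

For example, `digest("AKPRGK")` yields `AKPR` and `GK`; `AKPR` counts as having zero missed cleavages even though it contains an internal K, because that K precedes P.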
Q2.3: Do you support non-specific cleavage or N-terminal cleavage (like Asp-N)?
A: Yes.
Q2.4: What is the maximum number of missed cleavages?
A: 3 in the command line program and 2 in the web-based search
service.
Q2.5: Do you search on C-terminal peptides, i.e. those at the end of the sequence?
A: Yes.
Q2.6: What is the maximum number of spectra I can search when using the command line version?
A: There is no maximum, but for optimal efficiency you should batch your spectra so that each batch fits into available memory (e.g. sets of 10000 or fewer).
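To keep memory use bounded, a large MGF file can be streamed and written out in batches of a fixed number of spectra without reading the whole file first. This sketch assumes MGF input; `write_batches` and the `.batchN.mgf` naming are hypothetical, and the default of 10000 simply mirrors the suggestion above.

```python
import os
from itertools import islice


def iter_mgf_spectra(lines):
    """Yield each BEGIN IONS ... END IONS block from an iterable of lines."""
    block = None
    for line in lines:
        stripped = line.strip()
        if stripped == "BEGIN IONS":
            block = [line]
        elif block is not None:
            block.append(line)
            if stripped == "END IONS":
                yield "".join(block)
                block = None


def batches(items, size):
    """Yield successive lists of at most `size` items, consuming lazily."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def write_batches(src, size=10000):
    """Stream src and write <root>.batch1.mgf, <root>.batch2.mgf, ...; return the paths."""
    root, _ = os.path.splitext(src)
    written = []
    with open(src) as fh:  # a file object yields one line at a time
        for n, batch in enumerate(batches(iter_mgf_spectra(fh), size), start=1):
            out = f"{root}.batch{n}.mgf"
            with open(out, "w") as dst:
                dst.writelines(batch)
            written.append(out)
    return written
```

Only one batch of spectra is held in memory at a time, so each output file can be searched as a separate OMSSA run.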
Q2.7: What is the maximum number of variable modifications that can be selected?
A: As many as you want. However, the number of variable modification sites per peptide is limited to 32. Additionally, the number of modification combinations searched per peptide is limited to 8192 in the command line version and 64 in the web search. Because each site is either modified or unmodified, n sites give 2^n combinations, and 2^13 = 8192 while 2^6 = 64. This effectively limits an exhaustive search of all modification combinations per peptide to 13 sites in the command line version and 6 in the web search.
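The sketch below reproduces those two limits under a simple on/off model (each site is either modified or not) and shows what a combination cap does once a peptide has more sites than can be covered exhaustively. The enumeration order (fewest modifications first) and the function names are assumptions for illustration; OMSSA's own ordering is not described here.

```python
from itertools import combinations, islice


def max_exhaustive_sites(cap):
    """Largest n with 2**n <= cap, i.e. every on/off combination of n sites fits."""
    return cap.bit_length() - 1


def mod_combinations(sites, cap):
    """Enumerate subsets of modified sites, fewest modifications first, up to cap."""
    all_subsets = (
        subset
        for k in range(len(sites) + 1)
        for subset in combinations(sites, k)
    )
    return list(islice(all_subsets, cap))
```

`max_exhaustive_sites(8192)` gives 13 and `max_exhaustive_sites(64)` gives 6. With seven candidate sites, the 64-combination web limit stops before all 128 subsets are tried.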
Q2.8: A modification that I would like to search for is missing from OMSSA. How can it be added?
A: Please contact Lewis Geer.
Q2.9: How do I search a taxonomic group, like mammalia?
A: This is not presently supported in OMSSA. The workaround is to specify the taxonomy IDs of all of the species in the group you are interested in. You can find these species using the NCBI Taxonomy database.
Q2.10: Can I specify the mass tolerances in ppm?
A: No, not presently. This will be added in a future version of OMSSA. The workaround is to specify the tolerances in Daltons.
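The conversion from a relative to an absolute tolerance is Δm (Da) = m × ppm / 10^6. Because a ppm window widens with mass, a single Dalton tolerance has to be computed at the largest mass you expect to search; lighter peptides are then searched with a somewhat wider window than the ppm figure implies. A minimal sketch (function names hypothetical):

```python
def ppm_to_da(mass_da, ppm):
    """Convert a relative tolerance in ppm to an absolute tolerance in Daltons."""
    return mass_da * ppm / 1e6


def fixed_da_tolerance(masses_da, ppm):
    """One Dalton tolerance that covers the ppm window of every mass given.

    The largest mass sets the tolerance, so smaller masses get a
    wider window than strictly needed.
    """
    return ppm_to_da(max(masses_da), ppm)
```

For instance, 10 ppm at 2000 Da corresponds to 0.02 Da.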
Q2.11: Should I search using average precursor mass or monoisotopic precursor mass?
A: The answer depends on your experimental setup. If you expect many peptides with masses significantly greater than, say, 1500 Daltons, and your precursor mass tolerance is greater than 1 Da, then you may want to search with average mass to avoid missing hits. When in doubt, compare both settings on a test data set.
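To see how far apart the two conventions are, the sketch below sums standard monoisotopic and average residue masses (plus one water) for an unmodified, uncharged peptide. The residue table is the commonly published one, not something taken from OMSSA, and the function names are hypothetical.

```python
# Residue masses in Daltons; one water is added per peptide.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
AVG = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MONO, WATER_AVG = 18.01056, 18.01528


def peptide_mass(seq, table, water):
    """Neutral peptide mass: sum of residue masses plus one water."""
    return sum(table[aa] for aa in seq) + water


def avg_minus_mono(seq):
    """How far the average mass lies above the monoisotopic mass."""
    return peptide_mass(seq, AVG, WATER_AVG) - peptide_mass(seq, MONO, WATER_MONO)
```

The gap grows with peptide length; for peptides near 1500 Da it is close to 1 Da, which is why the choice starts to matter once tolerances are about that wide.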
[3] Output and scoring
Q3.1: Why do some of the spectra show no search results?
A: Probably because the best hits for those spectra have an E-value greater than the E-value cutoff.
Q3.2: What is an E-value?
A: The E-value is a measure of statistical significance, where a small E-value indicates a significant hit. The E-value of a hit is the number of random hits, with a score equal to or better than that hit's score, that you would expect when searching the spectrum against the sequence library. For example, an E-value of 1.0 means that one such random hit would be expected from a sequence library search.
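A common way to arrive at an E-value is to multiply the probability p that a single random candidate peptide scores at least as well as the hit by the number N of candidate peptides compared with the spectrum, E = N × p. The sketch below shows that relationship and, assuming random hits follow a Poisson distribution with mean E, converts E into the chance of seeing at least one such random hit. It illustrates the definition above; it is not OMSSA's scoring code.

```python
import math


def e_value(p_single, n_candidates):
    """Expected number of random candidates scoring at least as well as the hit."""
    return p_single * n_candidates


def prob_at_least_one(e):
    """Chance of one or more random hits if their count is Poisson with mean e."""
    return 1.0 - math.exp(-e)
```

For small values the two numbers nearly coincide (E = 0.01 gives about 0.00995), while E = 1.0 corresponds to a chance of roughly 63%.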
Q3.3: How can I improve the E-values of the hits?
A: 1. Restrict the search to only the organisms sampled in the experiment.
2. Set the mass tolerances to suit the instrument that produced the spectra; small changes in the mass tolerances can cause significant changes in E-values.
3. Limit the number of variable modifications searched.
Q3.4: What is a reasonable E-value cutoff?
A: This is largely a question for the experimentalist, and the answer depends in part on the objectives of the experiment. For example, if your workflow involves no human intervention, even at the analysis stage, you may want to adopt a conservative E-value threshold. If a person reviews the results at every stage of the analysis, you may opt for a less conservative threshold. For more information, please consult the OMSSA paper.
Q3.5: How can I improve the sensitivity of the algorithm?
A: In addition to the steps listed under Q3.3 for improving E-values, you can increase the "-ht" ("Number of top intensity peaks in first pass") parameter. This slows the algorithm but improves sensitivity when the MS/MS spectra contain a significant number of intense precursor ions or non-standard product ions.
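The idea behind the "-ht" parameter can be pictured as keeping only the N most intense peaks of a spectrum for the first pass. The sketch below assumes a spectrum stored as a list of (m/z, intensity) pairs; it illustrates that idea and is not OMSSA's implementation.

```python
import heapq


def top_intensity_peaks(peaks, n):
    """Return the n most intense (mz, intensity) peaks, re-sorted by m/z."""
    strongest = heapq.nlargest(n, peaks, key=lambda peak: peak[1])
    return sorted(strongest)
```

Raising n keeps more peaks, so a genuine fragment ion that is outranked by intense precursor or non-standard ions is less likely to be dropped, at the cost of more work per spectrum.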
Q3.6: Why does the initial result page list only one hit?
A: The initial result page is meant to list all of the spectra searched and to indicate whether the search for each spectrum succeeded, not to give the full search results. If you click the row for a spectrum, you will get the full list of hits for that spectrum.
Q3.7: The csv output (Excel) is missing xxxxx (some part of the search result). Can this be added?
A: The XML output contains detailed search information; the csv format is a summary format. Many tools can parse XML. As an example, the OMSSA download contains a simple Perl parser for OMSSA XML output as well as a simple XSLT transform.
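As an illustration of reading the XML output with Python's standard library instead of the bundled Perl parser, the sketch below collects the spectrum number, peptide, and E-value of every hit and can apply an E-value cutoff. The element names (MSHitSet, MSHitSet_number, MSHits, MSHits_evalue, MSHits_pepstring) are assumed from OMSSA XML output; please verify them against your own files. Namespaces are ignored by matching local tag names only.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name):
    """Text of the first direct child whose local tag is `name`, else None."""
    for child in elem:
        if local(child.tag) == name:
            return child.text
    return None


def read_hits(source, max_evalue=None):
    """Yield (spectrum number, peptide, E-value) for each MSHits element.

    `source` is a filename or file object; hits above max_evalue are skipped.
    """
    root = ET.parse(source).getroot()
    for hitset in root.iter():
        if local(hitset.tag) != "MSHitSet":
            continue
        number = int(child_text(hitset, "MSHitSet_number"))
        for hit in hitset.iter():
            if local(hit.tag) != "MSHits":
                continue
            evalue = float(child_text(hit, "MSHits_evalue"))
            if max_evalue is None or evalue <= max_evalue:
                yield number, child_text(hit, "MSHits_pepstring"), evalue
```

The same loop is a natural place to pull out any other per-hit element you need that the csv summary leaves out.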
Q3.8: Do you have a search result format that summarizes hits by protein first instead of by peptide first?
A: Not at this time.
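A protein-first view can still be built after the search. This sketch assumes you have already extracted (peptide, accession, E-value) tuples from the output; how you map columns or elements to those fields is up to you, and the function names are hypothetical. It does not attempt to resolve peptides shared by several proteins.

```python
from collections import defaultdict


def group_by_protein(hits):
    """Map each accession to its (E-value, peptide) hits, lowest E-value first.

    `hits` is an iterable of (peptide, accession, evalue) tuples.
    """
    proteins = defaultdict(list)
    for peptide, accession, evalue in hits:
        proteins[accession].append((evalue, peptide))
    return {acc: sorted(peps) for acc, peps in proteins.items()}


def protein_summary(hits):
    """List (accession, distinct peptides, best E-value), best-supported first."""
    rows = []
    for acc, peps in group_by_protein(hits).items():
        distinct = len({pep for _, pep in peps})
        rows.append((acc, distinct, peps[0][0]))
    # More distinct peptides first; ties broken by the smaller best E-value.
    return sorted(rows, key=lambda row: (-row[1], row[2]))
```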
Q3.9: Other than installing Perl, what do I need to use the sample XML parser for OMSSA XML?
A: If it is not installed by default, you will need the XML::SAX module from CPAN or ActiveState.
Q3.10: The Perl parser for OMSSA XML complains that ParserDetails.ini is missing.
A: This file is not necessary for the parser to work.
Q3.11: Why don't I see a really nice hit that I saw before/from another search algorithm?
A: Make sure that you are using *exactly* the same search parameters, e.g. product/precursor mass tolerances, average/monoisotopic searching, sequence library, taxonomy, etc. If you are looking for PTMs, make sure that you are searching with a large value for the maximum modification combinations per peptide parameter. Many of these parameters have a significant effect on E-values.
Q3.12: Does OMSSA report the positions of fixed modifications?
A: No. OMSSA takes fixed modifications into account when calculating masses, but it does not report their positions.
[4] Using the downloadable version
Q4.1: Can OMSSA run on a compute cluster?
A: Yes, OMSSA is designed to run on a cluster. Each instance of OMSSA runs as a separate process, but each process memory-maps the sequence library files to maximize memory sharing between processes. Ideally, one OMSSA process should run per processing core, with all the cores on a particular machine using the same sequence library files, which allows the processes to share the memory map. OMSSA is not explicitly parallelized, threaded, or forked, so you will have to start each process explicitly. Optimally, each OMSSA process should be fed multiple spectra at once, as OMSSA is optimized to search many spectra at a time. Splitting sets of spectra across nodes is generally only useful if the number of spectra per node starts to exceed available memory. The point at which this happens depends on your configuration, but it is typically in the thousands of spectra.
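On a single multi-core machine, the one-process-per-core advice can be followed with a small driver that keeps at most one omssacl process running per core and feeds each process a whole file of spectra. The omssacl flags used here (`-fm` for MGF input, `-d` for the BLAST sequence library, `-ox` for XML output) and the `.xml` output naming are assumptions to check against `omssacl -help` for your version; spreading work across nodes is left to your cluster's own scheduler.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def omssa_command(mgf_path, library):
    """Build one omssacl command line (flags are assumptions; see above)."""
    out_path = os.path.splitext(mgf_path)[0] + ".xml"
    return ["omssacl", "-fm", mgf_path, "-d", library, "-ox", out_path]


def run_all(mgf_paths, library, cores=None, dry_run=False):
    """Search every spectra file, with at most one omssacl process per core.

    The threads only wait on child processes; all processes on this machine
    point at the same library files, so they can share its memory map.
    """
    commands = [omssa_command(p, library) for p in mgf_paths]
    if dry_run:
        return commands
    workers = cores or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for completion and re-raises CalledProcessError on failure
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
    return commands
```

Calling `run_all(paths, "nr", dry_run=True)` returns the command lines without starting anything, which is a convenient way to check them first.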
Q4.2: Do you provide the queueing software that you use to run the OMSSA search service at NCBI? We'd like to use it on our cluster.
A: No, as it is specifically architected for the software and hardware we use at NCBI. Most compute clusters provide some degree of this functionality, tailored to the architecture of the cluster, so it would be very difficult for us to develop a general-purpose solution. Note that if you run BLAST on your cluster, you can likely run OMSSA the same way. One useful tool for distributing searches across nodes is omssamerge, a command line program that merges search result files. omssamerge is distributed with the command line version of OMSSA.
Q4.3: Can I search my own sequence library?
A: Yes, if you download OMSSA and run it on your own machine. If you have FASTA-formatted sequences, you can use formatdb, which is included in the BLAST distribution, to make OMSSA-searchable sequence libraries. OMSSA uses the same sequence library format as BLAST.
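Converting a protein FASTA file might look like the following. The formatdb options shown (`-i` input file, `-p T` for protein sequences, `-o T` to parse sequence IDs and build indexes, `-n` base name for the output) are standard formatdb options; the wrapper functions are hypothetical, and formatdb must be on your PATH.

```python
import subprocess
from pathlib import Path


def formatdb_command(fasta_path, name=None):
    """Command line that turns a protein FASTA file into a BLAST library.

    The library base name defaults to the FASTA file's stem.
    """
    base = name or Path(fasta_path).stem
    return ["formatdb", "-i", str(fasta_path), "-p", "T", "-o", "T", "-n", base]


def build_library(fasta_path, name=None):
    """Run formatdb and return the library base name to give to OMSSA."""
    cmd = formatdb_command(fasta_path, name)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if formatdb fails
    return cmd[-1]
```

The returned base name (for example `yeast` for `yeast.fasta`) is what you would then point OMSSA's sequence library option at.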
Q4.4: Does OMSSA read FASTA directly?
A: No. You need to convert FASTA into a BLAST sequence library
using formatdb (see previous question).
Q4.5: How do I specify taxonomic (species) information in FASTA?
A: FASTA does not support taxonomic information in its format.
Q4.6: What kind of computer will run OMSSA most efficiently?
A: It depends, but it's helpful to have a computer with lots of
RAM, a large cache, and fast CPUs. Dual processors or dual cores
are also helpful.
[5] Building OMSSA yourself
Q5.1: Where is the OMSSA source code?
A: In the NCBI C++ Toolkit, located at ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/. If you are using Visual Studio, the solution file to use is "ncbi_cpp.sln" and the project is "omssacl.exe".
Q5.2: What is required to build OMSSA?
A: You need to build the core code of the NCBI C++ Toolkit, whose requirements are detailed in the toolkit documentation. Note that building OMSSA can require a significant amount of work because of the framework needed to build it on multiple platforms.