
Frequently Asked Questions

 

[1] Errors

Q1.1: Why do I get the error "Server Error. Your request could not be processed due to a problem on our Web server..." when I submit the search on the web?

Q1.2: Why do I get the error "Lo cut exceeds Hi cut value"?

Q1.3: In the command line version, I get a "bad_alloc" exception. Why?

Q1.4: In the command line version, I get the warning "Taxonomically restricted search specified and no matching organisms found in sequence library". Why?

 

[2] Settings and limitations

Q2.1: What are the restrictions on the web-based search service compared to the executable download?

Q2.2: In the tryptic digest option, do you follow the rule of not cleaving R and K if they come before P?

Q2.3: Do you support non-specific cleavage or N-terminal cleavage (like Asp-N)?

Q2.4: What is the maximum number of missed cleavages?

Q2.5: Do you search on C-terminal peptides, i.e. those at the end of the sequence?

Q2.6: What is the maximum number of spectra I can search when using the command line version?

Q2.7: What is the maximum number of variable modifications that can be selected?

Q2.8: A modification that I would like to search for is missing from OMSSA. How can it be added?

Q2.9: How do I search a taxonomic group, like mammalia?

Q2.10: Can I specify the mass tolerances in ppm?

Q2.11: Should I search using average precursor mass or monoisotopic precursor mass?

 

[3] Output and scoring

Q3.1: Why do some of the spectra show no search results?

Q3.2: What is an E-value?

Q3.3: How can I improve the E-values of the hits?

Q3.4: What is a reasonable E-value cutoff?

Q3.5: How can I improve the sensitivity of the algorithm?

Q3.6: Why does the initial result page list only one hit?

Q3.7: The CSV output (Excel) is missing xxxxx (some part of the search result). Can this be added?

Q3.8: Do you have a search result format that summarizes hits by protein first instead of by peptide first?

Q3.9: Other than installing Perl, what do I need to use the sample XML parser for OMSSA XML?

Q3.10: The Perl parser for OMSSA XML complains that ParserDetails.ini is missing.

Q3.11: Why don't I see a really nice hit that I saw before/from another search algorithm?

Q3.12: Does OMSSA report the positions of fixed modifications?

 

 

[4] Using the downloadable version

Q4.1: Can OMSSA run on a compute cluster?

Q4.2: Do you provide the queueing software that you use to run the OMSSA search service at NCBI? We'd like to use it on our cluster.

Q4.3: Can I search my own sequence library?

Q4.4: Does OMSSA read FASTA directly?

Q4.5: How do I specify taxonomic (species) information in FASTA?

Q4.6: What kind of computer will run OMSSA most efficiently?

 

[5] Building OMSSA yourself

Q5.1: Where is the OMSSA source code?

Q5.2: What is required to build OMSSA?


 

[1] Errors

 

Q1.1: Why do I get the error "Server Error. Your request could not be processed due to a problem on our Web server..." when I submit the search on the web?

A. It's possible that you sent a very large file (more than 50 MB) to our search service; this is a limitation of the web server software that we use at NCBI. Please try breaking your spectra file into several smaller files.
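If your spectra are in Mascot Generic Format (MGF), the split can be scripted. The Python below is a minimal sketch, assuming each spectrum sits between "BEGIN IONS" and "END IONS" lines; the function name split_mgf, the default batch size, and the _partNNN output naming are illustrative choices, not part of OMSSA. Other input formats, such as DTA, would need a different splitter.

```python
from pathlib import Path


def split_mgf(path, spectra_per_file=1000, out_dir=None):
    """Split an MGF file into files holding at most `spectra_per_file` spectra.

    Each spectrum is assumed to sit between 'BEGIN IONS' and 'END IONS'
    lines. Lines before the first spectrum (global parameters such as
    CHARGE=2+) are copied to the top of every output file so that each
    piece can be searched on its own.
    """
    if spectra_per_file < 1:
        raise ValueError("spectra_per_file must be at least 1")
    path = Path(path)
    out_dir = Path(out_dir) if out_dir else path.parent
    header, chunks, current, block = [], [], [], []
    inside = False

    with path.open() as fh:
        for line in fh:
            tag = line.strip().upper()
            if tag == "BEGIN IONS":
                inside, block = True, [line]
            elif inside:
                block.append(line)
                if tag == "END IONS":
                    inside = False
                    current.append("".join(block))
                    if len(current) == spectra_per_file:
                        chunks.append(current)
                        current = []
            elif not chunks and not current:
                header.append(line)  # global parameters before spectrum 1

    if current:
        chunks.append(current)

    written = []
    for number, spectra in enumerate(chunks, start=1):
        out = out_dir / f"{path.stem}_part{number:03d}{path.suffix}"
        out.write_text("".join(header) + "".join(spectra))
        written.append(out)
    return written
```

For example, split_mgf("run.mgf", spectra_per_file=1000) writes run_part001.mgf, run_part002.mgf, and so on next to the input, and each piece can be submitted separately. Smaller batches also help with the memory limits described under Q1.3 and Q2.6.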

 

Q1.2: Why do I get the error "Lo cut exceeds Hi cut value"?

A. It's possible that you set the "Peak intensity cutoff" too high; the maximum is 0.2. You can probably leave this value at 0, since it is only a starting value that OMSSA automatically adjusts for the best signal-to-noise ratio.
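The error message suggests that the value you enter acts as a lower cutoff that may not exceed an upper cutoff of 0.2. The Python below is a hypothetical illustration, assuming the cutoff is a fraction of the most intense peak in each spectrum; it is not OMSSA's own filtering code, and the names filter_peaks and check_cutoffs are made up for this example.

```python
def check_cutoffs(lo, hi=0.2):
    """Reject a lower cutoff above the upper one (hypothetical analogue of the error)."""
    if lo > hi:
        raise ValueError(f"Lo cut ({lo}) exceeds Hi cut value ({hi})")


def filter_peaks(peaks, cutoff):
    """Keep (m/z, intensity) peaks at or above cutoff * max intensity.

    Assumption: the cutoff is a fraction of the most intense peak, so
    0 keeps everything and 0.2 drops peaks below 20% of the maximum.
    """
    check_cutoffs(cutoff)
    if not peaks:
        return []
    max_intensity = max(intensity for _, intensity in peaks)
    return [(mz, i) for mz, i in peaks if i >= cutoff * max_intensity]
```

Under this reading, a cutoff of 0 keeps every peak, which is why 0 is a safe starting value.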

 

Q1.3: In the command line version, I get a "bad_alloc" exception. Why?

A: You may have run out of memory. Try reducing the number of spectra being searched.

Q1.4: In the command line version, I get the warning "Taxonomically restricted search specified and no matching organisms found in sequence library". Why?

A: You specified the taxonomy ID of a species on the command line but used a sequence library that contains no taxonomy IDs. Most likely you are using a BLAST library made from FASTA-formatted sequences, which do not encode machine-readable taxonomic information. Native BLAST libraries do contain taxonomic information.

 

 

[2] Settings and limitations

 

Q2.1: What are the restrictions on the web-based search service compared to the executable download?

A. 1. You must select the organisms that you are searching for. To search all species, please use the downloadable executable.
2. The number of spectra is limited to 2000 per search, or 200 per non-specific enzyme search, although there is no limit on the number of searches.
3. The results are only kept for a few days (please see our privacy policy), although you may download the results from a form on the search results page.
4. The maximum number of modification sites exhaustively searched is effectively limited to 6 per peptide (see the question below on the maximum number of variable modifications).
5. You can only search the sequence libraries provided, nr and refseq.

 

Q2.2: In the tryptic digest option, do you follow the rule of not cleaving R and K if they come before P?

A: Yes. Additionally, an R or K that comes before P is not counted as a missed cleavage.
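The rule can be made concrete with a small in-silico digest. This Python sketch is a simplified illustration, not OMSSA's implementation; the function name tryptic_digest and its max_missed default are hypothetical. It treats K or R followed by P as a non-site, so such a residue neither splits the peptide nor counts as a missed cleavage.

```python
def tryptic_digest(sequence, max_missed=2):
    """Cleave after K or R unless the next residue is P.

    Returns (peptide, missed_cleavages) pairs. Because K/R before P is never
    a cleavage site, it is also never counted as a missed cleavage.
    """
    # Positions just after each valid cleavage site.
    sites = [
        i + 1
        for i, aa in enumerate(sequence[:-1])
        if aa in "KR" and sequence[i + 1] != "P"
    ]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(max_missed + 1):
            end = start + missed + 1
            if end >= len(bounds):
                break
            peptides.append((sequence[bounds[start]:bounds[end]], missed))
    return peptides
```

For example, tryptic_digest("AKPGRLK", max_missed=1) returns [("AKPGR", 0), ("AKPGRLK", 1), ("LK", 0)]: the K in "KP" stays inside AKPGR without counting as a missed cleavage.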

 

Q2.3: Do you support non-specific cleavage or N-terminal cleavage (like Asp-N)?

A: Yes.
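Both cleavage styles are easy to sketch. In the hypothetical Python helpers below, cleave_before() cuts on the N-terminal side of the listed residues, as Asp-N does before aspartic acid (D), and nonspecific_peptides() enumerates every subsequence within a length window. The min_len and max_len defaults are illustrative assumptions, not OMSSA settings.

```python
def cleave_before(sequence, residues="D"):
    """Cut N-terminal to each residue in `residues` (Asp-N style for "D")."""
    cuts = [i for i, aa in enumerate(sequence) if aa in residues and i > 0]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[start:end] for start, end in zip(bounds, bounds[1:])]


def nonspecific_peptides(sequence, min_len=6, max_len=40):
    """Non-specific cleavage: every subsequence with min_len..max_len residues."""
    return [
        sequence[start:end]
        for start in range(len(sequence))
        for end in range(start + min_len, min(start + max_len, len(sequence)) + 1)
    ]
```

cleave_before("MKDAGDLLR") gives ["MK", "DAG", "DLLR"]. The second helper also shows why non-specific searches are expensive: the number of candidates grows roughly with sequence length times the width of the length window, rather than with the number of cleavage sites.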

 

Q2.4: What is the maximum number of missed cleavages?

A: 3 in the command line program and 2 in the web-based search service.

 

Q2.5: Do you search on C-terminal peptides, i.e. those at the end of the sequence?

A: Yes.

 

Q2.6: What is the maximum number of spectra I can search when using the command line version?

A: There is no maximum, but for optimal efficiency you should batch your spectra so that each batch fits into available memory (e.g. sets of 10,000 or fewer).

 

Q2.7: What is the maximum number of variable modifications that can be selected?

A: As many as you want. However, the number of variable modification sites per peptide is limited to 32. Additionally, the number of modification combinations searched per peptide is limited to 8192 on the command line and 64 in the web search. This effectively limits an exhaustive search of all modification combinations per peptide to 13 sites in the command line version and 6 in the web search.
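The 13-site and 6-site figures follow from the combination limits if each site is simply modified or unmodified: n such sites give 2^n combinations, and 2^13 = 8192 while 2^6 = 64. The Python below is an illustrative calculation under that assumption; the function names are hypothetical. Sites that can carry one of several different modifications have more than two states, so fewer of them fit under the same limit.

```python
from math import prod


def combinations_for_sites(choices_per_site):
    """Modification patterns for a peptide.

    Each entry is the number of variable modifications possible at one site;
    the site may also stay unmodified, hence k + 1 states per site.
    """
    return prod(k + 1 for k in choices_per_site)


def max_exhaustive_sites(combination_limit, mods_per_site=1):
    """Largest number of sites whose every combination fits under the limit."""
    if mods_per_site < 1:
        raise ValueError("mods_per_site must be at least 1")
    sites = 0
    while combinations_for_sites([mods_per_site] * (sites + 1)) <= combination_limit:
        sites += 1
    return sites
```

max_exhaustive_sites(8192) returns 13 and max_exhaustive_sites(64) returns 6. With two possible modifications per site, max_exhaustive_sites(8192, mods_per_site=2) drops to 8, because 3^8 = 6561 and 3^9 = 19683.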

 

Q2.8: A modification that I would like to search for is missing from OMSSA. How can it be added?

A: Please contact Lewis Geer.

 

Q2.9: How do I search a taxonomic group, like mammalia?

A: This is not presently supported in OMSSA. The workaround is to specify the taxonomic IDs of all the species in the group you are interested in. You can find these species using the NCBI Taxonomy database.

 

Q2.10: Can I specify the mass tolerances in ppm?

A: No, not presently. This will be added in a future version of OMSSA. The workaround is to convert the tolerances to Daltons and specify them that way.
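Converting a ppm tolerance to Daltons only needs the mass it applies at: tolerance in Da = mass × ppm / 10^6. The Python below sketches this with hypothetical helper names. Because a ppm window widens with mass, a single Dalton value has to be chosen for the heaviest mass you expect, which makes the window looser than intended for lighter peptides.

```python
def ppm_to_da(mass_da, tolerance_ppm):
    """Absolute tolerance (Da) equivalent to a relative tolerance (ppm) at mass_da."""
    return mass_da * tolerance_ppm / 1e6


def da_tolerance_for_range(masses_da, tolerance_ppm):
    """One Dalton tolerance wide enough for every mass in the list.

    A ppm window grows with mass, so the heaviest mass sets the width;
    lighter peptides then get a looser window than the ppm value implies.
    """
    return ppm_to_da(max(masses_da), tolerance_ppm)
```

For example, 10 ppm at 1500 Da corresponds to 0.015 Da.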

 

Q2.11: Should I search using average precursor mass or monoisotopic precursor mass?

A: The answer depends on your experimental setup. If you expect many peptides with masses significantly greater than, say, 1500 Da, and your precursor mass tolerance is greater than 1 Da, you may want to search with average mass to avoid missing hits. When in doubt, compare both settings on a test data set.
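A rough way to see where this rule of thumb comes from is to estimate how far a peptide's average mass sits above its monoisotopic mass. The Python below is a back-of-envelope sketch, assuming the commonly used "averagine" model residue (C4.9384 H7.7583 N1.3577 O1.4773 S0.0417) and standard elemental masses; real peptides deviate from this average composition, and the helper names are hypothetical.

```python
# Elemental masses in Da: monoisotopic (lightest isotope of each element)
# and average (natural isotope abundance).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462, "S": 31.97207069}
AVG = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}

# "Averagine": an approximate average elemental composition of one residue.
AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}


def residue_mass(element_masses):
    """Mass of one averagine residue using the given elemental masses."""
    return sum(count * element_masses[el] for el, count in AVERAGINE.items())


def avg_minus_mono(peptide_mass_da):
    """Estimated average-minus-monoisotopic gap (Da) for a peptide of this mass."""
    mono, avg = residue_mass(MONO), residue_mass(AVG)
    return peptide_mass_da * (avg - mono) / mono
```

With these values the gap is roughly 0.9 Da at 1500 Da and grows in proportion to mass, so above that range it becomes comparable to or larger than a 1 Da tolerance.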

 

[3] Output and scoring

 

Q3.1: Why do some of the spectra show no search results?

A: Probably because the best hits have an E-value greater than the E-value cutoff.

 

Q3.2: What is an E-value?

A: The E-value is a measure of statistical significance; a small E-value indicates a significant hit. For a given spectrum, the E-value of a hit is the number of random hits from the sequence library that would be expected to score as well as or better than that hit. For example, a hit with an E-value of 1.0 means that one random hit with an equal or better score would be expected from the sequence library search.
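The definition can be written compactly. Assuming the spectrum is compared against N candidate peptides from the library, and each candidate has the same probability p of scoring s or better by chance, linearity of expectation gives:

```latex
E(s) = \sum_{i=1}^{N} P(S_i \ge s) = N\,p
```

For instance, with the hypothetical values N = 100,000 and p = 10^-6, E = 0.1. This is the general relationship implied by the definition, not necessarily OMSSA's exact scoring model. It also suggests why searching fewer candidates, for example by restricting the taxonomy, lowers the E-value of the same score.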

 

Q3.3: How can I improve the E-values of the hits?

A. 1. Restrict the search to only the organisms sampled in the experiment.
2. Change the mass tolerances so that they are optimized for the instrument that produced the spectra; small changes in mass tolerance can result in significant changes in E-values.
3. Limit the number of variable modifications searched.

 

Q3.4: What is a reasonable E-value cutoff?

A. This is largely a question for the experimentalist, and the answer depends in part on the objectives of the experiment. For example, if your experiment requires no human intervention, even at the analysis stage, you may want to adopt a conservative E-value threshold. If there is human review at every stage of the analysis, you may opt for a less conservative threshold. For more information, please consult the OMSSA paper.

 

Q3.5: How can I improve the sensitivity of the algorithm?

A. In addition to the steps listed above for improving E-values, you can also adjust the "-ht" (or "Number of top intensity peaks in first pass") parameter upwards. This will slow the algorithm but improve sensitivity in cases where there are a significant number of intense precursor ions or non-standard product ions in the MS/MS spectra.

 

Q3.6: Why does the initial result page list only one hit?

A: The initial result page is meant to list all of the spectra searched and indicate whether the search for each spectrum succeeded, not to give the full set of search results. If you click on the row for a spectrum, you will get the full list of hits for that spectrum.

 

Q3.7: The CSV output (Excel) is missing xxxxx (some part of the search result). Can this be added?

A: The XML output contains detailed search information; the CSV format is a summary format. Many tools can parse XML. As an example, the OMSSA download contains a simple Perl parser for OMSSA XML output as well as a simple XSLT transform.
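If you prefer Python to the bundled Perl parser, the standard library's xml.etree.ElementTree is enough for simple extraction. The sketch below is a minimal example with hypothetical helper names; the element names it looks for (MSHitSet, MSHitSet_number, MSHits, MSHits_pepstring, MSHits_evalue) are assumptions about OMSSA's XML layout, so verify them against one of your own result files before relying on it.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def _child_text(elem, name):
    """Text of the first direct child with local name `name`, or None."""
    for child in elem:
        if _local(child.tag) == name:
            return child.text
    return None


def read_hits(source):
    """Yield (spectrum number, peptide, E-value) for every hit in the file.

    `source` is a path or file object. The element names used here are
    assumptions about OMSSA's XML; verify them against your own output.
    """
    root = ET.parse(source).getroot()
    for hitset in root.iter():
        if _local(hitset.tag) != "MSHitSet":
            continue
        number = _child_text(hitset, "MSHitSet_number")
        for hit in hitset.iter():
            if _local(hit.tag) != "MSHits":
                continue
            evalue = _child_text(hit, "MSHits_evalue")
            yield (
                int(number) if number is not None else None,
                _child_text(hit, "MSHits_pepstring"),
                float(evalue) if evalue is not None else None,
            )
```

Writing the rows from read_hits("results.xml") out with Python's csv module gives a CSV containing exactly the fields you need; other fields can be added the same way once you know their element names.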

 

Q3.8: Do you have a search result format that summarizes hits by protein first instead of by peptide first?

A: Not at this time.
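Until such a format exists, a protein-first view can be assembled from peptide-level results. The Python below is a hypothetical sketch: it assumes you have already extracted (spectrum, peptide, accession, E-value) tuples from the XML or CSV output, and the 0.01 E-value cutoff is an arbitrary placeholder, not a recommendation.

```python
from collections import defaultdict


def group_by_protein(hits, evalue_cutoff=0.01):
    """Re-key peptide-level hits by protein accession.

    `hits` is an iterable of (spectrum, peptide, accession, evalue) tuples.
    Returns {accession: sorted distinct peptides}, keeping only hits at or
    below `evalue_cutoff`, with proteins having more peptides listed first.
    """
    proteins = defaultdict(set)
    for _spectrum, peptide, accession, evalue in hits:
        if evalue <= evalue_cutoff:
            proteins[accession].add(peptide)
    ranked = sorted(
        ((acc, sorted(peps)) for acc, peps in proteins.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )
    return dict(ranked)
```

A peptide shared by several proteins appears under every accession it is paired with, so shared peptides are counted more than once; real protein inference needs more care than this.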

 

Q3.9: Other than installing Perl, what do I need to use the sample XML parser for OMSSA XML?

A: If not installed by default, you will need the XML::SAX module from CPAN or ActiveState.

 

Q3.10: The Perl parser for OMSSA XML complains that ParserDetails.ini is missing.

A: This file is not necessary for the parser to work.

 

Q3.11: Why don't I see a really nice hit that I saw before/from another search algorithm?

A: Make sure that you are using *exactly* the same search parameters, e.g. product/precursor mass tolerances, average/monoisotopic searching, sequence library, taxonomy, etc. If you are looking for PTMs, make sure that you are using a large value for the maximum modification combinations per peptide parameter. Many of these parameters have a significant effect on E-values.

 

Q3.12: Does OMSSA report the positions of fixed modifications?

A: Although OMSSA takes them into account when calculating masses, it does not report the positions of fixed modifications.

 

[4] Using the downloadable version

 

Q4.1: Can OMSSA run on a compute cluster?

A: Yes, OMSSA is designed to run on a cluster. Each instance of OMSSA runs as a separate process, but each process memory-maps the sequence library files to maximize memory sharing between processes. Ideally, one OMSSA process should run per processing core, with all the cores on a particular machine using the same sequence library files, which allows them to share the memory map. OMSSA is not explicitly parallelized, threaded, or forked, so you will have to start each process yourself. Optimally, each OMSSA process should be fed multiple spectra at once, as OMSSA is optimized to search many spectra at a time. Splitting sets of spectra across nodes is generally only useful if the number of spectra per node starts to exceed available memory. The number of spectra at which this happens depends on your configuration, but it is typically in the thousands.
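The pattern described above (one process per core, each fed a batch of spectra) can be driven from a small script. The Python below is a sketch under assumptions: the spectra are already split into files, and the command template is yours to supply, since these hypothetical helpers do not know omssacl's options. Threads are used only to wait on the child processes; each OMSSA search still runs as its own operating-system process.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(spectra_files, command_template):
    """One argument list per spectra file.

    `command_template` is a list of arguments that may contain '{input}' and
    '{output}' placeholders; output names reuse the input name with '.xml'.
    """
    commands = []
    for spectra in spectra_files:
        output = os.path.splitext(spectra)[0] + ".xml"
        commands.append(
            [arg.format(input=spectra, output=output) for arg in command_template]
        )
    return commands


def run_all(commands, workers=None):
    """Run each command as a separate process, at most `workers` at a time.

    Defaults to one concurrent process per CPU core. Returns exit codes in
    the same order as `commands`.
    """
    workers = workers or os.cpu_count() or 1

    def run(cmd):
        return subprocess.run(cmd, check=False).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))
```

A template might look like ["omssacl", "-d", "mylib", "-fm", "{input}", "-ox", "{output}"], where "mylib" is a placeholder library name and -d, -fm and -ox are given only as examples of the library, MGF input and XML output options; check them against omssacl -help for your version.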

 

Q4.2: Do you provide the queueing software that you use to run the OMSSA search service at NCBI? We'd like to use it on our cluster.

A: No, as it is specifically architected for the software and hardware we use at NCBI. Most compute clusters provide some degree of this functionality tailored to the architecture of the cluster, so it would be very difficult for us to develop a general-purpose solution. Note that if you run BLAST on your cluster, you can likely run OMSSA the same way. One useful tool for distributing searches across nodes is omssamerge, a command line program that merges search result files. omssamerge is distributed along with the command line version of OMSSA.

 

Q4.3: Can I search my own sequence library?

A: Yes, if you download and use OMSSA on your own machine. If you have FASTA-formatted sequences, you can use formatdb, which is included in the BLAST distribution, to make OMSSA-searchable sequence libraries. OMSSA uses the same sequence library format as BLAST.

 

Q4.4: Does OMSSA read FASTA directly?

A: No. You need to convert FASTA into a BLAST sequence library using formatdb (see previous question).

 

Q4.5: How do I specify taxonomic (species) information in FASTA?

A: FASTA does not support taxonomic information in its format.

 

Q4.6: What kind of computer will run OMSSA most efficiently?

A: It depends, but it's helpful to have a computer with lots of RAM, a large cache, and fast CPUs. Dual processors or dual cores are also helpful.

 

[5] Building OMSSA yourself

 

Q5.1: Where is the OMSSA source code?

A: In the NCBI C++ Toolkit, located at ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/. If you are using Visual Studio, the solution file to use is "ncbi_cpp.sln" and the project is "omssacl.exe".

 

Q5.2: What is required to build OMSSA?

A: You need to build the core code of the NCBI C++ Toolkit, whose requirements are detailed in the toolkit documentation. Note that building OMSSA can require a significant amount of work because of the framework needed to build it on multiple platforms.

 

 
