Instructions for General Features
Common to Multiple ProteinProspector Programs


Purpose
This document provides instructions for features found across more than one program in the ProteinProspector package.

Instructions for features particular to an individual ProteinProspector Program

Contents of this document:


Search Times

Search times vary from a few seconds to a few minutes, depending on the computer hardware ProteinProspector is running on, the size of the database being searched, the restrictiveness of the search parameters, and the number of searches being performed simultaneously. runs on a 266 MHz Pentium II machine running Windows NT 4.0. runs on a Sun Microsystems Sparc 10 running SunOS 4.1.3. All searches taking longer than 15 minutes are automatically terminated. When two or more searches are performed simultaneously, each search slows noticeably. In general, more discriminating search parameters give faster searches: a single species, a narrow intact protein MW range, and 0 missed cleavages. For DNA database searches, set the intact protein MW filter to All.

Searches can run up to twice as fast if the web browser is NOT running on the computer performing the search, and you instead communicate with the search server over a network. As much as 50% of the CPU time can go to merely keeping the stars shooting in the browser window while the search is being performed.


Stopping / Cancelling a Search

Prior to ProteinProspector 3.1, a submitted search request would run to completion even if the user clicked stop in the web browser. As a result, if a user clicked stop, changed a parameter, and resubmitted the search request, each additional search became progressively slower because the server was still running the earlier searches.

Starting with ProteinProspector 3.1 some of the programs display messages such as:

Press stop on your browser if you wish to abort this MS-Fit search prematurely.

If you see such a message the search can be stopped and resubmitted without clogging up the server.


Saving Hits from one ProteinProspector program, searching them with another

Starting with ProteinProspector v 3.0 one ProteinProspector search program can serve as a pre-filter for another search program. To accomplish this the Hits (index numbers for matching database entries) from the first program are saved to a user specified file. This file is then retrieved by the second program, and only those matching database entries are searched by the second program. Since this operation requires disk space for the saved files, the Internet versions of ProteinProspector limit the user specified file to 3 possible filenames: lastres1, lastres2, lastres3. For ProteinProspector licensees there is no such limit.
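
The pre-filter idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file layout (one index number per line), the function names, and the toy database are all hypothetical and are not taken from ProteinProspector itself.

```python
from pathlib import Path
import tempfile


def save_hits(path, hit_indices):
    """Write one database index number per line (hypothetical file layout)."""
    lines = [str(i) for i in sorted(set(hit_indices))]
    Path(path).write_text("\n".join(lines) + "\n")


def load_hits(path):
    """Read the saved index numbers back as integers."""
    return [int(token) for token in Path(path).read_text().split()]


def search(database, predicate, only_indices=None):
    """Return index numbers of matching entries.

    If only_indices is given, only those entries are examined, which is
    what makes the first program act as a pre-filter for the second.
    """
    candidates = range(len(database)) if only_indices is None else only_indices
    return [i for i in candidates if predicate(database[i])]


# Toy database of sequences; index numbers are simply list positions.
database = ["MKWVTFISLLK", "GAVLKR", "MKKPLW", "AAAA"]

with tempfile.TemporaryDirectory() as tmp:
    hits_file = Path(tmp) / "lastres1"
    # First "program": keep entries starting with M and save their indices.
    first_hits = search(database, lambda s: s.startswith("M"))
    save_hits(hits_file, first_hits)
    # Second "program": search only the saved hits.
    second_hits = search(database, lambda s: "F" in s,
                         only_indices=load_hits(hits_file))
```

The second search never touches entries 1 and 3, which is the source of the speed-up.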

The following programs can both save hits and search saved hits:

MS-Tag or MS-Seq cannot serve as a pre-filter for MS-Fit.

If MS-Tag is used in UnKnome mode as a pre-filter for MS-Edman, a list of possible peptide sequences is saved. MS-Edman should then be used with Search Mode set to List of Sequences.


Databases

ProteinProspector programs search sequence databases which are located locally on the server running the programs. The actual files searched are FASTA formatted copies of the source database which contain minimal annotation. Search output typically contains a web-link into a fully annotated version of the source database for each entry matched.

ProteinProspector programs currently allow searching of the publicly available Genome and Proteome databases listed below. However, nearly any sequence database in a suitable FASTA format can be set up for use by contacting the administrator of a ProteinProspector server.

Protein Databases

DNA Databases

Reasons to search particular databases:

Reasons NOT to search particular databases:

The local copy of the database being searched with the programs is subject to updating by the administrator of a ProteinProspector server.


Species Filtering

If you don't know the Latin taxonomic name for the species you're interested in try: NCBI Taxonomy Browser

Species limited searches in ProteinProspector programs are performed by means of preliminary filtering of a database according to the user designated species or collection of species. This species pre-filter is bypassed when the species is designated as All.

This species pre-filtering is imperfect because of the poor usage of taxonomy (standard species naming conventions) in the databases, AND the poorly standardized location of this information in the FASTA database formats used by ProteinProspector programs.

Users who desire additional/changed species filtering capability should direct their local ProteinProspector Server administrator to the instructions To Add/Change Species Filter. For the World Wide Web version of ProteinProspector please send email to: .

Species pre-filtering is implemented in ProteinProspector programs by correlating the user selected species name in the HTML form with the variety of pseudonyms for a particular species in the databases through behind the scenes access to a species alias list for all the databases used. This alias list is located on each ProteinProspector server in the directory.

Below is a list of the variety of pseudonyms for Mouse.
NCBInr dbEST Genpept Owl SwissProt

MOUSE
MUS MUSCULUS
MUS SP.
M. MUSCULUS
M.MUSCULUS
MOUSE
MUS DOMESTICUS
MUS MUSCULUS

MUS MUSCULUS

MOUSE
MUS MUSCULUS
MUS MUSCULUS (MOUSE)

MOUSE

Server Administrators can edit this alias list without requiring access to ProteinProspector source code. Note that while this mechanism of pseudonym correlation is a hassle, it also allows for significant flexibility. For example, an alias can be created that covers a collection of species, e.g. mammals, eukaryotes, prokaryotes, etc. Server administrators who create such alias collections are encouraged to send the modified parameter files to for inclusion in subsequent ProteinProspector releases.
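
As a rough illustration of the alias mechanism, the Python sketch below maps a species name selected on the form to the set of pseudonyms listed above for Mouse. The dictionary layout, the function name, and the "MAMMALS (EXAMPLE)" collection with its human pseudonyms are assumptions made for the example; the real alias list file may be organised quite differently.

```python
# Pseudonyms for Mouse taken from the list above, merged across databases.
SPECIES_ALIASES = {
    "MOUSE": {
        "MOUSE",
        "MUS MUSCULUS",
        "MUS SP.",
        "M. MUSCULUS",
        "M.MUSCULUS",
        "MUS DOMESTICUS",
        "MUS MUSCULUS (MOUSE)",
    },
}

# Hypothetical collection alias: the human pseudonyms are illustrative only.
SPECIES_ALIASES["MAMMALS (EXAMPLE)"] = SPECIES_ALIASES["MOUSE"] | {
    "HUMAN",
    "HOMO SAPIENS",
}


def passes_species_filter(entry_species, selected_species):
    """Return True if a database entry survives the species pre-filter."""
    if selected_species == "All":
        return True  # the pre-filter is bypassed
    aliases = SPECIES_ALIASES[selected_species]
    return entry_species.strip().upper() in aliases
```

Extracting the species text from each FASTA comment line is the imperfect step described above and is not attempted here.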


Intact Protein MW Filtering

Intact protein MW limited searches in ProteinProspector programs are performed by means of preliminary filtering of a database according to the user designated intact protein MW. This pre-filter is bypassed when the MW range checkbox All is checked.

The intact protein MW pre-filtering is imperfect because sequences in protein databases often exist in pre, pro, and fragment forms. Sequences in DNA databases often exist as fragments (ESTs) or as cDNAs.

ProteinProspector programs ALWAYS calculate the intact protein MW, according to the following constraints.

  1. Treat protein as uncharged.
  2. Use average mass scale.
  3. Treat amino acid C as unmodified.
  4. Treat amino acid X as leucine.
  5. Treat amino acid B as glutamic acid.
  6. Treat amino acid Z as glutamine.
  7. Ignore amino acids J, O, U.

Entries in DNA databases are subject to the following additional constraints:
  1. Translate in frame 1.
  2. Ignore stop codons.
  3. If translation of nucleotide N results in a codon that does not uniquely encode an amino acid, call it amino acid X.
  4. Ignore all nucleotides other than A, G, T, C, and N.
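
The constraints above can be sketched in Python as follows. This is a minimal illustration, not the program's actual code: the average residue masses are commonly tabulated values assumed for the example, the standard genetic code is assumed for translation, and all function names are hypothetical.

```python
# Commonly tabulated average residue masses in Da (assumed values).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
AVERAGE_WATER = 18.0153
SUBSTITUTE = {"X": "L", "B": "E", "Z": "Q"}  # constraints 4, 5 and 6
IGNORED = set("JOU")                          # constraint 7

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def intact_protein_mw(sequence):
    """Average mass of the uncharged protein, with C left unmodified."""
    total = AVERAGE_WATER
    for aa in sequence.upper():
        if aa in IGNORED:
            continue
        total += AVERAGE_RESIDUE_MASS[SUBSTITUTE.get(aa, aa)]
    return total


def translate_codon(codon):
    """Translate one codon; an N that leaves the amino acid ambiguous gives X."""
    options = [""]
    for base in codon:
        options = [p + b for p in options for b in (BASES if base == "N" else base)]
    amino_acids = {CODON_TABLE[c] for c in options}
    return amino_acids.pop() if len(amino_acids) == 1 else "X"


def translate_frame1(dna):
    """Frame 1 translation, skipping stop codons and unknown nucleotides."""
    cleaned = [b for b in dna.upper() if b in "AGTCN"]
    codons = ("".join(cleaned[i:i + 3]) for i in range(0, len(cleaned) - 2, 3))
    return "".join(aa for aa in map(translate_codon, codons) if aa != "*")
```

For a DNA entry the two steps would be chained, for example intact_protein_mw(translate_frame1(dna)).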


Intact Protein pI Filtering

Intact protein pI limited searches in ProteinProspector programs are performed by means of preliminary filtering of a database according to the user designated intact protein pI. This pre-filter is bypassed when the pI range checkbox All is checked.

The intact protein pI pre-filtering is imperfect because sequences in protein databases often exist in pre, pro, and fragment forms. Sequences in DNA databases often exist as fragments (ESTs) or as cDNAs.

ProteinProspector programs ALWAYS calculate the intact protein pI, according to the following constraints.

  1. Treat amino acid C as unmodified.
  2. Treat amino acid X as leucine.
  3. Treat amino acid B as glutamic acid.
  4. Treat amino acid Z as glutamine.
  5. Ignore amino acids J, O, U.

Entries in DNA databases are subject to the following additional constraints:
  1. Translate in frame 1.
  2. Ignore stop codons.
  3. If translation of nucleotide N results in a codon that does not uniquely encode an amino acid, call it amino acid X.
  4. Ignore all nucleotides other than A, G, T, C, and N.

The pK values used to calculate the pI values can be modified by ProteinProspector server administrators. You must remake the database index files using FA-Index if you change the pK values.
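
One common way to turn a set of pK values into a pI is to find the pH at which the net charge is zero. The Python sketch below does this by bisection. It is an assumption that ProteinProspector works this way, and the pK values shown are illustrative textbook-style numbers, not the values shipped with the package.

```python
# Illustrative pK values only; a server's configured values may differ.
PK_POSITIVE = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PK_NEGATIVE = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
SUBSTITUTE = {"X": "L", "B": "E", "Z": "Q"}
IGNORED = set("JOU")


def net_charge(sequence, ph):
    """Net charge of the protein at a given pH (Henderson-Hasselbalch)."""
    residues = [SUBSTITUTE.get(aa, aa) for aa in sequence.upper()
                if aa not in IGNORED]
    # Terminal groups are always present.
    charge = 1.0 / (1.0 + 10 ** (ph - PK_POSITIVE["N_TERM"]))
    charge -= 1.0 / (1.0 + 10 ** (PK_NEGATIVE["C_TERM"] - ph))
    for aa in residues:
        if aa in PK_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PK_POSITIVE[aa]))
        elif aa in PK_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PK_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(sequence, tolerance=1e-4):
    """Bisect on pH 0-14; net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0
```

Because the result depends directly on the pK table, any stored pI values become stale when the table changes, which is consistent with the need to remake the index files.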


Frame Translation in DNA databases

DNA databases can NOT be searched with mass spectrometry data from DNA samples. ProteinProspector programs perform translation of DNA sequences to protein sequences.

Frames 1, 2, and 3, represent translation of the database sequence from left to right beginning in positions 1, 2, or 3 respectively. Frames 4, 5, 6 represent translation of the complement of the database sequence from right to left beginning in positions 1, 2, or 3 respectively.

Frame translation in ProteinProspector programs can be designated in 1, -1, 3, -3 or 6 frame translation modes. Frame mode 1 considers only frame 1 described above whereas frame mode -1 considers only frame 4. Frame mode 3 considers only frames 1, 2 and 3 whereas frame mode -3 considers only frames 4, 5 and 6. Frame mode 6 considers all 6 frames. A user should select frame mode 6 unless he/she knows that the database being searched contains sequences exclusively cloned in one direction or contains known genes with sequences already in frame.

Since the capability of searching DNA databases was intended for use with EST databases, translation initiation does not require a start codon. If a stop codon is encountered, the polypeptide is terminated. Translation is then reinitialized and continued with the following codon, thus beginning a new open reading frame. MS-Fit requires all matches to a particular database entry to belong not only to the same translational frame, but also to the same open reading frame. Users who feel any of these procedures are inappropriate or inadequate, are urged to contact . Implementation of these procedures was done with significant uncertainty as to optimal strategy.
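
The frame numbering, the frame modes, and the splitting into open reading frames at stop codons can be sketched in Python as below. The standard genetic code is assumed, any codon containing a nucleotide other than A, C, G or T is simply reported as X, and the function names are hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Frame mode selected by the user -> frames considered.
FRAME_MODES = {1: [1], -1: [4], 3: [1, 2, 3], -3: [4, 5, 6], 6: [1, 2, 3, 4, 5, 6]}


def reverse_complement(dna):
    """Complement of the sequence, read from right to left."""
    return dna.translate(COMPLEMENT)[::-1]


def open_reading_frames(dna, frame):
    """Translate one frame (1-6) and split it at stop codons.

    No start codon is required; translation restarts with the codon
    following each stop, so every non-empty piece is a separate ORF.
    """
    strand = dna if frame <= 3 else reverse_complement(dna)
    offset = (frame - 1) % 3
    protein = "".join(
        CODON_TABLE.get(strand[i:i + 3], "X")
        for i in range(offset, len(strand) - 2, 3)
    )
    return [orf for orf in protein.split("*") if orf]


def translate(dna, mode):
    """Return {frame: [ORF, ...]} for the frames covered by a frame mode."""
    return {f: open_reading_frames(dna.upper(), f) for f in FRAME_MODES[mode]}
```

Keeping the ORFs of each frame as separate strings makes it straightforward to require, as MS-Fit does, that all matches to an entry come from a single open reading frame.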


Enzyme specificity / Missed cleavages

The termini of the matched peptides can be set to be consistent with the cleavage specificity of the enzyme used to generate the peptide. By selecting No enzyme (not available in MS-Fit or MS-Digest) the matched peptides have no constraint on their termini. Increasing the maximum number of missed cleavages allowed enables matching to sequences with uncleaved sites internal to the peptide.

The option for the non-existent enzyme Slymotrypsin was created as a means for allowing Chymotryptic cleavages in Trypsin digests. When using this choice it is important to increase the missed cleavages allowed. Increasing to 9 will result in only a marginal increase in the search time. It is possible to combine the rules for two or more enzymes by adding options to the Enzyme item on the HTML form. For example adding the option:

<OPTION> CNBr/Trypsin/Asp-N

would combine the cleavage rules for CNBr, Trypsin and Asp-N.

It is possible to mix N-terminal cleavage rules with C-terminal ones in this way.

ProteinProspector server administrators can edit the existing enzyme cleavage rules or add new ones.
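
A small Python sketch may help show how missed cleavages and combined enzyme rules interact. The rule table is an assumption written for this example (Trypsin after K or R but not before P, CNBr after M, Asp-N before D); the rules actually configured on a server may differ, and the function names are hypothetical.

```python
# enzyme name -> (cleaved residues, side of cleavage, residues blocking it)
# These rules are assumed textbook specificities, for illustration only.
RULES = {
    "Trypsin": ("KR", "C", "P"),
    "CNBr": ("M", "C", ""),
    "Asp-N": ("D", "N", ""),
}


def cleavage_sites(sequence, enzyme):
    """Positions where the chain is cut; names joined by '/' are combined."""
    sites = set()
    for name in enzyme.split("/"):
        residues, side, blocked = RULES[name]
        for i, aa in enumerate(sequence):
            if aa not in residues:
                continue
            if side == "C":
                # Cut after residue i, unless at the end or blocked.
                if i + 1 < len(sequence) and sequence[i + 1] not in blocked:
                    sites.add(i + 1)
            elif i > 0:
                # Cut before residue i (N-terminal rule).
                sites.add(i)
    return sorted(sites)


def digest(sequence, enzyme, max_missed_cleavages=0):
    """All peptides containing at most max_missed_cleavages uncleaved sites."""
    bounds = [0] + cleavage_sites(sequence, enzyme) + [len(sequence)]
    last = len(bounds) - 1
    peptides = []
    for start in range(last):
        for end in range(start + 1, min(start + 1 + max_missed_cleavages, last) + 1):
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides
```

For example, digest("AKRPGKM", "Trypsin", 1) adds the peptides AKRPGK and RPGKM to the three fully cleaved ones, and the combined name "CNBr/Trypsin/Asp-N" simply takes the union of the cleavage sites of the three rules.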


General features of HTML links in program output

The links in program output are intended to give users easy access to obvious sources of additional information about the proteins or peptides matched or under study. Some of the default parameters of these links can be changed by ProteinProspector server administrators. Without access to source code, releases of ProteinProspector v. 2.0 and earlier only allow server administrators to:
change the default parameters in the HTML links from the accession number

ProteinProspector v. 3.0 and later also allow server administrators to:
change the default parameters in the HTML links from the MS-Digest index number
change the default parameters in the HTML links from the peptide sequence
change the default parameters in the HTML links from the elemental composition


Link from the accession number in program output to an annotated remote database entry

The database accession number in the search results has an HTML link to retrieve the complete entry including comments from a remote database. In order for this link to be created the programs need to know the URL for the remote database. Users who desire links to different fully annotated databases, or who find links to a particular database to be defective should contact their local ProteinProspector server administrator. For the World Wide Web version of ProteinProspector please send email to: .

Server Administrators can change the default address of links from accession numbers in program output without requiring access to ProteinProspector source code. Those administrators who find improved options for links to publicly available databases are encouraged to send the modified parameter files to for inclusion in subsequent ProteinProspector releases.


Link from the MS-Digest index number in program output to MS-Digest

The MS-Digest index number in the search results has an HTML link to retrieve a listing of all the masses and sequences of peptides that can be produced by digesting the matched protein with the designated enzyme. If No enzyme was designated in the search parameters, then Trypsin is supplied in this HTML link. The number of missed cleavages is set to 2 unless a higher number was designated in the search parameters.

Without access to source code, releases of ProteinProspector v. 2.0 and earlier do not allow server administrators to change the default parameters associated with this HTML link.

In ProteinProspector v. 3.0 and later server administrators can change the HTML link from the MS-Digest index number in the search results.

If the MS-Digest number link marked Coverage Map in the MS-Fit detailed results is pressed then the protein display at the top of the MS-Digest report has the matching peptides highlighted.


Link from the peptide sequence in program output to MS-Product

The peptide sequence in the search results has an HTML link to MS-Product for retrieving a listing of the theoretical fragment-ions that may be formed in an MS/MS experiment. The default set of ion types supplied in this link corresponds to those expected to be formed in post-source decay (PSD) experiments.

Without access to source code, releases of ProteinProspector v. 2.0 and earlier do not allow server administrators to change the default parameters associated with this HTML link.

In ProteinProspector v. 3.0 and later server administrators can customize the HTML link from the peptide sequence in the search results.


Link from the elemental composition in program output to MS-Isotope

The elemental composition in the search results has an HTML link to MS-Isotope for retrieving a listing and visualization of the isotopic distribution corresponding to the composition.

In ProteinProspector v. 3.0 and later server administrators can customize the HTML link from the elemental composition in the search results.


Modified N or C Terminal Groups

Most ProteinProspector programs allow the peptide terminal groups to be modified from the defaults of hydrogen at the N terminus and free acid at the C terminus.

If the N terminal group chosen is PTC then any Lysines in the peptide are also modified.

Users who desire additional options for terminal groups should contact their local ProteinProspector server administrator. For the World Wide Web version of ProteinProspector please send email to: .

Server Administrators can add terminal groups without requiring access to ProteinProspector source code. Those administrators who add terminal groups are encouraged to send the modified parameter files to for inclusion in subsequent ProteinProspector releases.


Modified Cysteine Residues

ProteinProspector programs handle the amino acid cysteine in a different manner from any other amino acid. For each execution of a program all cysteines in a database are treated as though they are modified in the user designated way. More than one method of modification (mixing) can NOT generally be designated at the same time for a single search. There is one exception to this rule in the MS-Fit and MS-Digest programs, where it is possible to consider Acrylamide Modified Cys in addition to the selected cysteine modification (see Modifying Amino Acids).

Users who desire additional options for modification of cysteine residues should contact their local ProteinProspector server administrator. For the World Wide Web version of ProteinProspector please send email to: .

Server Administrators can add cysteine modification options without requiring access to ProteinProspector source code. Those administrators who add cysteine modification options are encouraged to send the modified parameter files to for inclusion in subsequent ProteinProspector releases.


Modifying Amino Acids

See also: Modified Cysteine residues.

Both MS-Fit and MS-Digest allow for a specialized set of modified amino acids:

Peptide N-terminal Gln to pyroGlu
Any instance of Glutamine at the N-terminus of a peptide (following digestion) is considered as either normal Gln or as pyro-glutamic acid.
Designation: Q -> q

Oxidation of M
Any instance of Methionine is considered as either normal Met or Met + oxygen.
Designation: M -> m

Protein N-terminus Acetylated
For any database entry with a Met at the N-terminus the N-terminal peptide is considered as either in its original form or in a form where the Met is removed and the next amino acid is acetylated. While this post-translational modification does not occur in bacteria, MS-Fit and MS-Digest don't know any better. Furthermore, if the database curators have removed the N-terminal Met from the sequence, then MS-Fit and MS-Digest will not apply the acetylation modification.

Acrylamide Modified Cys
Any instance of Cysteine is considered as either the Cysteine modification chosen on the Cys modified by: option or acrylamide modified Cys. This option would normally be used to consider each Cysteine as either unmodified or acrylamide modified.

User Defined 1
One of the considered modifications can be selected from a list of user defined modifications which a server administrator can add to. For example if Phosphorylation of S, T and Y is chosen from the list then any instance of Serine, Threonine, or Tyrosine is considered as either normal Ser, Thr, Tyr or phosphorylated Ser, Thr, Tyr.
Designation: S -> s, T -> t, Y -> y
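
The phrase "considered as either normal or modified" means that every combination of modified and unmodified sites is a candidate. The Python sketch below enumerates those candidates using the lower-case designations given above. It is only an illustration; the function name is hypothetical and the programs themselves may avoid generating the combinations explicitly.

```python
from itertools import product


def modification_variants(peptide, anywhere=None, n_terminal_only=None):
    """List every modified/unmodified form of a peptide.

    anywhere         maps a residue to its modified designation wherever it
                     occurs, e.g. {"M": "m"} for oxidation of Met.
    n_terminal_only  applies only to the first residue of the peptide,
                     e.g. {"Q": "q"} for N-terminal Gln to pyroGlu.
    """
    anywhere = anywhere or {}
    n_terminal_only = n_terminal_only or {}
    choices = []
    for position, aa in enumerate(peptide):
        if aa in anywhere:
            choices.append((aa, anywhere[aa]))
        elif position == 0 and aa in n_terminal_only:
            choices.append((aa, n_terminal_only[aa]))
        else:
            choices.append((aa,))
    return ["".join(combo) for combo in product(*choices)]
```

A peptide with n modifiable sites yields 2**n forms, so modification_variants("STY", {"S": "s", "T": "t", "Y": "y"}) returns 8 sequences. This growth is one reason to select only the modifications that are actually expected.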


User Specified Amino Acid

Some ProteinProspector programs allow the use of a user specified amino acid for which you must supply the elemental composition. To specify the user defined amino acid in a peptide or protein sequence use the letter u (lower case). The default elemental composition for the user defined amino acid is that of glycine.


Mass (m/z)

ProteinProspector programs expect the mass input values to be the actual m/z values measured on a mass spectrometer. Thus the mass of the protons (H+; other charging agents are not currently allowed) need not be subtracted. However, input data that has had the mass of the protons subtracted can be used; simply designate the charge as 0.


Mass type

Monoisotopic: only the lightest isotope of each element, which for these elements is also the most abundant, is used in the mass calculations: 12C, 1H, 14N, 16O, 32S, 31P.

Average: All isotopes of each element are used, weighted by abundances that reflect their "normal" proportions in the biosphere. The isotope abundances can be changed by editing the elements.txt file.

Par(mi)Frag(av): Parent masses are calculated as monoisotopic and fragment masses are calculated as average. Note: for the purposes of searching, fragment masses are multiplied by a fudge factor and all calculations are done as monoisotopic. However, for the purposes of displaying search results, fragment mass errors are calculated as average mass (without using the fudge factor). This approach should be reasonable as the Par(mi)Frag(av) option should usually be chosen when the mass accuracy on fragment mass measurements is modest ( +/- 1000 ppm ), and the error in the fudge factor is negligible compared to the fragment mass accuracy.

Par(av)Frag(mi): Parent masses are calculated as average and fragment masses are calculated as monoisotopic. Note: for the purposes of searching, the parent mass is multiplied by a fudge factor and all calculations are done as monoisotopic. However for the purposes of displaying search results the parent mass error is calculated as the average mass (without using the fudge factor). This approach should be reasonable as the Par(av)Frag(mi) option should usually be chosen when the mass accuracy of the parent mass measurement is modest ( +/- 1000 ppm ), and the error in the fudge factor is negligible compared to the parent mass accuracy.
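
The difference between the monoisotopic and average scales can be seen by computing both masses from an elemental composition, as in the Python sketch below. The atomic masses are standard reference values typed in for the example and are not read from elements.txt; the function name is hypothetical, and the fudge factor used for the mixed mass types is not reproduced here.

```python
# Mass of the lightest isotope of each element (Da).
MONOISOTOPIC = {
    "C": 12.0, "H": 1.0078250, "N": 14.0030740,
    "O": 15.9949146, "S": 31.9720707, "P": 30.9737615,
}
# Abundance-weighted average atomic masses (Da); assumed reference values.
AVERAGE = {
    "C": 12.0107, "H": 1.00794, "N": 14.0067,
    "O": 15.9994, "S": 32.065, "P": 30.973762,
}


def composition_mass(composition, atomic_masses):
    """Sum atomic masses over an elemental composition such as {'C': 2}."""
    return sum(atomic_masses[element] * count
               for element, count in composition.items())


# A glycine residue within a peptide chain: C2 H3 N O.
glycine_residue = {"C": 2, "H": 3, "N": 1, "O": 1}
mono_mass = composition_mass(glycine_residue, MONOISOTOPIC)
average_mass = composition_mass(glycine_residue, AVERAGE)
```

The gap between the two scales grows with the number of atoms, so it matters more for parent masses than for small fragments.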


Charge (z)

ProteinProspector programs can handle multiply charged data from both positive and negative ion experiments. Simply specify the integer charge state corresponding to the m/z value. Absence of charge specification in the input defaults to a charge state of +1. Input data that has had the mass of the protons subtracted can be used; simply designate the charge as 0. The charge is used to convert the m/z value to an MH+ value for search purposes. Output will show the m/z value with the charge as a superscript.
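
The conversion from a measured m/z value to an MH+ value can be sketched as below. This is an assumption about how the arithmetic would typically be done rather than a description of the programs' code; in particular the treatment of negative ions as deprotonated species, the proton mass constant, and the function name are choices made for the example.

```python
PROTON = 1.007276  # mass of H+ in Da


def mz_to_mh(mz, charge=1):
    """Convert a measured m/z value to the singly protonated MH+ value.

    charge > 0  : ion assumed to be (M + zH)z+
    charge < 0  : ion assumed to be (M - nH)n-   (assumption)
    charge == 0 : the input already had the protons subtracted, i.e. it is M
    """
    if charge == 0:
        return mz + PROTON
    if charge > 0:
        return mz * charge - (charge - 1) * PROTON
    n = -charge
    neutral_mass = mz * n + n * PROTON
    return neutral_mass + PROTON
```

For instance, a doubly charged ion at m/z 500.0 corresponds to an MH+ of 2 x 500.0 - 1.007276 = 998.992724, while a singly charged value is returned unchanged.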


Max. Reported Hits

This option is used to limit the maximum number of hits displayed. For example if the maximum number of reported hits is set to 50 and there are 100 hits then only the first 50 hits are displayed.


Sample ID (comment)

This option allows a user defined comment or sample identifier to be added to the output.


Composition Ions

Searches can be restricted to matching sequences containing particular amino acid(s) by checking the appropriate boxes. This information can be derived from the masses of immonium and related low-mass ions or high-mass ions indicating side-chain losses from the parent ion. The programs do not actually use the mass values but instead filter the matched sequence for the presence of the designated amino acid(s).

In MS-Tag the masses of immonium and related low-mass ions can also be placed directly in the fragment-ion mass window. MS-Tag invokes the same rules as conveyed in the check box chart, and converts the masses to AA characters and filters matched sequences as above for presence of the described amino acid(s). ProteinProspector server administrators can control these immonium ion rules by editing the immonium.txt file.
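
The two steps described above, converting low-mass ion masses to amino acid characters and then filtering sequences for those amino acids, can be sketched in Python. The table holds a few well-known immonium ion m/z values chosen for illustration; it is not the content of immonium.txt, and the tolerance and function names are hypothetical.

```python
# A few well-known immonium ion m/z values (illustrative subset).
IMMONIUM_MZ = {
    "P": 70.0657, "V": 72.0813, "H": 110.0718,
    "F": 120.0813, "Y": 136.0762, "W": 159.0922,
}


def amino_acids_from_masses(masses, tolerance=0.05):
    """Map observed low-mass ions to the amino acids they indicate."""
    return {
        aa
        for mass in masses
        for aa, mz in IMMONIUM_MZ.items()
        if abs(mass - mz) <= tolerance
    }


def filter_by_composition(sequences, required):
    """Keep only sequences that contain every required amino acid."""
    return [s for s in sequences if all(aa in s for aa in required)]
```

After the conversion the masses themselves play no further part; only the presence of the designated amino acids in each candidate sequence is checked.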


Absent Amino Acids

Both MS-Comp and MS-Tag UnKnome consider the 20 naturally occurring amino acids as a default. If you know that your unknown peptide doesn't contain particular amino acids you can narrow the range of the search by excluding them. You might also wish to exclude either Leucine or Isoleucine.


Modified Amino Acids Possibly Present

Both MS-Comp and MS-Tag UnKnome consider the 20 naturally occurring amino acids as a default. They can also optionally include the following:

  • m - Oxidized Methionine
  • q - Pyroglutamic Acid
  • h - Homoserine Lactone
  • s - Phosphorylated Serine
  • t - Phosphorylated Threonine
  • y - Phosphorylated Tyrosine
  • u - A User Specified Amino Acid


Instrument

Some ProteinProspector parameters are specific to an instrument type. Server administrators can modify these parameters or add new instrument types by editing the instrument.txt file.