Instructions for features particular to an individual ProteinProspector Program
Searches will run as much as 2X faster if you are NOT using a web browser on the computer performing the search, but are instead communicating with the search server over a network. As much as 50% of the CPU time can be taken up merely keeping the stars shooting in the browser window while the search is being performed.
Prior to ProteinProspector 3.1, a submitted search request would run to completion even if the user clicked Stop in his/her web browser. As a result, if a user clicked Stop, changed a parameter and resubmitted the search request, each additional search became progressively slower because the server was running several searches at once.
Starting with ProteinProspector 3.1 some of the programs display messages such as:
Press stop on your browser if you wish to abort this MS-Fit search prematurely.
If you see such a message the search can be stopped and resubmitted without clogging up the server.
The following programs can both save hits and search saved hits:
Neither MS-Tag nor MS-Seq can serve as a pre-filter for MS-Fit.
If MS-Tag is used in Unknome mode as a pre-filter for MS-Edman, a list of possible peptide sequences is saved. MS-Edman should then be run with Search Mode set to List of Sequences.
ProteinProspector programs currently allow searching of the publicly available Genome and Proteome databases listed below. However, nearly any sequence database in a suitable FASTA format can be set up for use by contacting the administrator of a ProteinProspector server.
Protein Databases
Reasons NOT to search particular databases:
The local copy of the database being searched with the programs is subject to updating by the administrator of a ProteinProspector server.
Species-limited searches in ProteinProspector programs are performed by first filtering the database according to the species or collection of species designated by the user. This species pre-filter is bypassed when the species is set to All.
This species pre-filtering is imperfect because of the inconsistent use of taxonomy (standard species naming conventions) in the databases, AND the poorly standardized location of this information in the FASTA database formats used by ProteinProspector programs.
Users who desire additional/changed species filtering capability should direct their local ProteinProspector Server administrator to the instructions To Add/Change Species Filter. For the World Wide Web version of ProteinProspector please send email to: .
Species pre-filtering is implemented in ProteinProspector programs by matching the species name selected in the HTML form against the various pseudonyms used for that species in the databases. This matching is done behind the scenes using a species alias list that covers all the databases. This alias list is located on each ProteinProspector server in the directory.
The pseudonyms used for Mouse in each database are listed below.
NCBInr | dbEST | Genpept | Owl | SwissProt |
---|---|---|---|---|
MOUSE; MUS MUSCULUS; MUS SP. | M. MUSCULUS; M.MUSCULUS; MOUSE; MUS DOMESTICUS; MUS MUSCULUS | MUS MUSCULUS | MOUSE; MUS MUSCULUS; MUS MUSCULUS (MOUSE) | MOUSE |
Server administrators can edit this alias list without requiring access to ProteinProspector source code. While this mechanism of pseudonym correlation is a hassle, it also allows for significant flexibility. For example, an alias can be created that covers a collection of species, e.g. mammals, eukaryotes or prokaryotes. Server administrators who create such alias collections are encouraged to send the modified parameter files to for inclusion in subsequent ProteinProspector releases.
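The alias lookup can be sketched roughly as follows. This is a minimal illustration, not ProteinProspector's implementation: `SPECIES_ALIASES` is a hypothetical stand-in for the server's alias list (its real file name and layout are not given here), filled in with the Mouse pseudonyms from the table above, and each database entry is assumed to have already had its species text pulled out of the FASTA header.

```python
# Hypothetical stand-in for the server's species alias list:
# user-selected species name -> database name -> set of pseudonyms.
SPECIES_ALIASES = {
    "Mouse": {
        "NCBInr": {"MOUSE", "MUS MUSCULUS", "MUS SP."},
        "dbEST": {"M. MUSCULUS", "M.MUSCULUS", "MOUSE", "MUS DOMESTICUS", "MUS MUSCULUS"},
        "Genpept": {"MUS MUSCULUS"},
        "Owl": {"MOUSE", "MUS MUSCULUS", "MUS MUSCULUS (MOUSE)"},
        "SwissProt": {"MOUSE"},
    },
}


def species_filter(entries, database, species, aliases=SPECIES_ALIASES):
    """Keep (accession, species_text) entries whose species text is a known
    pseudonym of `species` in `database`; 'All' bypasses the pre-filter."""
    if species == "All":
        return list(entries)
    pseudonyms = aliases.get(species, {}).get(database, set())
    return [(acc, sp) for acc, sp in entries if sp.strip().upper() in pseudonyms]
```

For example, `species_filter([("P1", "Mus musculus"), ("P2", "Homo sapiens")], "NCBInr", "Mouse")` keeps only `P1`. A collection alias such as "Mammals" could be built in the same structure by taking the union of several species' pseudonym sets.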
The intact protein MW pre-filtering is imperfect because sequences in protein databases often exist in pre, pro, and fragment forms. Sequences in DNA databases often exist as fragments (ESTs) or as cDNAs.
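As an illustration of an intact-protein MW pre-filter, the sketch below sums average residue masses plus one water and keeps entries that fall inside a user-supplied MW window. The residue masses are common textbook average values, assumed here rather than taken from ProteinProspector's own tables, and the function names are hypothetical. The sketch does not reproduce ProteinProspector's calculation constraints; unknown residue letters are simply rejected.

```python
# Average residue masses (Da), standard textbook values; assumed, not
# taken from ProteinProspector's own parameter files.
AVG_RESIDUE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
AVG_WATER = 18.01528  # added once for the free N- and C-termini


def intact_mw(sequence):
    """Average MW of the sequence exactly as stored in the database."""
    try:
        return sum(AVG_RESIDUE[aa] for aa in sequence.upper()) + AVG_WATER
    except KeyError as err:
        raise ValueError(f"no mass for residue {err.args[0]!r}") from None


def mw_prefilter(entries, low, high):
    """Keep (accession, sequence) entries whose intact MW lies in [low, high]."""
    return [(acc, seq) for acc, seq in entries if low <= intact_mw(seq) <= high]
```

Because the MW is computed from the sequence as stored, a pre, pro or fragment entry gets a different MW from the mature protein, which is exactly how such a filter can reject the true protein.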
ProteinProspector programs ALWAYS calculate the intact protein MW according to the following constraints.
The intact protein pI pre-filtering is imperfect because sequences in protein databases often exist in pre, pro, and fragment forms. Sequences in DNA databases often exist as fragments (ESTs) or as cDNAs.
ProteinProspector programs ALWAYS calculate the intact protein pI according to the following constraints.
The pK values used to calculate the pI values can be modified by ProteinProspector server administrators. You must remake the database index files using FA-Index if you change the pK values.
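A pI is conventionally found as the pH at which the net charge, computed from the Henderson-Hasselbalch equation for every ionizable group, is zero. The sketch below finds that pH by bisection. The pK values are illustrative textbook values (an assumption; ProteinProspector's own pK table is set by the server administrator and may differ), and the function names are hypothetical.

```python
# Illustrative pK values (textbook free-amino-acid values); an assumption,
# not ProteinProspector's configurable pK table.
PK_POSITIVE = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PK_NEGATIVE = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def net_charge(sequence, ph):
    """Net charge at `ph` from the Henderson-Hasselbalch equation."""
    groups = ["Nterm", "Cterm"] + list(sequence.upper())
    charge = 0.0
    for g in groups:
        if g in PK_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PK_POSITIVE[g]))
        elif g in PK_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PK_NEGATIVE[g] - ph))
    return charge


def isoelectric_point(sequence, tol=1e-4):
    """Bisect on pH in [0, 14]; the net charge falls monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid  # still positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2
```

Every stored pI depends on the pK table, which is why changing the pK values requires the index files to be rebuilt.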
DNA databases can NOT be searched with mass spectrometry data from DNA samples. Instead, ProteinProspector programs translate the DNA database sequences into protein sequences, which are then searched with mass spectrometry data from protein or peptide samples.
Frames 1, 2, and 3, represent translation of the database sequence from left to right beginning in positions 1, 2, or 3 respectively. Frames 4, 5, 6 represent translation of the complement of the database sequence from right to left beginning in positions 1, 2, or 3 respectively.
Frame translation in ProteinProspector programs can be designated in 1, -1, 3, -3 or 6 frame translation modes. Frame mode 1 considers only frame 1 described above whereas frame mode -1 considers only frame 4. Frame mode 3 considers only frames 1, 2 and 3 whereas frame mode -3 considers only frames 4, 5 and 6. Frame mode 6 considers all 6 frames. A user should select frame mode 6 unless he/she knows that the database being searched contains sequences exclusively cloned in one direction or contains known genes with sequences already in frame.
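The frame numbering and frame modes can be sketched with the standard genetic code as below. This assumes that positions 1, 2 and 3 for frames 4 to 6 are counted from the right-hand end of the database sequence (i.e. from the start of its reverse complement); the function names are hypothetical, and codons containing characters other than A, C, G, T are translated as X.

```python
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Frame mode -> frames considered, as described in the text.
FRAME_MODES = {1: [1], -1: [4], 3: [1, 2, 3], -3: [4, 5, 6], 6: [1, 2, 3, 4, 5, 6]}


def translate(dna):
    """Translate complete codons left to right; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def translate_frames(dna, mode=6):
    """Return {frame: protein} for the frames selected by `mode`."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    result = {}
    for frame in FRAME_MODES[mode]:
        if frame <= 3:
            result[frame] = translate(dna[frame - 1:])
        else:
            result[frame] = translate(reverse_complement[frame - 4:])
    return result
```

For `"ATGGCCTAA"`, frame mode 1 gives `{1: "MA*"}` and frame mode -1 gives `{4: "LGH"}`.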
Because DNA database searching was intended for EST databases, translation initiation does not require a start codon. If a stop codon is encountered, the polypeptide is terminated; translation is then reinitialized with the following codon, beginning a new open reading frame. MS-Fit requires all matches to a particular database entry to belong not only to the same translational frame but also to the same open reading frame. Users who feel any of these procedures are inappropriate or inadequate are urged to contact . These procedures were implemented with significant uncertainty as to the optimal strategy.
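The stop-codon handling and the same-ORF requirement can be sketched as below. This is a simplified illustration with hypothetical function names: it splits one translated frame (stops written as '*') into open reading frames without requiring a start codon, and approximates the MS-Fit requirement by checking that every matched peptide string occurs within a single ORF.

```python
def split_orfs(protein):
    """Split a translated frame at stop codons ('*').

    No start codon is required: each stretch between stops is an ORF.
    Returns (offset, sequence) pairs, offsets being positions in the frame.
    """
    orfs, start = [], 0
    for i, aa in enumerate(protein + "*"):  # sentinel stop closes the last ORF
        if aa == "*":
            if i > start:
                orfs.append((start, protein[start:i]))
            start = i + 1  # translation restarts with the next codon
    return orfs


def all_in_one_orf(protein, peptides):
    """True if some single ORF of this frame contains every matched peptide."""
    return any(all(p in orf for p in peptides) for _, orf in split_orfs(protein))
```

With the frame `"MAKR*GHK"`, the matches `"MA"` and `"KR"` are accepted together, while `"MA"` and `"GH"` are not, because they lie on opposite sides of a stop codon.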
The termini of the matched peptides can be required to be consistent with the cleavage specificity of the enzyme used to generate the peptides. By selecting No enzyme (not available in MS-Fit or MS-Digest), the matched peptides have no constraint on their termini. Increasing the maximum number of missed cleavages allowed enables matching to sequences with uncleaved sites internal to the peptide.
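To make the missed-cleavage setting concrete, here is a sketch of an in-silico digest. The trypsin rule used (cleave after K or R unless the next residue is P) is the common textbook rule and is assumed here; ProteinProspector's own enzyme definitions are maintained by the server administrator and may differ. Each peptide runs from one cleavage point to another with up to `max_missed` uncleaved sites inside it.

```python
def cleavage_sites(seq, residues="KR", blocked_by="P"):
    """Positions after which the enzyme cuts (C-terminal rule)."""
    return [
        i + 1
        for i, aa in enumerate(seq[:-1])  # a cut after the last residue is just the C-terminus
        if aa in residues and seq[i + 1] not in blocked_by
    ]


def digest(seq, max_missed=0, residues="KR", blocked_by="P"):
    """All peptides containing at most `max_missed` internal uncleaved sites."""
    bounds = [0] + cleavage_sites(seq, residues, blocked_by) + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides
```

`digest("AAKGGRCC")` gives `["AAK", "GGR", "CC"]`; raising `max_missed` to 1 adds `"AAKGGR"` and `"GGRCC"`, which is why a larger setting widens the search.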
The option for the non-existent enzyme Slymotrypsin was created as a means of allowing chymotryptic cleavages in trypsin digests. When using this choice it is important to increase the maximum number of missed cleavages allowed. Increasing it to 9 will result in only a marginal increase in the search time.
It is possible to combine the rules for two or more enzymes by adding options to the
Enzyme item on the HTML form. For example adding the option:
<OPTION> CNBr/Trypsin/Asp-N
would combine the cleavage rules for CNBr, Trypsin and Asp-N.
It is possible to mix N-terminal cleavage rules with C-terminal ones in this way.
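Combining rules, including mixed N-terminal and C-terminal ones, amounts to taking the union of each enzyme's cleavage points. The sketch assumes textbook specificities (CNBr after M, Trypsin after K or R but not before P, Asp-N before D); the `RULES` table and the slash-separated parsing mirror the `CNBr/Trypsin/Asp-N` option name but are hypothetical, not ProteinProspector's enzyme file format. A Slymotrypsin-like entry could be built the same way by pairing a trypsin rule with a chymotrypsin rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CleavageRule:
    residues: str         # residues recognized by the enzyme
    terminal: str         # "C": cut after the residue; "N": cut before it
    blocked_by: str = ""  # residue that prevents cleavage if it sits right after the cut


# Hypothetical rule table using textbook specificities.
RULES = {
    "CNBr": CleavageRule("M", "C"),
    "Trypsin": CleavageRule("KR", "C", blocked_by="P"),
    "Asp-N": CleavageRule("D", "N"),
}


def combined_sites(seq, enzyme):
    """Union of cut positions for a name such as 'CNBr/Trypsin/Asp-N'."""
    sites = set()
    for name in enzyme.split("/"):
        rule = RULES[name]
        for i, aa in enumerate(seq):
            if aa not in rule.residues:
                continue
            cut = i + 1 if rule.terminal == "C" else i
            # skip cuts at the sequence ends and cuts blocked by the next residue
            if 0 < cut < len(seq) and seq[cut] not in rule.blocked_by:
                sites.add(cut)
    return sorted(sites)


def cleave(seq, enzyme):
    """Fully cleaved peptides (no missed cleavages)."""
    bounds = [0] + combined_sites(seq, enzyme) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]
```

For `"GMAKLDRPE"`, Trypsin alone gives `["GMAK", "LDRPE"]`, while `CNBr/Trypsin/Asp-N` gives `["GM", "AK", "L", "DRPE"]`; the R before P stays uncleaved in both cases.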
ProteinProspector server administrators can edit the existing enzyme cleavage rules or add new ones.
ProteinProspector v. 3.0 and later also allow server administrators to:
- change the default parameters in the HTML links from the MS-Digest index number
- change the default parameters in the HTML links from the peptide sequence
- change the default parameters in the HTML links from the elemental composition
Server Administrators can change the default address of links from accession numbers in program output without requiring access to ProteinProspector source code. Those administrators who find improved options for links to publicly available databases are encouraged to send the modified parameter files to for inclusion in subsequent ProteinProspector releases.
Without access to source code, releases of ProteinProspector v. 2.0 and earlier do not allow server administrators to change the default parameters associated with this HTML link.
In ProteinProspector v. 3.0 and later server administrators can change the HTML link from the MS-Digest index number in the search results.
If the MS-Digest index number link marked Coverage Map in the MS-Fit detailed results is clicked, the protein display at the top of the MS-Digest report has the matching peptides highlighted.
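A coverage map of this kind can be sketched by marking every residue that falls inside a matched peptide. This is an illustration only: the caret-marker output and the function name are hypothetical, and MS-Digest's actual highlighting is done in its HTML report.

```python
def coverage_markers(protein, peptides):
    """Return (marker_line, percent_covered); '^' marks residues in any match."""
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:  # a peptide may match at more than one position
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    markers = "".join("^" if c else " " for c in covered)
    percent = 100.0 * sum(covered) / len(protein) if protein else 0.0
    return markers, percent
```

Printing the protein sequence and then the marker line underneath gives a plain-text coverage map; for `"MAKRGG"` with matches `"AK"` and `"GG"`, four of six residues are covered.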
Without access to source code, releases of ProteinProspector v. 2.0 and earlier do not allow server administrators to change the default parameters associated with this HTML link.
In ProteinProspector v. 3.0 and later server administrators can customize the HTML link from the peptide sequence in the search results.
In ProteinProspector v. 3.0 and later server administrators can customize the HTML link from the elemental composition in the search results.
If the N-terminal group chosen is PTC, any lysines in the peptide are also modified.
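Because PTC labels the peptide N-terminus and every lysine, the number of PTC groups on a peptide is one plus its lysine count. A sketch, assuming the standard monoisotopic addition for phenyl isothiocyanate (C7H5NS, about 135.0143 Da) rather than a value read from ProteinProspector's terminal-group file; the function name is hypothetical.

```python
# Monoisotopic element masses (Da).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "S": 31.97207100}
# Phenyl isothiocyanate adds C7H5NS per labeled amine (assumed value, about 135.0143 Da).
PTC_ADDITION = 7 * MONO["C"] + 5 * MONO["H"] + MONO["N"] + MONO["S"]


def ptc_mass_shift(peptide):
    """Mass added when the N-terminal group is PTC: N-terminus plus every lysine."""
    labeled_amines = 1 + peptide.upper().count("K")
    return labeled_amines * PTC_ADDITION
```

A peptide such as `"AKK"` therefore carries three PTC groups, not one.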
Users who desire additional options for terminal groups should contact their local ProteinProspector server administrator. For the World Wide Web version of ProteinProspector please send email to: .
Server Administrators can add terminal groups without requiring access to ProteinProspector source code. Those administrators who add terminal groups are encouraged to send the modified parameter files to for inclusion in subsequent ProteinProspector releases.
Users who desire additional options for modification of cysteine residues should contact their local ProteinProspector server administrator. For the World Wide Web version of ProteinProspector please send email to: .
Server Administrators can add cysteine modification options without requiring access to ProteinProspector source code. Those administrators who add cysteine modification options are encouraged to send the modified parameter files to for inclusion in subsequent ProteinProspector releases.
Both MS-Fit and MS-Digest allow for a specialized set of modified amino acids:
Peptide N-terminal Gln to pyroGlu
Any instance of Glutamine at the N-terminus of a peptide (following digestion) is
considered as either normal Gln or as pyro-glutamic acid.
Designation: Q -> q
Oxidation of M
Any instance of Methionine is considered as either normal Met or Met + oxygen.
Designation: M -> m
Protein N-terminus Acetylated
For any database entry with a Met at the N-terminus, the N-terminal peptide is considered
either in its original form or in a form where the Met is removed and the next amino
acid is acetylated. Although this post-translational modification is uncommon in bacteria,
MS-Fit and MS-Digest don't know any better. Furthermore, if the database curators have
removed the N-terminal Met from the sequence, MS-Fit and MS-Digest will not
apply the acetylation modification.
Acrylamide Modified Cys
Any instance of Cysteine is considered either as carrying the modification chosen with the
Cys modified by: option or as acrylamide-modified Cys. This option would normally be
used to consider each Cysteine as either unmodified or acrylamide modified.
User Defined 1
One of the considered modifications can be selected from a list of user defined modifications
which a server administrator can add to. For example
if Phosphorylation of S, T and Y is chosen from the list then any instance of Serine,
Threonine, or Tyrosine is considered as either normal Ser, Thr, Tyr or phosphorylated Ser,
Thr, Tyr.
Designation: S -> s, T -> t, Y -> y
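The lowercase designations suggest how these optional modifications multiply the candidate peptides: each eligible residue is tried both unmodified and modified. A sketch using `itertools.product`, with hypothetical function names; the N-terminal Met handling follows the Protein N-terminus Acetylated description, writing the acetylated form with a hypothetical `Ac-` prefix.

```python
from itertools import product


def modified_forms(peptide, anywhere, n_terminal=None):
    """Every combination of optional modifications, in the Q -> q style.

    `anywhere` maps residues modifiable at any position (e.g. M -> m);
    `n_terminal` maps residues modifiable only at the peptide N-terminus (Q -> q).
    """
    n_terminal = n_terminal or {}
    choices = []
    for i, aa in enumerate(peptide):
        options = {aa}
        if aa in anywhere:
            options.add(anywhere[aa])
        if i == 0 and aa in n_terminal:
            options.add(n_terminal[aa])
        choices.append(sorted(options))
    return sorted("".join(combo) for combo in product(*choices))


def protein_n_terminal_forms(protein):
    """Original N-terminus, plus Met removed and the next residue acetylated."""
    if protein.startswith("M") and len(protein) > 1:
        return [protein, "Ac-" + protein[1:]]
    return [protein]
```

With Oxidation of M and the phosphorylation entry (`{"M": "m", "S": "s", "T": "t", "Y": "y"}`), a peptide with n eligible residues yields 2**n candidate forms, which is why each added variable modification increases search time.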
Average: All isotopes of each element are used, weighted by abundances reflecting their "normal" proportions in the biosphere. The isotope abundances can be changed by editing the elements.txt file.
Par(mi)Frag(av): Parent masses are calculated as monoisotopic and fragment masses are calculated as average. Note: for the purposes of searching, fragment masses are multiplied by a fudge factor and all calculations are done as monoisotopic. However, for the purposes of displaying search results, fragment mass errors are calculated using average masses (without the fudge factor). This approach should be reasonable because the Par(mi)Frag(av) option should usually be chosen when the mass accuracy of fragment mass measurements is modest (+/- 1000 ppm), and the error in the fudge factor is negligible compared to the fragment mass accuracy.
Par(av)Frag(mi): Parent masses are calculated as average and fragment masses are calculated as monoisotopic. Note: for the purposes of searching, the parent mass is multiplied by a fudge factor and all calculations are done as monoisotopic. However, for the purposes of displaying search results, the parent mass error is calculated using the average mass (without the fudge factor). This approach should be reasonable because the Par(av)Frag(mi) option should usually be chosen when the mass accuracy of the parent mass measurement is modest (+/- 1000 ppm), and the error in the fudge factor is negligible compared to the parent mass accuracy.
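The arithmetic behind these mass options can be sketched briefly: an element's average mass is the abundance-weighted mean of its isotope masses, and in Par(av)Frag(mi) the search compares a scaled parent mass monoisotopically while the display reports the error against the average mass. `ISOTOPES` is a hypothetical in-code stand-in for elements.txt (whose layout is not described here), and `FUDGE` is a hypothetical placeholder for ProteinProspector's factor: 0.9994 is only roughly the monoisotopic/average mass ratio of a typical peptide. Function names are hypothetical; the same pattern would apply to fragment masses in Par(mi)Frag(av).

```python
# Hypothetical stand-in for elements.txt: element -> [(isotope mass, abundance)].
ISOTOPES = {
    "C": [(12.0, 0.9893), (13.0033548, 0.0107)],
    "H": [(1.00782503, 0.999885), (2.01410178, 0.000115)],
}
FUDGE = 0.9994  # hypothetical average -> monoisotopic scaling factor


def average_mass(element):
    """Abundance-weighted mean isotope mass; abundances are normalized first."""
    isotopes = ISOTOPES[element]
    total = sum(abundance for _, abundance in isotopes)
    return sum(mass * abundance for mass, abundance in isotopes) / total


def ppm_error(observed, calculated):
    """Mass error in parts per million."""
    return (observed - calculated) / calculated * 1e6


def parent_matches(observed_avg, calc_mono, tol_ppm, fudge=FUDGE):
    """Search step: scale the average parent mass, then compare monoisotopically."""
    return abs(ppm_error(observed_avg * fudge, calc_mono)) <= tol_ppm


def displayed_parent_error(observed_avg, calc_avg):
    """Display step: error against the calculated average mass, no fudge factor."""
    return ppm_error(observed_avg, calc_avg)
```

Editing an abundance in the table changes every average mass derived from it, mirroring the effect of editing elements.txt.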
In MS-Tag the masses of immonium and related low-mass ions can also be placed directly in the fragment-ion mass window. MS-Tag invokes the same rules as conveyed in the check box chart, and converts the masses to AA characters and filters matched sequences as above for presence of the described amino acid(s). ProteinProspector server administrators can control these immonium ion rules by editing the immonium.txt file.
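Converting low-mass ion masses into amino-acid characters can be sketched as a tolerance lookup followed by a presence filter. This assumes immonium ions at the residue monoisotopic mass minus CO plus a proton; the small table is illustrative and hypothetical (it does not reproduce the rules in immonium.txt, including related ions), and Leu/Ile share a mass, so either residue satisfies that ion.

```python
# Illustrative immonium ion masses (residue mass - CO + H+), monoisotopic Da.
IMMONIUM = {
    70.0651: "P",
    86.0964: "LI",  # Leu and Ile share a mass
    110.0718: "H",
    120.0808: "F",
    136.0757: "Y",
    159.0917: "W",
}


def ions_to_residues(masses, tol=0.01):
    """Map each observed low-mass ion to the set of residues it may indicate."""
    result = []
    for m in masses:
        hits = {aa for ref, aas in IMMONIUM.items() if abs(ref - m) <= tol for aa in aas}
        if hits:
            result.append(hits)
    return result


def filter_sequences(sequences, masses, tol=0.01):
    """Keep sequences containing at least one residue for every matched ion."""
    required = ions_to_residues(masses, tol)
    return [s for s in sequences if all(any(aa in s for aa in group) for group in required)]
```

An ion at 120.08 thus requires a Phe in every candidate, while an ion at 86.10 is satisfied by either Leu or Ile.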