Highlighter processes a maximum of 500 nucleotide or amino acid sequences.
Highlighter takes nucleotide and amino acid alignments as input. The alignments can be in any one of the Common Sequence Formats. The input should be codon aligned if you wish to use the silent and non-silent statistics.
The highlighter tool for nucleotide sequences handles sequences of different lengths: sequences shorter than the master sequence are extended with dashes ('-') to the length of the master sequence, and sequences longer than the master sequence are trimmed to that length.
The highlighter tool for amino acid sequences does not currently have this feature and instead will prompt the user to correct the data set.
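The length handling described above for nucleotide sequences can be sketched as follows. This is a minimal illustration, not the tool's own code: the function name `fit_to_master` is hypothetical, and padding and trimming at the right-hand end is an assumption, since the text does not say which end is affected.

```python
def fit_to_master(seq: str, master: str, gap: str = "-") -> str:
    """Pad or trim seq so that it has the same length as master.

    Assumption: dashes are appended, and extra characters are cut,
    at the right-hand end of the sequence.
    """
    target = len(master)
    if len(seq) < target:
        # Shorter than the master: extend with dashes.
        return seq + gap * (target - len(seq))
    # Same length or longer: keep only the first `target` characters.
    return seq[:target]


master = "ATGCATGC"
print(fit_to_master("ATGC", master))
print(fit_to_master("ATGCATGCAA", master))
```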
If the 'Ignore alphabet validation' checkbox is unchecked, each sequence is evaluated to be of type nucleotide, amino acid or indeterminate.
If more than 2% of the characters are unambiguously amino acid codes [QEILFP], the sequence is treated as protein.
If more than 94% of the characters are ATGCUNRY, it is treated as a nucleotide sequence.
Otherwise it is an indeterminate (ambiguous) sequence.
Checking the 'Ignore alphabet validation' checkbox allows the user to submit sequences that have amino acid characters only at positions of mutation, with the remaining positions represented by, e.g., dashes.
For example:

Master  T F Q P S S G G D L E I
Seq1    - - M - - - - - - - - -
Seq2    - - D - - - - - - D - -

will result in regular output only if the 'Ignore alphabet validation' checkbox is checked.
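The alphabet check described above (the 2% and 94% thresholds) can be sketched as follows. The function name `classify` is hypothetical, and excluding gaps and whitespace before computing the percentages is an assumption; the text does not say which characters are counted.

```python
AMINO_ONLY = set("QEILFP")    # codes the text calls unambiguously amino acid
NUCLEOTIDE = set("ATGCUNRY")  # codes counted toward the nucleotide threshold


def classify(seq: str) -> str:
    """Return 'amino acid', 'nucleotide' or 'indeterminate' for one sequence."""
    # Assumption: gaps and whitespace are not counted as characters.
    chars = [c for c in seq.upper() if c != "-" and not c.isspace()]
    if not chars:
        return "indeterminate"
    aa_fraction = sum(c in AMINO_ONLY for c in chars) / len(chars)
    nt_fraction = sum(c in NUCLEOTIDE for c in chars) / len(chars)
    if aa_fraction > 0.02:
        return "amino acid"
    if nt_fraction > 0.94:
        return "nucleotide"
    return "indeterminate"


for name, seq in [("Master", "TFQPSSGGDLEI"),
                  ("Seq1", "--M---------"),
                  ("Seq2", "--D------D--")]:
    print(name, classify(seq))
```

Under these assumptions the master is classified as amino acid while Seq1 and Seq2 come out as indeterminate, which is why such input needs the validation to be ignored.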
This feature enables you to select the sequence(s) that will act as master.
If the box is not clicked, the top sequence is taken as the master by default. When multiple masters are required while using the match option, the top n sequences are taken as masters, where n is a number specified by the user. However, when the change masters button is clicked, the sequences are extracted from the input file and a check box is displayed for each. Using these checkboxes, you can select the sequences you want to use as masters. All master sequences must be of the same length.
NOTE: When there are multiple selections but only one master is needed, any one of the selections will be used as master.
This option enables you to find the nucleotides/amino acids in a query sequence that do not match with those in a SINGLE master sequence. The nucleotides/amino acids that do not match with the master are assigned a color. The example below describes mismatches in a nucleotide sequence:
Color key (swatches not reproduced here): T, A, C and G each have their own color.
Consider the following sequences,
Master  A G T T A G
Query   T A G C A G
Result  | | | |
In the above example, the query sequence differs from the master in the positions 1, 2, 3 and 4. Hence these changes in the query are indicated by a "|" in a color depending on which nucleotide is present in the given location in the query.
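The comparison against a single master can be sketched as follows. The function name `mismatches` is hypothetical, and the sketch deliberately leaves out the gap and IUPAC options described elsewhere on this page; it only reports positions (1-based) where the query character differs from the master.

```python
def mismatches(master: str, query: str):
    """Return (position, query_character) for each position that differs.

    Positions are 1-based, as in the text. Gap and IUPAC handling
    are not modeled in this sketch.
    """
    return [
        (pos, q)
        for pos, (m, q) in enumerate(zip(master, query), start=1)
        if m != q
    ]


print(mismatches("AGTTAG", "TAGCAG"))
```

For the example sequences this reports positions 1 to 4, each paired with the query nucleotide that would determine the color of the "|".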
This selection will highlight APOBEC signatures and G->A conversions that are encountered in the sequences. The following illustrates an example of how this function works.
Master | G | T | G | C | G | T |
Query | A | A | A | G | A | C |
This option enables you to compare query sequences with the master and highlight transitions, transversions and A<->G transitions with the following colors:
Color key (swatches not reproduced here): Transversions; C<->T Transitions; A<->G Transitions; Transition or transversion.
Consider the following sequences,
Master  A T G C A T M
Query   - A A G G C G
Result  | | | | | | |
In the above example, every transition except the one at position 6 is an A<->G transition and is therefore marked in light blue; the transition at position 6 is a C<->T transition. All the transversions are marked in pink. At the last position (7), the M in the master represents A or C and the query has a G at the corresponding position; this could therefore be either a transition or a transversion, and it is marked in green.
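The classification above can be sketched as follows. The names `IUPAC`, `base_change` and `classify_change` are hypothetical. The sketch assumes that an IUPAC code is expanded to the nucleotides it stands for, that no change is reported when master and query share a possible nucleotide, and that gaps are skipped.

```python
from itertools import product

# Nucleotides each code can stand for (subset of the IUPAC table).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "B": "CGT", "H": "ACT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def base_change(a: str, b: str) -> str:
    """Classify a change between two different plain nucleotides."""
    pair = {a, b}
    if pair == {"A", "G"}:
        return "A<->G transition"
    if pair == {"C", "T"}:
        return "C<->T transition"
    return "transversion"


def classify_change(master: str, query: str):
    """Return the kind of change at one position, or None if nothing is shown."""
    if master not in IUPAC or query not in IUPAC:
        return None  # gaps and unknown characters are skipped in this sketch
    m_bases, q_bases = IUPAC[master], IUPAC[query]
    if set(m_bases) & set(q_bases):
        return None  # a shared nucleotide means no certain difference
    kinds = {base_change(m, q) for m, q in product(m_bases, q_bases)}
    if len(kinds) == 1:
        return kinds.pop()
    return "transition or transversion"


for pos, (m, q) in enumerate(zip("ATGCATM", "-AAGGCG"), start=1):
    print(pos, m, q, classify_change(m, q))
```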
This option enables you to compare query NUCLEOTIDE sequences with the master and highlight silent and non-silent mutations with the following colors.
Color key (swatches not reproduced here): Silent; Non-silent.

Consider the following sequences,

Master  ATG ACT AAT TAG
Query   ATG ACC GTT TAA
Result        | ||    |
The tool converts the nucleotide sequences to the corresponding amino acid sequences and highlights silent and non-silent mutations. In the above example, the second codon (ACT) in the master encodes threonine, and the corresponding codon in the query (ACC) also codes for the same, hence this is shown as a silent mutation. In contrast, the 3rd codon in the master (AAT) codes for amino acid asparagine, while the corresponding codon in the query (GTT) codes for valine, hence this is shown as a non-silent mutation.
This tool uses SNAP to calculate the statistics. It only compares the Master sequence with the other sequences and does not compare all pairs of sequences. For a more detailed analysis of silent and non-silent mutations, please use SNAP.
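The codon comparison in the example above can be sketched as follows. This is a simplified illustration using the standard genetic code, not the SNAP calculation the tool relies on; the names `CODON_TABLE` and `codon_changes` are hypothetical, and codons containing gaps or ambiguity codes are simply labeled "skipped".

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_changes(master: str, query: str):
    """Label each codon pair of two codon-aligned sequences."""
    labels = []
    usable = min(len(master), len(query))
    for i in range(0, usable - 2, 3):
        m, q = master[i:i + 3], query[i:i + 3]
        if m == q:
            labels.append("identical")
        elif m not in CODON_TABLE or q not in CODON_TABLE:
            labels.append("skipped")      # gap or ambiguity code in the codon
        elif CODON_TABLE[m] == CODON_TABLE[q]:
            labels.append("silent")       # same amino acid (or stop) encoded
        else:
            labels.append("non-silent")   # encoded amino acid differs
    return labels


print(codon_changes("ATGACTAATTAG", "ATGACCGTTTAA"))
```

Only the master is compared with each query here, mirroring the note that the tool does not compare all pairs of sequences.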
This option enables you to identify the matching nucleotides/amino acids between a single or multiple masters and a query sequence. If the number of masters is entered as 2, the top 2 sequences in the file will be considered as master sequences.
Each of the masters is assigned a unique color and is matched to each of the query sequences. The nucleotide/amino acid matches in the query are highlighted in the color of the master that they matched. If a nucleotide/amino acid in the query matches more than one master, this match is ignored, and only unique matches are colored.
Consider the following example,
Master1  A T T G G C
Master2  A G G C A T
Query1   A G T T A G
Result   A | | T | G
In the above example, the G at position 2 of the query matches master2 and is indicated by a green "|" at the corresponding position in the result. The query also matches the T of master1 at position 3, which is indicated by a red "|". At position 1 the query matches both master1 and master2, so this position is left uncolored, as only unique matches are displayed in the result. At positions 4 and 6 there are no matches, so each is treated as a polymorphic site and, depending on the option chosen to label polymorphisms, is either left uncolored or colored black. For more information on labeling polymorphisms, see below.
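The unique-match rule can be sketched as follows. The function name `match_masters` and the `label_polymorphisms` flag are hypothetical stand-ins for the behavior described in the text; gap and IUPAC options are not modeled.

```python
def match_masters(masters: dict, query: str, label_polymorphisms: bool = False):
    """For each query position, return the name of the single master it matches.

    None means nothing is colored: either several masters match, or no
    master matches and polymorphisms are not being labeled.
    """
    result = []
    for i, q in enumerate(query):
        hits = [name for name, seq in masters.items() if seq[i] == q]
        if len(hits) == 1:
            result.append(hits[0])         # unique match: use that master's color
        elif not hits and label_polymorphisms:
            result.append("polymorphism")  # no master matches: shown in black
        else:
            result.append(None)            # shared match or unlabeled site
    return result


masters = {"Master1": "ATTGGC", "Master2": "AGGCAT"}
print(match_masters(masters, "AGTTAG"))
print(match_masters(masters, "AGTTAG", label_polymorphisms=True))
```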
Consider the following sequences,
Master1: ATTGATA
Master2: ATTGTTA
Query1 : ATTGCTA
All the sequences above are identical except for the nucleotides in the 5th position. While the master sequences have an A and T in their respective positions, Query1 has a C. By selecting the option to label polymorphisms, these mismatches will also be indicated in black, while they will be ignored if this option is not selected.
This option highlights successive matches with bars as shown below:
Master1  A T T G G C
Master2  A G G C A T
Query1   A T T T A G
Result   [bar] T | G

(The long bar covering the first three positions is not reproduced here.)
In the example, the first three successive matches are represented as a single long bar as shown when the 'Use bars to indicate successive matches' option is selected. If the option is not selected, the matches are represented as regular vertical lines as shown in position 5.
Match:
When a match is done with the option "treat gaps as character", the gaps are treated as a "fifth nucleotide" ("21st amino acid") and a gap in the query is matched with a gap in the master in the same position.
Consider the following example:
Master1  - T T G G -
Master2  A G G C A -
Query1   - G T T A -
Result   | | | T | G
In the above example, when "treat gaps as character" is selected, the gap in the 1st position of the query is matched with the gap in the first position of master 1. However, the gap in the 6th position is not taken into account and is ignored because it matches with more than one master. For more details on match see above.
Mismatch, Transition & transversion, and Silent & non-silent:
When any of the above options is run with the option "treat gaps as character", a gap IN THE QUERY is highlighted in gray for nucleotides and amino acids if there is no gap in the master sequence at the same position. If the "treat gaps as character" option is not chosen, such a gap is ignored.
When the option "match IUPAC codes" is selected, the following IUPAC codes are also considered during a match or mismatch:
Character  Meaning
M          A and C
R          A and G
W          A and T
S          C and G
Y          C and T
K          G and T
B          C and G and T
H          A and C and T
V          A and C and G
D          A and G and T
N          A or C or G or T
?          Any state or nothing
When the ignore option is selected, the tool skips over the IUPAC code in the sequence and does not perform any comparison in that position.
When IUPAC codes are instead treated as regular characters, the tool compares them literally, without using the nucleotides they stand for. For example, although R stands for A and G, with this option it will match ONLY another R.
Match:
When IUPAC codes are included in the match, the codes are also matched based on the nucleotides they represent.
For example, if the master had an M in the 2nd position and the query had an R in the corresponding position,
then this is considered a match because M could be an A and so could R. Whereas, if the master had a C in the 2nd
position and the query had a D in the same position then this would not be a match because D includes everything
but C.
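The rule above amounts to asking whether two codes share at least one possible nucleotide. A minimal sketch follows; the names `IUPAC` and `iupac_match` are hypothetical, and "?" and gaps are left out because their handling depends on the other options described on this page.

```python
# Nucleotides each code can stand for, following the table above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "B": "CGT", "H": "ACT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def iupac_match(master_code: str, query_code: str) -> bool:
    """True if the two codes have at least one nucleotide in common."""
    return bool(set(IUPAC[master_code]) & set(IUPAC[query_code]))


print(iupac_match("M", "R"))  # both can be A
print(iupac_match("C", "D"))  # D covers everything but C
```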
Consider the following example:
Master1  A M R W ? T G C ? ? - - - - ?
Master2  M H N D ? Y ? T ? G - A A - T
Query1   B T A C A T ? M ? T - - ? ? -
Result   | | | | | | | | | | | | | | |
In the above example, at the first position, the query matches Master 2 because the B in the query and the M in Master 2 can both represent C. Position 2 shows a similar case. At position 3, the A in the query matches both Master 1 and Master 2, so this match is not shown. At position 4, the C in the query matches neither W (A and T) nor D (A and G and T). If the option to label polymorphisms is selected, this position is labeled black; otherwise it is ignored. To learn about labeling polymorphisms, see above. When a question mark is present in the query during a match, it is considered a match with all the masters and is ignored. If, however, it is present in a master and the query does not match any other master at the corresponding position, this is considered a match and is shown. This is demonstrated in the example at positions 7, 9, 13, 14 and 15.
Mismatch, transitions & transversions and silent & non-silent:
When IUPAC codes are included in any of the above cases, a difference is shown only if there are no common nucleotides between the IUPAC codes. Consider the following example during mismatch:
Master  R M A T G C - D ? ?
Query   A H W G G D A B A ?
Result  A H W | G | | B A ?
In the above example, the first three positions of the query match with the first three positions of the master and hence are not shown. Whereas at the 4th position, the T in the master does not match with the G in the query and this is shown in yellow. Similarly at position 6, the master has a C while the query has a D which represents the nucleotides A, G and T. Hence this is considered a mismatch.
When the "Sort sequences by similarity" option is chosen, the sequences are compared to the first sequence in the alignment file and are sorted according to their similarity with this sequence. The most similar sequence is placed at the top of the result graph and the least similar at the bottom. When there are multiple masters, the sequences are sorted according to their similarity with the first sequence in the alignment file.
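Sorting by similarity to the first sequence can be sketched as follows. The similarity measure used here, the number of identical aligned positions, is an assumption; the text does not state how the tool scores similarity. The names `identity` and `sort_by_similarity` are hypothetical.

```python
def identity(a: str, b: str) -> int:
    """Count aligned positions at which the two sequences are identical."""
    return sum(x == y for x, y in zip(a, b))


def sort_by_similarity(alignment):
    """Keep the first record on top and order the rest by similarity to it.

    `alignment` is a list of (name, sequence) tuples in file order.
    """
    first = alignment[0]
    rest = sorted(
        alignment[1:],
        key=lambda record: identity(record[1], first[1]),
        reverse=True,  # most similar first
    )
    return [first] + rest


alignment = [("Master", "ATGC"), ("s1", "TTTT"), ("s2", "ATGA"), ("s3", "ATTT")]
print([name for name, _ in sort_by_similarity(alignment)])
```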
When the "Sort sequences by tree" option is chosen, the sequences are sorted based on their evolutionary relationship. If the user does not supply a tree file, the program will generate one using the phylogenetic analysis tool PAUP.
When the "Do not sort" option is chosen, the sequences in the result set will appear in the same order they are in the supplied alignment file.
Amino acid color schemes (color swatches not reproduced here). In the first three schemes, each line lists the residues that share one color; the final scheme lists residues individually.

His
Asp, Glu
Lys, Asn, Gln, Arg
Met
Ile, Leu, Val
Phe, Trp, Tyr
Cys
Ala, Gly, Ser, Thr
Pro
Gap
Other

Ala, Gly, Pro, Ser, Thr
His, Lys, Arg
Asp, Glu, Asn, Gln
Cys
Ile, Leu, Met, Val
Phe, Trp, Tyr
Gap
Other

Ala, Phe, Ile, Leu, Met, Pro, Val, Trp
Cys, Gly, Asn, Gln, Ser, Thr, Tyr
Asp, Glu
His, Lys, Arg
Gap
Other

Ala  Gly  Pro  Ser
Asp  Glu  Trp  Tyr
His  Lys  Arg  Ile
Leu  Met  Val  Asn
Gln  Thr  Phe  Cys
Gap  Other