Variability
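Variability at each position is calculated as the Shannon entropy, that is, the negative sum of P*ln(P) over the characters observed at that position, where P is the frequency of each character. Entropy-Two looks for positions where this entropy differs between the two sets of sequences (background sequences and query sequences).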
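As a rough illustration of the calculation described above, the following Python sketch computes the per-position Shannon entropy, -sum of P*ln(P), for two aligned sequence sets and takes the difference. It is not the Entropy-Two code: the function names are hypothetical, the sign convention (background minus query) is an assumption, and the special characters '*' and '-' get no special treatment here.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of the characters in one alignment column."""
    total = len(column)
    counts = Counter(column)
    # p * ln(1/p) is the same as -p * ln(p), written this way to avoid returning -0.0
    return sum((n / total) * math.log(total / n) for n in counts.values())


def entropy_difference(background, query):
    """Per-position entropy difference (background minus query; assumed convention).

    Both arguments are lists of aligned sequences of equal length.
    """
    bg_columns = zip(*background)  # transpose: one tuple per alignment position
    q_columns = zip(*query)
    return [column_entropy(b) - column_entropy(q)
            for b, q in zip(bg_columns, q_columns)]
```

A fully conserved position has entropy 0, and a position split evenly between two characters has entropy ln(2), about 0.693.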
Randomization
In Entropy-Two, to test whether the observed difference is statistically significant, the pooled input data at each position can be randomized with or without replacement. You can choose a limit, say 5 times out of 1000 randomizations, to use as the cut-off for your "conserved signature".
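One way to picture the randomization is sketched below in Python. This is a minimal sketch under stated assumptions, not the program's actual procedure: the function name and parameters are hypothetical, the pooled column is split back into groups of the original sizes, and a randomization is counted when its absolute entropy difference is at least as large as the observed one (a two-sided comparison chosen for illustration).

```python
import math
import random
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of one alignment column."""
    total = len(column)
    return sum((n / total) * math.log(total / n)
               for n in Counter(column).values())


def randomization_count(background_col, query_col, n_rand=1000,
                        replacement=False, seed=0):
    """Count randomizations whose entropy difference matches or exceeds the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(column_entropy(background_col) - column_entropy(query_col))
    pooled = list(background_col) + list(query_col)
    n_bg = len(background_col)
    hits = 0
    for _ in range(n_rand):
        if replacement:
            # resample the pooled characters with replacement
            sample = rng.choices(pooled, k=len(pooled))
        else:
            # shuffle the pooled characters (sampling without replacement)
            sample = rng.sample(pooled, k=len(pooled))
        diff = abs(column_entropy(sample[:n_bg]) - column_entropy(sample[n_bg:]))
        if diff >= observed:
            hits += 1
    return hits


# A position passes a "5 out of 1000" cut-off if hits <= 5.
hits = randomization_count("ACDEFGHIKL" * 2, "A" * 20)
is_signature = hits <= 5
```

With replacement the randomized groups can contain a character more or fewer times than the pooled data did; without replacement each randomization is a reshuffling of exactly the observed characters.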
Amino Acid Class Equivalents
You have the option of using the individual amino acids as given for the calculations, or grouping them by chemical similarity into the following classes:
In input sequence | In Entropy calculations
D and E | a
R and K | b
I and V | i
L and M | l
F and W and Y | f
N* and Q | n
S and T | s
All other amino acids use their original representations, for example, "C" remains "c".
N*: N-linked glycosylation sites are treated separately from the N and Q "n" grouping above, and are designated "g". For N-linked glycosylation site analysis, please use the N-Glycosite program.
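The grouping in the table can be sketched as a simple lookup in Python. Several details here are assumptions made for illustration and are not taken from the program: the names `CLASS_MAP` and `to_classes` are hypothetical, a glycosylation site is detected with the common N-X-S/T pattern (X not proline) on directly adjacent characters, and unmapped residues are left in upper case so that alanine "A" cannot be confused with the D/E class "a" (the tool's own output case may differ).

```python
# Class letters from the table above; residues not listed are left unchanged.
CLASS_MAP = {
    "D": "a", "E": "a",
    "R": "b", "K": "b",
    "I": "i", "V": "i",
    "L": "l", "M": "l",
    "F": "f", "W": "f", "Y": "f",
    "N": "n", "Q": "n",
    "S": "s", "T": "s",
}


def to_classes(seq, mark_glycosites=True):
    """Translate a sequence into class letters (illustrative sketch only)."""
    seq = seq.upper()
    out = []
    for i, c in enumerate(seq):
        # Assumed sequon rule: N, then any residue except P, then S or T.
        is_glycosite = (
            mark_glycosites
            and c == "N"
            and i + 2 < len(seq)
            and seq[i + 1] not in "P-*"
            and seq[i + 2] in "ST"
        )
        if is_glycosite:
            out.append("g")
        else:
            # '*', '-' and unmapped residues pass through unchanged
            out.append(CLASS_MAP.get(c, c))
    return "".join(out)
```

For example, `to_classes("DEKC")` gives `"aabC"`, and `to_classes("NQST")` gives `"gnss"` because the leading N starts an N-X-S pattern.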
Characters '*' and '-' in Sequences
These two characters are treated differently in our entropy calculation. The asterisk (*) symbol represents unknown or missing information, and is excluded from the entropy calculation. The dash (-) symbol represents an insertion or deletion, and is included in the calculation.
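The different handling of the two characters can be illustrated with a short Python sketch. The function name is hypothetical, and returning 0 for a position that contains only asterisks is an assumption, since that case is not described above.

```python
import math
from collections import Counter


def entropy_with_special_chars(column, missing="*"):
    """Entropy of one column: '*' is dropped, '-' is counted like any other character."""
    observed = [c for c in column if c != missing]
    if not observed:
        return 0.0  # assumption: a column of only '*' carries no information
    total = len(observed)
    return sum((n / total) * math.log(total / n)
               for n in Counter(observed).values())
```

Under this sketch the column "AA**" has entropy 0, because only the two A characters are counted, while the column "A-" has entropy ln(2), because the dash counts as a second character.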