HIV Databases
HIV sequence database



Shannon Entropy Options


Variability

Variability is calculated as the Shannon entropy at each position: the negative sum of P*ln(P) over the characters observed at that position, where P is the frequency of each character. Entropy-Two looks for positions where the entropy differs between the two sets of sequences (background sequences and query sequences).
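As a rough illustration of the calculation described above, the following Python sketch computes the entropy of each alignment column and the per-position difference between two sets. The function names, the toy sequences, and the direction of the subtraction (background minus query) are assumptions made for this example, not details of the Entropy-Two implementation.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of the characters in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def entropy_differences(background, query):
    """Per-position entropy difference, background minus query (assumed direction).

    Both inputs are lists of aligned sequences of equal length.
    """
    return [
        column_entropy(b_col) - column_entropy(q_col)
        for b_col, q_col in zip(zip(*background), zip(*query))
    ]


# Toy alignments, invented for illustration only.
background = ["ACDA", "ACEA", "AGDA", "ATEA"]
query = ["ACDA", "ACDA", "ACDA", "ACDA"]
diffs = entropy_differences(background, query)
for position, diff in enumerate(diffs, start=1):
    print(position, round(diff, 4))
```

A position that is variable in the background set but conserved in the query set gives a large positive difference under this convention.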

Randomization

In Entropy-Two, to test whether the observed difference is statistically significant, the pooled input data at each position can be randomized with or without replacement. You can choose a limit, say 5 times out of 1000 randomizations, as the cut-off for your "conserved signature".
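The randomization step can be sketched in Python as follows. This is a minimal example under stated assumptions: the pooled column is reshuffled (without replacement) or resampled (with replacement), split back into two groups of the original sizes, and the number of randomizations giving a difference at least as large as the observed one is counted. The one-sided comparison, the function names, and the `seed` parameter are choices made for this illustration and may not match what Entropy-Two does internally.

```python
import math
import random
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def randomization_count(background_col, query_col, n_rand=1000, replace=False, seed=0):
    """Count randomizations whose entropy difference is >= the observed difference."""
    rng = random.Random(seed)
    observed = column_entropy(background_col) - column_entropy(query_col)
    pooled = list(background_col) + list(query_col)
    n_bg = len(background_col)
    hits = 0
    for _ in range(n_rand):
        if replace:
            sample = rng.choices(pooled, k=len(pooled))  # with replacement
        else:
            sample = rng.sample(pooled, k=len(pooled))   # without replacement (shuffle)
        diff = column_entropy(sample[:n_bg]) - column_entropy(sample[n_bg:])
        if diff >= observed:
            hits += 1
    return hits


# Hypothetical columns: variable background, fully conserved query.
hits = randomization_count("ACDEFGHIKL" * 2, "A" * 20)
is_signature = hits <= 5  # cut-off of 5 out of 1000, as in the example above
```

With a cut-off of 5 out of 1000, a position would be kept only if at most 5 randomized data sets reproduce a difference as large as the one observed.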

Amino Acid Class Equivalents

You have the option of using the straight amino acids for the calculations, or breaking them down by chemical similarity into the following groups:
 In input sequence    In Entropy calculations
 D and E              a
 R and K              b
 I and V              i
 L and M              l
 F and W and Y        f
 N* and Q             n
 S and T              s
All other amino acids use their original representations, for example, "C" remains "c".
N*: N-linked glycosylation sites are treated separately from the N and Q "n" grouping above, and are designated "g". For N-linked glycosylation site analysis, please use the N-Glycosite program.
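The grouping above amounts to a simple character substitution before the entropy calculation. The Python sketch below transcribes the listed groups into a lookup table and lowercases every other character. The names `CLASS_MAP` and `to_classes` are hypothetical, and detection of N-linked glycosylation sites (the "g" designation) is deliberately left out, so every N maps to "n" here.

```python
# Class equivalents transcribed from the groups listed above.
CLASS_MAP = {
    "D": "a", "E": "a",
    "R": "b", "K": "b",
    "I": "i", "V": "i",
    "L": "l", "M": "l",
    "F": "f", "W": "f", "Y": "f",
    "N": "n", "Q": "n",  # glycosylation-site N ("g") is not handled in this sketch
    "S": "s", "T": "s",
}


def to_classes(seq):
    """Recode a protein sequence into amino acid class symbols.

    Residues outside the listed groups keep their own letter, lowercased
    (for example "C" becomes "c"); '-' and '*' pass through unchanged.
    """
    return "".join(CLASS_MAP.get(aa, aa.lower()) for aa in seq)


print(to_classes("DERKIVLMFWYNQSTC"))
```

Recoding both the background and query sets this way before computing entropy means that, for example, a D/E mix at a position counts as a single conserved class.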

Characters '*' and '-' in Sequences

These two characters are treated differently in our entropy calculation. The asterisk (*) symbol represents unknown or missing information and is excluded from the entropy calculation. The dash (-) symbol represents an insertion or deletion and is included in the calculation.
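The different treatment of the two characters can be illustrated with a short Python sketch: asterisks are dropped before counting, while dashes are counted like any other character. Returning zero for a column that contains only asterisks is an assumption of this example, as is the function name.

```python
import math
from collections import Counter


def column_entropy(column, missing="*"):
    """Entropy of one column: '*' is excluded, '-' is counted as a character."""
    observed = [c for c in column if c != missing]
    if not observed:
        return 0.0  # assumption: a column with no known data scores zero
    counts = Counter(observed)
    total = len(observed)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


print(column_entropy("AA**"))  # asterisks ignored: only "AA" is counted
print(column_entropy("AA--"))  # dashes counted: two states, "A" and "-"
```

Under this treatment a column of "AA**" is fully conserved, whereas "AA--" is as variable as a column split evenly between two residues.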
last modified: Tue Nov 27 16:59 2007


Questions or comments? Contact us at seq-info@lanl.gov.