Information Content of Individual Genetic Sequences
@article{Schneider.Ri,
author = "T. D. Schneider",
title = "Information Content of Individual Genetic Sequences",
journal = "J. Theor. Biol.",
volume = "189",
number = "4",
pages = "427--441",
note = "\htmladdnormallink
{http://www.lecb.ncifcrf.gov/\~{}toms/paper/ri/}
{http://www.lecb.ncifcrf.gov/\~{}toms/paper/ri/}",
year = "1997"}
HTML version of the paper.
PDF version of the paper.
PostScript version of the paper.
Material in this paper is covered by US patent 5867402.
Programs using these methods may not be redistributed or used
without a signed agreement with the National Institutes of Health.
Please contact us at:
http://www.lecb.ncifcrf.gov/~toms/contacts.html
This method of analyzing binding sites can be distinguished from other
methods by the following criteria.
-
Consensus sequences can be immediately rejected.
-
A variety of ad hoc methods are non-additive; these can be immediately
rejected.
Shannon chose his function to be additive, and it is the only
one that has this property.
-
Berg and von Hippel's (Stormo's) method
does not give results in bits.
If you flip a coin, according to this method,
you could get thousands of 'bits' of information.
Despite the claims,
it is not information theory and it does not properly connect
to thermodynamics because it confuses non-specific
binding states with specific binding states.
It also ignores the inequality in the Second Law of Thermodynamics.
-
Starting from Berg and von Hippel, if one
sets the genomic frequencies to be equiprobable,
one gets
a method that is linearly proportional to Ri. Such methods can be
rejected as not having a natural zero coordinate.
The zero coordinate corresponds to the Second Law of Thermodynamics.
This method, as with the original Berg and von Hippel method,
therefore must use an arbitrary cutoff.
-
Neural network
training methods assume that places
where we do not know anything are not sites.
This has been demonstrated to be wrong.
An example is
the missed Fis sites in the tgt/sec promoter.
Once all of these criteria are applied, no other method remains.
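To make the additivity and natural-zero criteria concrete, here is a minimal Python sketch of an individual information calculation. It assumes the weight Riw(b,l) = 2 + log2 f(b,l) in bits, where 2 bits is log2 of the four bases and f(b,l) is the frequency of base b at position l in the aligned sites; the small-sample correction used in the paper is omitted for brevity. The function names and the toy alignment are hypothetical and are not taken from the paper or its programs.

```python
import math

BASES = "ACGT"

def weight_matrix(sites):
    """Return Riw(b, l) = 2 + log2 f(b, l) in bits for aligned DNA sites.

    Assumption: the small-sample correction e(n) is omitted.
    """
    n = len(sites)
    matrix = []
    for l in range(len(sites[0])):
        column = {}
        for b in BASES:
            count = sum(1 for s in sites if s[l] == b)
            # A base never observed at this position gets weight -infinity.
            column[b] = 2.0 + math.log2(count / n) if count else float("-inf")
        matrix.append(column)
    return matrix

def individual_information(matrix, seq):
    """Ri of one sequence: the sum of its per-position weights (additive)."""
    return sum(matrix[l][b] for l, b in enumerate(seq))

def rsequence(sites):
    """Total information of the site set: sum over positions of 2 - H(l)."""
    n = len(sites)
    total = 0.0
    for l in range(len(sites[0])):
        freqs = [sum(1 for s in sites if s[l] == b) / n for b in BASES]
        entropy = -sum(f * math.log2(f) for f in freqs if f > 0)
        total += 2.0 - entropy
    return total

# Hypothetical toy alignment, not data from the paper.
toy_sites = ["ACGT", "ACGT", "ACGA", "ACGT"]
matrix = weight_matrix(toy_sites)
mean_ri = sum(individual_information(matrix, s) for s in toy_sites) / len(toy_sites)

print(individual_information(matrix, "ACGA"))
print(mean_ri, rsequence(toy_sites))
```

In this sketch each position contributes independently to the total, so the score of a sequence is a plain sum in bits, and the mean Ri over the sites that built the matrix agrees with the total information of the site set. The scale has a zero that is not chosen by the user, which is the property the criteria above rely on when they reject arbitrary cutoffs.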
Additional discussion is in
Measuring Molecular Information.
See also the companion paper:
Sequence walkers: a graphical method to display how binding proteins
interact with DNA or RNA sequences
For more information see:
Individual Information Theory and Sequence Walkers
Schneider Lab
origin: 1997 December 23
updated: 2006 Oct 04