{ version = 1.30; (* of discan.p 2005 May 23}
(* begin module describe.discan *)
(*
name
discan: combine two feature files into one model
synopsis
discan(scanfeaturesa: in, scanfeaturesb: in, histog: in, discanp: in,
discanfeatures: out, dsout: out, output)
files
scanfeaturesa: the scanfeatures file for the first model,
from program scan.
scanfeaturesb: the scanfeatures file for the second model,
from program scan.
histog: output of the genhis program. It is the distribution used to compute
the uncertainty due to the various distances between the two models.
Histog must be in increments of 1 over the range.
To create the data file for genhis, start with a Delila instruction
file. Then use the malign program to get improved alignments. Use
malin to extract one of the alignments as a second Delila instruction
file. Then use the diffinst program to make the data file. Finally,
run genhis to get the histog.
discanp: parameters to control the program. The file must contain the
following parameters, one per line:
parameter version number, needs to be compatible with current program
version
lowhistog, highhistog (two integers): range of the histog distribution
to use, in the order lowest value to highest
ricutoff: total Ri cutoff. This value is compared to the total: the
individual info of site 1 + the individual info of site 2 + the sample
correction - the -log2 of the distance probability.
singleA (character): If singleA is 'f' then, among feature pairs that
contain the same A coordinate, filter out any pair that has lower
information than another pair. Note: singleA and singleB cannot both be
'f' at the same time.
singleB (character): If singleB is 'f' then, among feature pairs that
contain the same B coordinate, filter out any pair that has lower
information than another pair. Note: singleA and singleB cannot both be
'f' at the same time.
discanfeatures: new features file to use with lister.
dsout: Table of the first location, second location, distribution and
total information. This can be the input (xyin) to the xyplo program.
output: messages to the user
description
This program is used to compare the binding patterns of 2 different
binding site models. It selects sites that are within a certain range of
each other, adds their individual information together, and subtracts a
distance-based distribution probability value to determine the new total
information.
The theory is that the distances have a distribution. So one can assign
probabilities to each distance. One can compute the uncertainty of such a
distribution, so one can also compute the individual information
(surprisal) of each gap - it's just -log2(gap distance frequency) + small
sample correction. Ryan's discan program does this.
What I like about this is it combines the information of the parts
smoothly with the information about gap distance! There are *no*
arbitrary gap penalties or other arbitrary parameters! Of course, if the
model fails we are in big trouble ...
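The calculation described above can be sketched as follows. This is only an
illustration, not the code in discan.p: the function names, the in-memory
histogram layout and the example counts are assumptions, and the small sample
correction is taken as a value supplied by the caller because its formula is
not given here. The total follows the wording of the ricutoff parameter:
info of site 1 + info of site 2 + sample correction - gap surprisal.

```python
import math

def gap_surprisal(counts, low, distance):
    """Return -log2 of the observed frequency of `distance`, in bits.

    counts[i] is the number of site pairs seen at distance low + i,
    so the table runs in increments of 1 over the range (assumed layout).
    """
    index = distance - low
    if not 0 <= index < len(counts):
        raise ValueError("distance is outside the histogram range")
    count = counts[index]
    if count == 0:
        return math.inf  # a distance never observed is infinitely surprising
    return -math.log2(count / sum(counts))

def total_information(ri_a, ri_b, surprisal, correction=0.0):
    """Ri of site A + Ri of site B + sample correction - gap surprisal."""
    return ri_a + ri_b + correction - surprisal

# Hypothetical histogram for distances -12 .. -6 (7 bins, 16 pairs in all).
counts = [1, 2, 4, 4, 2, 2, 1]
s = gap_surprisal(counts, -12, -9)      # 4 of the 16 pairs are at -9
print(total_information(5.0, 4.0, s))   # two sites of 5 and 4 bits
```

A pair would then be compared against ricutoff using this total. Which side
of the cutoff is kept is not stated in this page, so it is left out of the
sketch.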
To make the distribution file (histog) and know which model should be A
and which should be B: whichever model was subtracted is A, and the model
it was subtracted from is B. For example, if your distribution is
negative, you probably subtracted the model further downstream from the
model upstream, so the downstream model would be A and the upstream model
would be B. With ribosome binding sites, sd always came before atg. Atg
was subtracted from sd, giving a negative distribution, so atg was
assigned as the A model and sd was assigned as the B model.
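A small sketch of the ribosome binding site case may make the sign
convention concrete. The coordinates below are made up for illustration.
The only thing taken from the text is that atg is subtracted from sd, and
that sd lies upstream of atg.

```python
# Hypothetical (sd, atg) coordinates; sd lies upstream of atg in each pair.
pairs = [(100, 108), (250, 259), (400, 407)]

# atg is subtracted from sd, so atg plays the role of A and sd the role of B.
distances = [sd - atg for sd, atg in pairs]
print(distances)
```

Every distance comes out negative, which matches a range such as -12 to -6
in the discanp file.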
One can filter out cases having a common A or common B feature using
parameters singleA and singleB. However, the program will not allow one
to remove both at the same time. Consider this situation:
feature   5'           3'  along sequence
1            B---A        5 bits
2         B------A        4 bits
3         B---A           3 bits
Given feature 1, if feature 2 is found next and removed by the A filter,
then feature 3 would incorrectly survive the B filter.
The B filter says to keep 2, remove 3.
The A filter says to keep 1, remove 2.
This is contradictory, since different things can happen depending on the
order of processing. For the order given above, 2 and 3 would compete and
3 would lose. Then 2 would lose to 1. HOWEVER, in a scan of the
complementary strand 1 would beat 2 and then 3 would pass through. So the
results would depend on the direction of the scans.
To avoid this inconsistency, one is not allowed to do both filterings at
the same time.
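The order dependence can be reproduced with a short sketch. It is not the
filtering code in discan.p: the tuple layout, the coordinates and the helper
names are assumptions chosen so that features 1 and 2 share an A coordinate
and features 2 and 3 share a B coordinate, as in the situation above.

```python
def keep_best(pairs, key_index):
    """Among pairs sharing the coordinate at key_index, keep the best one."""
    best = dict()
    for pair in pairs:
        key = pair[key_index]
        if key not in best or pair[3] > best[key][3]:
            best[key] = pair
    return [p for p in pairs if best[p[key_index]] is p]

def filter_a(pairs):
    return keep_best(pairs, 2)   # group on the A coordinate

def filter_b(pairs):
    return keep_best(pairs, 1)   # group on the B coordinate

def numbers(pairs):
    return [p[0] for p in pairs]

# (feature number, B coordinate, A coordinate, bits); coordinates are made up.
features = [(1, 4, 8, 5.0), (2, 1, 8, 4.0), (3, 1, 5, 3.0)]

print(numbers(filter_a(features)))             # A filter alone
print(numbers(filter_b(features)))             # B filter alone
print(numbers(filter_b(filter_a(features))))   # A first, then B
print(numbers(filter_a(filter_b(features))))   # B first, then A
```

Applying the A filter first leaves features 1 and 3, while applying the B
filter first leaves only feature 1. Each filter on its own gives a single,
well-defined answer.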
examples
example discanp file:
1.02 version of discan that this parameter file is designed for.
-12 -6 range
0 total Ri cutoff
f f means filter out duplicates of the a features
n f means filter out duplicates of the b features
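For readers who want to check a discanp file before running the program, a
minimal sketch of reading the five parameters follows. The function name and
the returned tuple are assumptions made for this example. The program itself
also checks the parameter version for compatibility, which is not reproduced
here because the rule is not given in this page.

```python
EXAMPLE = """1.02 version of discan that this parameter file is designed for.
-12 -6 range
0 total Ri cutoff
f f means filter out duplicates of the a features
n f means filter out duplicates of the b features
"""

def parse_discanp(text):
    """Return (version, low, high, ricutoff, single_a, single_b)."""
    lines = text.splitlines()
    if len(lines) < 5:
        raise ValueError("expected five parameter lines")
    # Only the leading value(s) on each line matter; the rest is commentary.
    version = float(lines[0].split()[0])
    low, high = (int(tok) for tok in lines[1].split()[:2])
    ricutoff = float(lines[2].split()[0])
    single_a = lines[3].split()[0][0]
    single_b = lines[4].split()[0][0]
    if low > high:
        raise ValueError("range must run from lowest to highest")
    if single_a == "f" and single_b == "f":
        raise ValueError("singleA and singleB cannot both be 'f'")
    return version, low, high, ricutoff, single_a, single_b

print(parse_discanp(EXAMPLE))
```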
documentation
see also
scan.p, genhis.p, lister.p, discanp, xyplo.p, diffinst.p, malign.p,
malin.p
author
Ryan Kent Shultzaberger and Tom Schneider
bugs
Discan does not know how to deal with circular segments, but since this
reads in at the features level, it could be fixed if scan used internal
coordinates. However, scan can't produce internal coordinates because the
features might be used for a different book, as in "forced" features.
technical notes
*)
(* end module describe.discan *)
{This manual page was created by makman 1.44}