A Quick Guide
This section is a quick tutorial on the general usage of PROSPECT
(assuming the program is properly installed). After reading it, you
should be able to start using the program in your research, and you
can learn the other functions gradually from the details in other
parts of this manual.
We will use examples that can be found in the "demo" subdirectory
of the PROSPECT installation. More demo commands can be found in
the file "demo/readme". The expected results of running the demos are
available in "demo/results".
Environment and Requirements
Because PROSPECT is an XML-based application, it requires libxml2
to parse its template files. If libxml2 is not installed on your
system, you can download it from http://www.libxml.org
PROSPECT also needs to be able to find its related files. You
will need to set the PROSPECT_PATH environment variable to point to
the base of the installation, or pass it via the -prospect_path argument
on the command line.
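A minimal sketch of setting the variable in a Bourne-style shell. The
directory /opt/prospect is a hypothetical placeholder, and the form of
the -prospect_path argument shown in the comment (flag followed by a
directory) is an assumption, not confirmed by this manual.

```shell
# Hypothetical installation base; replace with your actual PROSPECT directory.
PROSPECT_PATH="${PROSPECT_PATH:-/opt/prospect}"
export PROSPECT_PATH

# Warn early if the directory does not exist.
if [ ! -d "$PROSPECT_PATH" ]; then
    echo "warning: PROSPECT_PATH=$PROSPECT_PATH is not a directory" >&2
fi

# Alternative (assumed syntax): pass the base directory on the command line.
# prospect.LINUX -prospect_path /opt/prospect -seqfile 1onc.seq
```

Adding the export line to your shell startup file avoids repeating it
in every session.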
Basic Threading
Go to the "demo/1onc" directory. For the simplest threading run,
all you need is a sequence file, such as "1onc.seq". The sequence file
can be either in standard FASTA format (see example 1) or in a flexible
format (see example 2).
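For reference, a standard FASTA file consists of a header line that
starts with ">" followed by one or more lines of residues. The sketch
below writes such a file; the file name example.seq and the residues
are hypothetical placeholders and are not the contents of 1onc.seq.

```shell
# Write a hypothetical FASTA file; the name and residues are placeholders.
cat > example.seq <<'EOF'
>example hypothetical query sequence
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
EOF

# The FASTA header is the first line and starts with ">".
head -n 1 example.seq
```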
On a Linux machine, type the following in a terminal:
prospect.LINUX -seqfile 1onc.seq
PROSPECT will automatically thread the sequence against 3200+ templates
in the FSSP database. Depending on your computer and the size of the sequence,
this job may take less than 5 minutes or more than an hour. After the job
is finished, you will see the file:
1onc.seq.xml
To list these results sorted by raw score, run:
sortProspect.LINUX 1onc.seq.xml
You can send these results to a file with standard Unix output
redirection:
sortProspect.LINUX 1onc.seq.xml > 1onc.seq.sort
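The two steps above can be wrapped in a small shell function when you
have several sequence files. This is a sketch that uses only the
commands shown in this section; the function name thread_and_sort and
the DRY_RUN switch are hypothetical helpers, not part of PROSPECT.

```shell
# Thread one sequence file and save the results sorted by raw score.
# DRY_RUN is a hypothetical switch: when set to 1, commands are printed, not run.
thread_and_sort() {
    seq="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "prospect.LINUX -seqfile $seq"
        echo "sortProspect.LINUX $seq.xml > $seq.sort"
        return 0
    fi
    prospect.LINUX -seqfile "$seq" || return 1   # stop if threading fails
    sortProspect.LINUX "$seq.xml" > "$seq.sort"  # e.g. 1onc.seq.xml -> 1onc.seq.sort
}

# Example use over every sequence file in the current directory:
# for f in *.seq; do thread_and_sort "$f"; done
```

Setting DRY_RUN=1 first lets you check the generated commands before
committing to runs that may each take up to an hour.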
Although this is the computationally least expensive approach, the
threading result obtained this way may not be the best. For the best
results, it is generally recommended that you use the predicted
secondary structure and the sequence profile information. Sorting by
"z-score" rather than raw score is also known to generally produce
better results.
Using Secondary Structure Prediction in Threading (Recommended)
Using secondary structure prediction can often improve threading
performance, especially the alignment accuracy. Here we show an example
in the "demo/1bgc" directory. You will find a sequence file "1bgc.seq"
and a secondary structure prediction file "1bgc.ss".
The secondary structure prediction can be obtained from the online
server PHD, developed by Burkhard Rost, or generated with our program
prospect_ssp (described below). To thread with the prediction file, run:
prospect.LINUX -phdfile 1bgc.ss
To create a secondary structure file with prospect_ssp, run:
prospect_ssp.LINUX -seqfile 1bgc.seq -p > 1bgc.seq.ss
For better results, obtain a BLAST checkpoint file to use as a profile:
get_chk_file 1bgc.seq
Then predict the secondary structure from the checkpoint file:
prospect_ssp.LINUX -chkfile 1bgc.seq.chk
Using Evolutionary Information in Threading (Recommended)
Threading can be much improved by using evolutionary information,
passed here through the "-freqfile" option:
prospect.LINUX -phdfile 1bgc.ss -freqfile 1bgc.freq
Calculating z-scores
The z-score is the threading score expressed in standard deviation units
relative to the average of the threading score distribution of random
sequences with the same amino acid composition and sequence length as the
query sequence. In practice, the threading score distribution is estimated
by repeatedly threading a large number (>100) of randomly shuffled query
sequences against a template. To calculate the z-score, use the "-reliab"
option:
prospect.LINUX -phdfile 1bgc.ss -freqfile 1bgc.freq -reliab
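The z-score definition above can be illustrated numerically. The sketch
below is not PROSPECT's implementation: the function name zscore is
hypothetical, the scores are made-up numbers, and the use of the
population standard deviation is an assumption. The sign convention
PROSPECT reports (whether lower or higher raw scores are better) is not
specified in this section.

```shell
# zscore RAW S1 S2 ... : z-score of RAW relative to scores S1, S2, ...
# obtained by threading shuffled sequences (values below are made up).
zscore() {
    raw="$1"; shift
    printf '%s\n' "$@" | awk -v raw="$raw" '
        { sum += $1; sumsq += $1 * $1; n++ }
        END {
            mean = sum / n
            sd = sqrt(sumsq / n - mean * mean)   # population standard deviation
            printf "%.4f\n", (raw - mean) / sd
        }'
}

zscore 6 1 2 3 4 5   # prints 2.1213
```

In a real run the list would hold more than 100 shuffled-sequence
scores per template, which is why this mode is so much slower.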
Warning: Calculating z-scores takes approximately 100 times longer than
basic threading. Use the "-tfile" option to calculate z-scores for only a
small number of selected templates, or use Prospect Manager.
Prospect Manager
If you want to do threading with the least amount of effort,
try the Prospect Manager, a GUI for the PROSPECT tool suite.