Prospect on the Biowulf Linux Cluster

Prospect on Biowulf

PROSPECT (PROtein Structure Prediction and Evaluation Computer Toolkit) is a threading-based protein structure prediction system. It is designed particularly for recognizing fold templates whose sequences have no significant homology to the target sequence. The system runs efficiently, mainly by discovering and exploiting the "topological complexity" of a protein fold. The threading templates include both protein chains (defined by the FSSP non-redundant set) and compact domains (defined by the SCOP and CATH databases). The template lists are updated when new versions become available. The threading output includes an evaluation of compactness and an SVM assessment of threading reliability.

The web interface

PROSPECT can be run most simply through the web interface. The job is run as a prospect_swarm job (see below) and typically finishes in a few minutes; with Z-scoring enabled, it typically finishes in less than an hour. A single protein sequence is threaded against one or all template databases with a minimum of options, and the threading results are returned to the user by email. The sequence must be in either FASTA or raw format, and the user must supply an email address.
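Because the sequence must be in FASTA or raw format, it can be worth checking a file before submitting it. The following is a minimal Python sketch and is not part of PROSPECT: the function name read_sequence and the accepted alphabet (the 20 standard one-letter amino acid codes plus X) are assumptions made for illustration.

```python
# Hypothetical pre-submission check; not part of the PROSPECT distribution.
# Assumed alphabet: the 20 standard one-letter amino acid codes plus X (unknown).
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX")


def read_sequence(text):
    """Return the residue string from FASTA or raw text, or raise ValueError."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input")
    if lines[0].startswith(">"):
        lines = lines[1:]  # FASTA: drop the header line
    if any(line.startswith(">") for line in lines):
        raise ValueError("more than one FASTA record; expected a single sequence")
    sequence = "".join(lines).replace(" ", "").upper()
    if not sequence:
        raise ValueError("no residues found")
    bad = sorted(set(sequence) - ALLOWED)
    if bad:
        raise ValueError("unexpected characters: " + "".join(bad))
    return sequence


if __name__ == "__main__":
    print(read_sequence(">example\nMKTAYIAKQR\nQISFVKSHFS\n"))
```

A file that passes this check is a single sequence in one of the two accepted formats; the check says nothing about whether PROSPECT will accept residues outside the assumed alphabet.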

prospect_swarm jobs

PROSPECT can be run as a swarm job using the command prospect_swarm. This splits the threading of a single sequence into multiple runs and finishes in a fraction of the time taken by the single-threaded version. The command has the following input options:

Input sequence prefix (the file itself must end with a .seq suffix; give the name without it):

-seq 'file'

Threading template library:

-fssp, -scop, -cath or -all

Secondary structure options:

-phd or -freq

Threading method options:

-global, -global_local, -wp or -np

Z-scoring:

-zsc

Number of solutions displayed in final html file:

-nsol '#'

Here is an example running an input sequence (input.seq) against all template libraries, generating secondary structure from a PSI-BLAST sequence profile, using global alignment and enabling Z-scoring:

prospect_swarm -seq input -all -freq -global -zsc
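When many such command lines have to be prepared, it can help to assemble them programmatically. The sketch below builds a prospect_swarm command line using only the flags listed above. The function name, the grouping of the flags, and the rule that exactly one flag is chosen from each group are assumptions for illustration; prospect_swarm itself may accept other combinations.

```python
import shlex

# Flag names are taken from the option list above; the grouping into
# "choose exactly one per group" is an assumption of this sketch.
LIBRARIES = ("fssp", "scop", "cath", "all")
SECONDARY = ("phd", "freq")
METHODS = ("global", "global_local", "wp", "np")


def build_prospect_swarm_command(seq_file, library, secondary, method,
                                 zscore=False, nsol=None):
    """Return a prospect_swarm command line as a single string."""
    if not seq_file.endswith(".seq"):
        raise ValueError("sequence filename must end with .seq")
    for value, allowed in ((library, LIBRARIES),
                           (secondary, SECONDARY),
                           (method, METHODS)):
        if value not in allowed:
            raise ValueError("%r is not one of %s" % (value, ", ".join(allowed)))
    prefix = seq_file[:-len(".seq")]  # -seq takes the name without the suffix
    args = ["prospect_swarm", "-seq", prefix,
            "-" + library, "-" + secondary, "-" + method]
    if zscore:
        args.append("-zsc")
    if nsol is not None:
        args += ["-nsol", str(int(nsol))]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(build_prospect_swarm_command("input.seq", "all", "freq", "global",
                                       zscore=True))
```

The function only builds the string; running it on the cluster is left to the caller.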

Batch jobs

Threading jobs can be run as batch jobs on Biowulf using PBS (see the User Guide for more details). Here is a simple script (script.sh) that uses PSI-BLAST to generate a sequence profile for the input sequence, threads the profile against all databases, sorts the output, and then converts the output to an HTML file:

#!/bin/csh -f
#PBS -N prospect
#PBS -e prospect.err
#PBS -o prospect.log
cd $PBS_O_WORKDIR
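# Location of the PROSPECT installation and its binaries
setenv PROSPECT_PATH /usr/local/prospect
set path = ( $path /usr/local/prospect/bin )

# PSI-BLAST executable and database used to build the sequence profile
setenv BLASTPGP_EXE /usr/local/blast/archive/blastpgp
setenv BLASTPGP_DB /fdb/blastdb/nr

# Generate the PSI-BLAST checkpoint file (input.seq.chk) for the input sequence
get_chk_file input.seq

# Derive the secondary structure prediction and the frequency profile from it
prospect_ssp.LINUX -chkfile input.seq.chk -p > input.seq.ss
read_chk.LINUX input.seq.chk > input.seq.freq

# Thread against all template libraries, then sort and convert the results to HTML
prospect.LINUX -phdfile input.seq.ss -freqfile input.seq.freq -all -o output.xml -ncpus 2
sortProspect.LINUX output.xml -s
convertProspect.LINUX output.xml -html > output.html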

The script (script.sh) would then be submitted to the batch system using the command

qsub -l nodes=1 script.sh

Swarm jobs

Multiple threading jobs can be launched using the swarm command on Biobos (see swarm user guide for more details). Here are two simple scripts (ind.sh and command.sh) for threading multiple sequences (seq01.seq, seq02.seq, seq03.seq, seq04.seq, etc.) against the FSSP database. The results are then sorted by raw score, and the scores for the top 5 threading alignments are written as a table to the output.

The script ind.sh sets up the environment and runs the threading job for the sequence name passed as its first argument:

#!/bin/csh -f
setenv PROSPECT_PATH /usr/local/prospect
set path = ( $path /usr/local/prospect/bin )
setenv BLASTPGP_EXE /usr/local/blast/archive/blastpgp
setenv BLASTPGP_DB /fdb/blastdb/nr

prospect.LINUX -seqfile $1.seq -fssp -o $1.xml
sortProspect.LINUX $1.xml -r -top 5 > $1.out

The script command.sh runs ind.sh for each seq name:

/data/userid/ind.sh seq01
/data/userid/ind.sh seq02
/data/userid/ind.sh seq03
/data/userid/ind.sh seq04
/data/userid/ind.sh seq05
/data/userid/ind.sh seq06
/data/userid/ind.sh seq07
/data/userid/ind.sh seq08
/data/userid/ind.sh seq09
/data/userid/ind.sh seq10
/data/userid/ind.sh seq11
/data/userid/ind.sh seq12
/data/userid/ind.sh seq13
/data/userid/ind.sh seq14
/data/userid/ind.sh seq15
/data/userid/ind.sh seq16

command.sh would then be submitted to swarm using the command

swarm -f command.sh
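For more than a handful of sequences, command.sh does not need to be typed by hand. The following Python sketch generates the same kind of file, one ind.sh invocation per line. The function name, the zero-padded seqNN naming, and the /data/userid path are taken from or modelled on the example above and are illustrative only.

```python
def swarm_command_lines(script_path, count, prefix="seq"):
    """Return one command line per sequence name: seq01, seq02, ..."""
    width = max(2, len(str(count)))  # zero-pad to at least two digits
    return [f"{script_path} {prefix}{i:0{width}d}" for i in range(1, count + 1)]


def write_command_file(path, lines):
    """Write the command lines to a file suitable for swarm -f."""
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Print the 16 lines shown above; call write_command_file to save them.
    for line in swarm_command_lines("/data/userid/ind.sh", 16):
        print(line)
```

Calling write_command_file("command.sh", lines) on the result produces a file that can be passed to swarm -f as shown above.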

Interactive jobs

PROSPECT can be run as an interactive job. However, it must be initiated from one of the nodes. To do this, first allocate a single node for running jobs:

qsub -I -l nodes=1

Then set the environment variables and path on the node:

setenv PROSPECT_PATH /usr/local/prospect

set path = ( $path /usr/local/prospect/bin )
setenv BLASTPGP_EXE /usr/local/blast/archive/blastpgp
setenv BLASTPGP_DB /fdb/blastdb/nr

Now commands can be given directly from the prompt. A simple job would include generating a secondary structure prediction, followed by a threading run against the SCOP library:

prospect_ssp.LINUX -seqfile input.seq -p

(this generates input.ss, a PHD-style secondary structure prediction file)

prospect.LINUX -phdfile input.ss -scop -o output.xml -ncpus 2

Last, sort the results by raw score and replace the original output with the sorted output:

sortProspect.LINUX output.xml -r -s

Prospect Template Databases

Important Notes

PROSPECT sequence files must be in either FASTA or raw format. PROSPECT will read other formats without complaint, but the results will be unreliable.
The threading output can be scored using a Z-score. The Z-score is the threading score expressed in standard deviation units relative to the mean of the threading score distribution for random sequences with the same amino acid composition and length as the query sequence.
The structural templates and threading output are in XML format. The command convertProspect.LINUX converts the output to HTML format.
Models can be built interactively from the alignment using nest and prospect2pdb.pl.
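The Z-score definition in the notes above amounts to Z = (S - mean) / sd, where S is the threading score of the query and the mean and standard deviation are taken over the scores of the random sequences. The Python sketch below works through that arithmetic with made-up numbers. It is not PROSPECT's implementation: the use of the population standard deviation is an assumption, and the sign convention (whether lower or higher raw scores are better) is not specified here.

```python
import statistics


def z_score(score, random_scores):
    """Express score in standard deviation units relative to random_scores."""
    mean = statistics.mean(random_scores)
    # Assumption: population standard deviation; PROSPECT's choice is not stated.
    spread = statistics.pstdev(random_scores)
    if spread == 0:
        raise ValueError("random scores have zero spread; Z-score is undefined")
    return (score - mean) / spread


if __name__ == "__main__":
    # Made-up scores for random sequences (mean 5, standard deviation 2).
    background = [2, 4, 4, 4, 5, 5, 7, 9]
    print(z_score(10.0, background))
```

With this background distribution, a query score of 10.0 lies 2.5 standard deviations from the mean.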

Available options for Prospect

Please see the PROSPECT web site for all available options.