The Rosetta++ software suite focuses on the prediction and design of protein structures, protein folding mechanisms, and protein-protein interactions. The Rosetta codes have been repeatedly successful in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competition as well as the CAPRI competition and have been modified to address additional aspects of protein design, docking and structure.
Version
The default version is 2.3 (April 2008).
Older versions:
- 2.2: /usr/local/rosetta2.2 (October 2007)
- 2.1.2: /usr/local/rosetta2.1.2 (July 2007)
- 2.1: /usr/local/rosetta2.1 (September 2006)
- 2.0: /usr/local/rosetta2.0 (July 2006)
Other Rosetta Documentation and Links
- Rosetta Design Group Wiki - Contains links to a user's guide and some very useful software
- Baker laboratory at the University of Washington
- Gray laboratory at Johns Hopkins University
- Rohl laboratory at the University of California, Santa Cruz
- Kuhlman laboratory at the University of North Carolina at Chapel Hill
- Baker lab publications
- RosettaCommons - The official home page for Rosetta
- Robetta - An online server where you can wait for months until your job is finished
- RosettaDock Server - Online server for local docking refinement
- Function Site Prediction Server
- RosettaDesign Server
- Rosetta@Home - Donate your excess cycles to a good cause!
- Using Rosetta on the Biowulf Cluster (4 Dec 2007 Powerpoint Presentation)
Rosetta++
Rosetta++ is run through the base executable rosetta. Although the executable can be called directly with a large number of options, it is preferable to drive it through the modules and scripts described below.
Many of the commonly used programs and scripts are available in the shared directory /usr/local/rosetta/bin. Because of this, it is a good idea to add this directory to your path:
csh/tcsh:
set path = ($path /usr/local/rosetta/bin)
bash:
PATH=$PATH:/usr/local/rosetta/bin
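If the bash line is placed in a startup file that may be sourced more than once, the directory can end up on the path several times. The following is a minimal sketch of a guard against that; the function name add_to_path is an assumption made for illustration and is not part of the Rosetta installation.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, leave PATH unchanged
        *) PATH="$PATH:$1" ;;     # append to the end, as in the plain bash line
    esac
}

add_to_path /usr/local/rosetta/bin
export PATH
```

Calling add_to_path a second time with the same directory leaves PATH unchanged.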
The Rosetta++ suite contains the following functionalities:
- RosettaAbinitio: performs de novo protein structure prediction
- RosettaDesign: identifies low free energy sequences for target protein backbones
- RosettaDock: predicts the structure of a protein-protein complex from the individual structures of the monomer components
- RosettaNMR: incorporates NMR data into the basic Rosetta protocol to accelerate the process of NMR structure prediction
- RosettaFragments: generates a fragments database
- Scoring: scores a structure with the Rosetta energy function
Supporting Programs and Scripts
Because Rosetta input and output are large and complex, a number of supporting programs and scripts are available to streamline common tasks:
- Distribute Rosetta jobs on the Biowulf cluster using rosetta_swarm_setup
- Manipulate input and output files
- Evaluate Rosetta output
- Cluster decoys and models
Run as a batch job
Create a batch input file, e.g. 'rosettaRun':
#PBS -N rosetta
#PBS -e rosetta.err
#PBS -o rosetta.log
cd $PBS_O_WORKDIR
rosetta_make_fragments abcde.fasta abcde.rdb abcde.jufo_ss abcde.phd
Submit this job using the PBS 'qsub' command. Example:
qsub -l nodes=1 rosettaRun
See here for more information about PBS.
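When several similar jobs are needed, the batch file can be generated rather than typed by hand. The sketch below assumes a hypothetical helper named write_pbs_job; the PBS directives and the command it writes are the ones from the rosettaRun example, and nothing is submitted.

```shell
# Hypothetical helper: write a PBS batch file in the style of 'rosettaRun'.
# $1 = batch file to write, $2 = job name, remaining arguments = command to run.
write_pbs_job() (
    file=$1
    name=$2
    shift 2
    {
        printf '%s\n' "#PBS -N $name"
        printf '%s\n' "#PBS -e $name.err"
        printf '%s\n' "#PBS -o $name.log"
        printf '%s\n' 'cd $PBS_O_WORKDIR'   # left unexpanded; PBS sets it at run time
        printf '%s\n' "$*"
    } > "$file"
)

# Recreate the example batch file in the current directory.
write_pbs_job rosettaRun rosetta rosetta_make_fragments abcde.fasta abcde.rdb abcde.jufo_ss abcde.phd
```

The resulting file can then be submitted with qsub as shown above.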
rosetta_swarm_setup
Many of the Rosetta++ methods can generate a large number of decoys or intermediate files. In these cases, the method can be broken into a large number of independent jobs using swarm. The script /usr/local/bin/rosetta_swarm_setup has been created to automate this breakup and submission to swarm. The script accepts Rosetta commands and creates a series of swarm commands based on the series code and nstruct values. For more information, type 'rosetta_swarm_setup' at the Biowulf prompt.
Because Rosetta creates a huge number of files, large runs (typically more than 10,000 decoys) can easily reach your quota limit before they complete. In these cases, it is recommended to use the -scorefilter, -smart_scorefilter and -output_pdb_gz options. See here in the options section for more information about these and other options.
RosettaAbinitio
RosettaAbinitio is used to generate a set of model structures from an amino acid sequence. This is done in two steps:
- generate a set of decoy structures
- pull out representative structures by clustering
The following scripts are provided for these steps:
- rosettaAB.pl - generate decoy structures in a silent mode file (no structures written)
- cluster.pl - do a cluster analysis of decoy structures and write out representative structures
- extract.pl - write out specific decoy structure
A better way of generating decoys is with rosetta_swarm_setup -- HIGHLY RECOMMENDED!
Files you will need in your running directory:
Example command lines:
Generate 1000 decoy structures from a fasta file:
Do a cluster analysis on concatenated silentfiles (combined.out):
Extract a structure file for decoy number 50 from the silentfile combined.out:
By default, PDB files are written as C-alpha models. To generate full-atom models from C-alpha models, use RosettaDesign:
RosettaDesign
RosettaDesign can be run in several different modes. They include:
- repacking side chains on a fixed backbone
- redesigning on a fixed backbone
- redesigning with a flexible backbone
Example runs can be found in /usr/local/rosetta/RosettaDesign.
Files you will need in your running directory:
Example command lines:
Redesign using a resfile, output 3 structures each with a name that begins with test1:
Just repack a protein with extra chi1 rotamers:
Move the backbone and design. In this case the 3 standard arguments to rosetta must be used. These are:
Also,
To design with a flexible backbone the starting structure must have ideal bond lengths and angles. Use the following command to idealize a structure:
Troubleshooting
- The program segfaults immediately. Check whether your computer has enough memory. 512 MB will often work, but 1 GB is preferable.
- You get the error message that max_res is exceeded. Increase max_res in param.cc and recompile. If the program now requires too much memory, try lowering maxrot.
RosettaDock
An excellent tutorial on using RosettaDock is available here.
Most of the scripts for docking are available in /usr/local/rosetta/rosetta_scripts/docking. A better way of generating decoys is with rosetta_swarm_setup -- HIGHLY RECOMMENDED!
Files you will need in your running directory:
Example command lines:
Prepack structure prior to full atom run:
Generate 1000 decoys using full atom docking run:
Cluster the resulting decoys by RMSD:
RosettaNMR
For now, see the RosettaCommons page here.
RosettaFragments
Rosetta++ builds and refines protein structures on the basis of fragment libraries. The target sequence is broken into 3- and 9-amino acid segments. A library of fragments that represents the range of accessible local structures for all short segments of the target sequence is selected from a database of known protein structures. Segments are matched with structural fragments on the basis of sequence profiles using PSIBLAST against a non-redundant FASTA sequence database (NCBI nr) and secondary structure predictions (PSIPRED, SAM-T02, JUFO, and PHD). Compact structures are then assembled by randomly combining these fragments, using a Monte Carlo simulated annealing search.
In the simplest case, generating a fragment library on Biowulf requires only a target sequence, and is executed by typing the following command:
rosetta_make_fragments abcde.fasta
The FASTA file name must either have a prefix of five characters plus the 'fasta' suffix, or the -id option must be given with a four character sequence/one character chain identifier (e.g., 1fld_).
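The naming rule can be checked before a job is submitted. The following is a minimal sketch, assuming a hypothetical helper named fasta_id that is not part of the Rosetta scripts; it only inspects the file name and does not run rosetta_make_fragments.

```shell
# Hypothetical helper: print the five-character id of a FASTA file name,
# or return non-zero if the name is not five characters followed by .fasta.
fasta_id() {
    base=${1##*/}                          # strip any leading directory
    case "$base" in
        ?????.fasta) printf '%s\n' "${base%.fasta}" ;;
        *) printf '%s\n' "$base: expected five characters followed by .fasta" >&2
           return 1 ;;
    esac
}

fasta_id abcde.fasta     # prints abcde
```

A name that fails this check would need the -id option instead.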
The target sequence will be subjected to two rounds of PSIBLAST, and a secondary structure prediction will be calculated using PSIPRED and JUFO. For better profiling, secondary structure predictions from SAM-T02 and PHD can be included.
SAM predictions can be done in conjunction with PHD by using the script /usr/local/rosetta/rosetta_fragments/SAM-PHD.pl as a batch or interactive qsub job:
#!/bin/bash
#PBS -N SAM-PHD
#PBS -e SAM-PHD.err
#PBS -o SAM-PHD.log
cd $PBS_O_WORKDIR
/usr/local/rosetta/rosetta_fragments/SAM-PHD.pl abcde.fasta
JUFO and PHD predictions must be done on their respective online servers, and the results saved in separate files with the following naming format:
abcde.rdb - SAM-T02
abcde.jufo_ss - JUFO
abcde.phd - PHD
The rosetta_make_fragments script will then automatically recognize and incorporate the additional secondary structure predictions.
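Before generating fragments, it can help to confirm which of the optional prediction files are actually present under the expected names. This is a sketch only: the helper name list_ss_predictions is an assumption, and the three extensions are the ones listed above.

```shell
# Hypothetical helper: report which optional secondary structure files exist.
# $1 = five-character id, $2 = directory to search (defaults to the current one).
list_ss_predictions() {
    dir=${2:-.}
    for ext in rdb jufo_ss phd; do
        if [ -f "$dir/$1.$ext" ]; then
            printf '%s\n' "found   $1.$ext"
        else
            printf '%s\n' "missing $1.$ext"
        fi
    done
}

list_ss_predictions abcde
```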
PLEASE NOTE: The secondary structure prediction files must be written in a strict format. The formats of the secondary structure predictions are as follows:
abcde.phd (PHD prediction, FORTRAN format = (9x,a3,2x,a60) or (9x,a3,2x,i60)):
protein: query length 76
              ....,....1....,....2....,....3....,....4....,....5....,....6
         AA  |MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYN|
         PHD | EEEEEE EEEEEE HHHHHHHHHHHHH EEEEEE E |
         Rel |959999179855788727822799999999985477735248765144116772211113|
 detail:
         prH-|000000000000000000035799999999986210122210100000100014344311
         prE-|079999510126788851000000000000000001111258776432452100011243
         prL-|920000479872110148854100000000012688756420012466447875543345
 subset: SUB |LEEEEE.LLLLEEEEE.LL..HHHHHHHHHHHH.LLL.L..EEEE.....LLL.......|
abcde.rdb (SAM-T02, DSSP 3-value prediction in rdb format, FORTRAN format = (i8,1x,a1,3(1x,f8.3))):
Pos AA E H L
10N 1S 5N 5N 5N
       0 M    0.035    0.002    0.964
       1 Q    0.651    0.004    0.346
       2 I    0.932    0.003    0.065
       3 F    0.975    0.002    0.023
       4 V    0.967    0.002    0.032
       5 K    0.885    0.002    0.114
       6 T    0.581    0.006    0.414
       7 L    0.086    0.008    0.905
       8 T    0.025    0.009    0.966
       9 G    0.103    0.011    0.887
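As a quick sanity check of an rdb file, the three probability columns can be reduced to the most probable state per residue. The sketch below uses awk and assumes the column order Pos, AA, E, H, L shown in the header; the function name rdb_top_state is hypothetical and the rows are taken from the example above.

```shell
# Hypothetical helper: read SAM-T02 rdb rows on stdin and print
# position, residue and the most probable state (E, H or L).
rdb_top_state() {
    awk 'NF == 5 && $1 ~ /^[0-9]+$/ {      # skip the two header lines
        state = "E"; best = $3 + 0
        if ($4 + 0 > best) { state = "H"; best = $4 + 0 }
        if ($5 + 0 > best) { state = "L"; best = $5 + 0 }
        print $1, $2, state
    }'
}

rdb_top_state <<'EOF'
Pos AA E H L
10N 1S 5N 5N 5N
       0 M    0.035    0.002    0.964
       1 Q    0.651    0.004    0.346
       2 I    0.932    0.003    0.065
EOF
```

For these three rows the function prints "0 M L", "1 Q E" and "2 I E".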
Scoring with Rosetta
In addition to the functionality of the individual modules, the Rosetta binary can be used to score a structure according to the Rosetta energy function using the '-score' command line option. A typical command line to score a single structure is:
rosetta -score -s [pdb_file] -scorefile [score_file]
where [pdb_file] is the structure to be scored and [score_file] is an output file where scoring terms will be written. To score a list of structures, use
rosetta -score -l [list_file] -scorefile [score_file]
where [list_file] contains a list of pdb files, one per line.
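The list file can be produced from the contents of a directory. A minimal sketch, assuming a hypothetical helper named make_pdb_list and a hypothetical list file name pdbs.list; it only writes the list and does not call rosetta.

```shell
# Hypothetical helper: write the names of all .pdb files in a directory
# to a list file, one per line.
# $1 = directory containing the PDB files, $2 = list file to write.
make_pdb_list() {
    : > "$2"                               # start with an empty list file
    for f in "$1"/*.pdb; do
        if [ -f "$f" ]; then               # skips the unexpanded pattern when nothing matches
            printf '%s\n' "$f" >> "$2"
        fi
    done
}

make_pdb_list . pdbs.list
# The list can then be scored with:
#   rosetta -score -l pdbs.list -scorefile [score_file]
```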
The '-score' mode can also be used to add sidechains to a structure by specifying that a structure should be output:
rosetta -score -s [pdb_file] -scorefile [score_file] -nstruct 1 -fa_output
where "-nstruct 1" indicates that a structure should be output and "-fa_output" indicates that the structure should have full-atom sidechain coordinates. If you want to keep any available sidechain coordinates on the input pdb file, add "-fa_input" to the command line.
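The choice between rebuilding and keeping sidechains comes down to one extra flag, which a small wrapper can make explicit. This is a sketch under stated assumptions: the helper name sidechain_cmd and the file names model.pdb and model.sc are hypothetical, and the function only prints the command line so it can be reviewed or pasted into a batch file.

```shell
# Hypothetical helper: print (not run) the -score command that writes a
# full-atom structure.
# $1 = input PDB, $2 = score file, $3 = "keep" to retain input sidechains.
sidechain_cmd() {
    cmd="rosetta -score -s $1 -scorefile $2 -nstruct 1 -fa_output"
    if [ "${3:-}" = keep ]; then
        cmd="$cmd -fa_input"               # keep sidechain coordinates from the input file
    fi
    printf '%s\n' "$cmd"
}

sidechain_cmd model.pdb model.sc keep
```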
File Manipulation
cat_silent.pl: concatenate silentfiles
changeChain.pl: change the chain id of a PDB
compose_score_silent.py: generate a silentfile from a set of PDBs
createLoop.pl: create a dummy structure from a sequence of amino acids
createTemplate.pl: create a homology model template from a FASTA file and a homologous structure
make_coords_file.py: generate .coords format from native pdb (for input to cluster_info_silent.out, see below)
molecule.exe: generate JUFO file and rename ligand atoms (with addhydrogens.inp, mdl2rosetta.inp, and pdb2mdl.inp)
pdb_fasta.pl: generate a FASTA from a PDB
pdb2tag.pl: rename a PDB or set of PDBs to their tag names (as shown in the silentfile)
reconstruct_PDB_by_index: generate PDBs from abinitio-format silentfile
renumberPDBandchains.pl: renumber the residues of a PDB sequentially, starting at 1
renumberPDBatoms.pl: renumber the atoms of a PDB sequentially, starting at 1
silentDock2pdb.pl: generate PDBs from a docking silentfile
Evaluation
TMalign: aligns structures based on CA-CA distances
VMD: X-Windows molecular graphics viewer
getColumn.pl: display silentfile and scorefile columns
gnuplot: graphically display data
histogram.pl: generate a quick histogram from STDIN data
Clustering
cluster.pl: automatic clustering of an abinitio-format silentfile
cluster_info_silent.out: fully configurable silentfile clustering
cluster_pdbs.pl: cluster a set of PDBs
cluster_variation.pl: find per-residue variation within a cluster
make_color_trees.py: make a dendrogram of the clusters
make_new_plot.py: make a contacts plot