Biowulf at the NIH
Rosetta++ on Biowulf

The Rosetta++ software suite focuses on the prediction and design of protein structures, protein folding mechanisms, and protein-protein interactions. The Rosetta codes have been repeatedly successful in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competition as well as the CAPRI competition and have been modified to address additional aspects of protein design, docking and structure.

Version

The default version is 2.3 (April 2008).

Older versions:

Other Rosetta Documentation and Links

Rosetta++:

Rosetta++ is called using the base executable rosetta. Although the executable can be called directly with a large number of options, it is preferable to run it through the modules and scripts described below.

Many of the commonly used programs and scripts are available in the shared directory /usr/local/rosetta/bin. Because of this, it is a good idea to add this directory to your path:

csh/tcsh:
set path = ($path /usr/local/rosetta/bin)
bash:
PATH=$PATH:/usr/local/rosetta/bin
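
If the bash line above is placed in a startup file that is read more than once, the directory can end up in PATH several times. The following is a minimal sketch of one way to avoid that; path_append is a hypothetical helper name, not something supplied with Rosetta or Biowulf.

```shell
# Append a directory to a PATH-style string only if it is not already present.
# path_append is a hypothetical helper name used for illustration.
path_append() {
    # $1 = current PATH-style string, $2 = directory to add
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;           # already present: leave unchanged
        *)        printf '%s\n' "${1:+$1:}$2" ;;  # append (also handles an empty string)
    esac
}

PATH=$(path_append "$PATH" /usr/local/rosetta/bin)
export PATH
```

Running the last two lines repeatedly leaves PATH unchanged after the first time.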

The Rosetta++ suite contains the following functionalities:

RosettaAbinitio

Performs de novo protein structure prediction.

RosettaDesign

Identifies low free energy sequences for target protein backbones.

RosettaDock

Predicts the structure of a protein-protein complex from the individual structures of the monomer components.

RosettaNMR

Incorporates NMR data into the basic Rosetta protocol to accelerate the process of NMR structure prediction.

Fragments

Generate a fragments database

Scoring

Score a structure with the Rosetta energy function

Supporting Programs and Scripts

Because Rosetta input and output are complex and voluminous, a large number of supporting programs and scripts are provided to streamline common tasks:

Parallelization

Distribute Rosetta jobs on the Biowulf cluster using rosetta_swarm_setup

File Manipulation

Manipulate input and output files

Evaluation

Evaluating Rosetta output

Clustering

Cluster decoys and models

Run as a batch job

Create a batch input file, e.g. 'rosettaRun':

#!/bin/bash
#PBS -N rosetta
#PBS -e rosetta.err
#PBS -o rosetta.log
cd $PBS_O_WORKDIR
rosetta_make_fragments abcde.fasta abcde.rdb abcde.jufo_ss abcde.phd

Submit this job using the PBS 'qsub' command. Example:

qsub -l nodes=1 rosettaRun

See here for more information about PBS.

rosetta_swarm_setup

Many of the Rosetta++ methods can generate a large number of decoys or intermediate files. In these cases, the method can be broken into a large number of independent jobs using swarm. The script /usr/local/bin/rosetta_swarm_setup has been created to automate this breakup and submission to swarm. The script accepts Rosetta commands and creates a series of swarm commands based on the series code and nstruct values. For more information, type 'rosetta_swarm_setup' at the Biowulf prompt.

Because Rosetta creates a very large number of files, large runs (typically more than 10,000 decoys) can easily reach your quota limit before they complete. In these cases, use the -scorefilter, -smart_scorefilter and -output_pdb_gz options. See here in the options section for more information about these and other options.

RosettaAbinitio

RosettaAbinitio is used to generate a set of model structures from an amino acid sequence. This is done in two steps:

RosettaAbinitio is best run using scripts and executables from the directory /usr/local/rosetta/rosetta_scripts/abinitio/bin/. These include:

A better way of generating decoys is with rosetta_swarm_setup -- HIGHLY RECOMMENDED!

Files you will need in your running directory:

fasta file: The name of the file must have the form "xxxxx.fasta" and must agree with the fragment names (see RosettaFragments).
fragment libraries (see RosettaFragments)

Example command lines:

Generate 1000 decoy structures from a fasta file:

rosetta_swarm_setup aa 1d3z _ -silent -nstruct 1000 >& swarm.com

swarm -f swarm.com

Do a cluster analysis on concatenated silentfiles (combined.out):

cat_silent.pl *.out >& combined.out

cluster.pl -silentfile combined.out -get_centers 10

Extract a structure file for decoy number 50 from the silentfile combined.out:

reconstruct_PDB_by_index combined.out 50

By default, PDB files are written as C-alpha models. To generate full-atom models from C-alpha models, use RosettaDesign:

rosetta -design -onlypack -s aa1d3z.out.cluster0.pdb
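
The decoy generation and clustering commands in this section can be collected into a single dry-run helper, which makes it easier to see how the series code, protein code, chain ID and decoy count map onto the command lines. This is only a sketch: abinitio_plan is a hypothetical name, it prints the commands rather than running them, and the concatenation and clustering steps only make sense once all swarm jobs have finished.

```shell
# Print, in order, the ab initio commands from this section for one target.
# abinitio_plan is a hypothetical helper; it only echoes commands (a dry run)
# so they can be reviewed before anything is submitted.
abinitio_plan() {
    # $1 = two-letter series code, $2 = four-character protein code,
    # $3 = chain ID ("_" for none), $4 = number of decoys to generate
    echo "rosetta_swarm_setup $1 $2 $3 -silent -nstruct $4 >& swarm.com"
    echo "swarm -f swarm.com"
    # The next two steps should wait until every swarm job has completed.
    echo "cat_silent.pl *.out >& combined.out"
    echo "cluster.pl -silentfile combined.out -get_centers 10"
}

abinitio_plan aa 1d3z _ 1000
```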

RosettaDesign

RosettaDesign can be run in several different modes. They include:

Example runs can be found in /usr/local/rosetta/RosettaDesign.

Files you will need in your running directory:

starting pdb structure: The name of the file must have the form "xxxx.pdb" and must agree with the fragment names (see RosettaFragments).
paths.txt: This file specifies the locations of input and output for RosettaDesign. A default paths.txt file is supplied with the rosetta source code (/usr/local/rosetta/rosetta++/paths.txt). Most lines can be left unchanged. Make sure the path for 'data files' points to your rosetta_database. If running with a flexible backbone, make sure that the names of the fragment files are correctly defined.
resfile: This file is optional. It is used to specify a subset of residues to redesign. Instructions for creating resfiles are included in the resfiles directory (/usr/local/rosetta/rosetta_scripts/resfiles).
fragment libraries (see RosettaFragments)

Example command lines:

Redesign using a resfile, output 3 structures each with a name that begins with test1:

rosetta -s 1ubq.pdb -design -fixbb -resfile restest -ndruns 3 -pdbout test1

Just repack a protein with extra chi1 rotamers:

rosetta -s 1ubq.pdb -design -onlypack -ex1

Move the backbone and design. In this case the 3 standard arguments to rosetta must be used. These are:

  • a two letter id code, for example "aa".
  • 4 character code name for the protein. This must agree with the name of the fragment libraries and a pdb file named "xxxx.pdb" that has the same number of residues as the fragments.
  • chain id, for example, "_" for none, or "A" for chain A.

Also,

  • your paths.txt file must point to the appropriate fragment files
  • the starting structure must be idealized, see below for a description on how to idealize a structure

    rosetta aa 1hz5 _ -s 1hz5_idl.pdb -design -mvbb

To design with a flexible backbone the starting structure must have ideal bond lengths and angles. Use the following command to idealize a structure:

rosetta -s 1ubq.pdb -idealize -fa_input

Troubleshooting

RosettaDock

An excellent tutorial on using RosettaDock is available here.

Most of the scripts for docking are available in /usr/local/rosetta/rosetta_scripts/docking. A better way of generating decoys is with rosetta_swarm_setup -- HIGHLY RECOMMENDED!

Files you will need in your running directory:

pdb file: The name of the file must have the form "xxxx.pdb". This file holds both partners in the initial docking model.

Example command lines:

Prepack structure prior to full atom run:

rosetta aa test _ -dock -prepack_rtmin -ex1 -ex2 -s test >& RosettaPPK.log

Generate 1000 decoys using full atom docking run:

rosetta_swarm_setup aa test _ -s test.ppk.pdb -dock_mcm -dock_rtmin -ex1 -ex2 -nstruct 1000 >& swarm.com

swarm -f swarm.com

Cluster the resulting decoys by RMSD:

cluster_pdbs.pl *_00*.pdb

RosettaNMR

For now, see the RosettaCommons page here.

RosettaFragments

Rosetta++ builds and refines protein structures on the basis of fragment libraries. The target sequence is broken into 3- and 9-amino-acid segments. A library of fragments that represents the range of accessible local structures for all short segments of the target sequence is selected from a database of known protein structures. Segments are matched with structural fragments on the basis of sequence profiles, generated using PSIBLAST against a non-redundant FASTA sequence database (NCBI nr), and secondary structure predictions (PSIPRED, SAM-T02, JUFO, and PHD). Compact structures are then assembled by randomly combining these fragments, using a Monte Carlo simulated annealing search.

In the simplest case, generating a fragment library on Biowulf requires only a target sequence, and is executed by typing the following command:

rosetta_make_fragments abcde.fasta

The FASTA file name must either have a prefix of five characters plus the 'fasta' suffix, or the -id option must be given with a four character sequence/one character chain identifier (e.g., 1fld_).
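
The naming rule can be checked before a job is submitted. The sketch below assumes that the five-character prefix is read as a four-character protein code followed by a one-character chain identifier, as in the 1fld_ example; split_fragment_id is a hypothetical helper, not one of the Rosetta scripts.

```shell
# Derive the four-character protein code and one-character chain ID from a
# FASTA file name of the form xxxxx.fasta (e.g. 1d3z_.fasta -> "1d3z" and "_").
# split_fragment_id is a hypothetical helper name used for illustration.
split_fragment_id() {
    prefix=$(basename "$1" .fasta)
    if [ "${#prefix}" -ne 5 ]; then
        echo "error: '$1' needs a five-character prefix (or use the -id option)" >&2
        return 1
    fi
    code=${prefix%?}        # everything except the last character
    chain=${prefix#????}    # the last character
    printf '%s %s\n' "$code" "$chain"
}

split_fragment_id 1d3z_.fasta
```

For 1d3z_.fasta this prints "1d3z _", which matches the protein code and chain arguments used in the rosetta_swarm_setup examples.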

The target sequence will be subjected to two rounds of PSIBLAST, and a secondary structure prediction will be calculated using PSIPRED and JUFO. For better profiling, secondary structure predictions from SAM-T02 and PHD can be included.

SAM predictions can be done in conjunction with PHD by using the script /usr/local/rosetta/rosetta_fragments/SAM-PHD.pl as a batch or interactive qsub job:

#!/bin/bash
#PBS -N SAM-PHD
#PBS -e SAM-PHD.err
#PBS -o SAM-PHD.log
cd $PBS_O_WORKDIR
/usr/local/rosetta/rosetta_fragments/SAM-PHD.pl abcde.fasta

JUFO and PHD predictions must be done on their respective online servers and the results given in separate files with the following naming format:

abcde.rdb - SAM-T02
abcde.jufo_ss - JUFO
abcde.phd - PHD
The rosetta_make_fragments script will then automatically recognize and incorporate the additional secondary structure predictions.
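
Because the additional predictions are picked up by file name, it can be worth confirming which files are actually in place before running rosetta_make_fragments. A minimal sketch, using only the three suffixes listed above; list_ss_predictions is a hypothetical helper name.

```shell
# Report which optional secondary structure prediction files are present and
# non-empty for a given prefix. The suffixes come from the naming list above;
# list_ss_predictions is a hypothetical helper name.
list_ss_predictions() {
    # $1 = file name prefix, e.g. abcde
    for pair in rdb:SAM-T02 jufo_ss:JUFO phd:PHD; do
        suffix=${pair%%:*}
        method=${pair#*:}
        if [ -s "$1.$suffix" ]; then
            echo "$method: $1.$suffix"
        else
            echo "$method: missing"
        fi
    done
}

list_ss_predictions abcde
```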

PLEASE NOTE: The secondary structure prediction files must be written in a strict format. The formats of the secondary structure predictions are as follows:

abcde.phd (PHD prediction, FORTRAN format = (9x,a3,2x,a60) or (9x,a3,2x,i60)):

     protein:       query          length       76


              ....,....1....,....2....,....3....,....4....,....5....,....6
         AA  |MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYN|
         PHD | EEEEEE    EEEEEE   HHHHHHHHHHHHH       EEEEEE   E          |
         Rel |959999179855788727822799999999985477735248765144116772211113|
 detail:
         prH-|000000000000000000035799999999986210122210100000100014344311
         prE-|079999510126788851000000000000000001111258776432452100011243
         prL-|920000479872110148854100000000012688756420012466447875543345
 subset: SUB |LEEEEE.LLLLEEEEE.LL..HHHHHHHHHHHH.LLL.L..EEEE.....LLL.......|
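
The FORTRAN format (9x,a3,2x,a60) means that each data row carries a three-character label in columns 10-12 and 60 data characters in columns 15-74. The sketch below uses that layout to pull one row out of a .phd file, which is a quick way to check that a hand-edited file still lines up; phd_row is a hypothetical helper name.

```shell
# Extract the 60-character data field of rows with a given label from a .phd
# file, following the (9x,a3,2x,a60) layout: label in columns 10-12, data in
# columns 15-74. phd_row is a hypothetical helper name.
phd_row() {
    # $1 = .phd file, $2 = three-character label, e.g. "AA " or "PHD"
    awk -v label="$2" 'substr($0, 10, 3) == label { print substr($0, 15, 60) }' "$1"
}

# Example (not run here): phd_row abcde.phd 'PHD'
```

If the extracted row is shorter than expected or starts with a "|" character, the columns in the file have probably been shifted.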

abcde.rdb (SAM-T02, DSSP 3-value prediction in rdb format, FORTRAN format = (i8,1x,a1,3(1x,f8.3))):

Pos AA E H L
10N 1S 5N 5N 5N
       0 M    0.035    0.002    0.964
       1 Q    0.651    0.004    0.346
       2 I    0.932    0.003    0.065
       3 F    0.975    0.002    0.023
       4 V    0.967    0.002    0.032
       5 K    0.885    0.002    0.114
       6 T    0.581    0.006    0.414
       7 L    0.086    0.008    0.905
       8 T    0.025    0.009    0.966
       9 G    0.103    0.011    0.887
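
As a sanity check on an .rdb file, the three probabilities on each row can be reduced to the most probable state per residue. The following sketch assumes the column order shown in the header (Pos, AA, E, H, L) and skips any line whose first field is not an integer; rdb_to_ss is a hypothetical helper name.

```shell
# Print the most probable state (E, H or L) for each residue in a SAM-T02
# .rdb file with columns Pos, AA, E, H, L. Header lines are skipped by
# requiring an integer position and exactly five fields.
# rdb_to_ss is a hypothetical helper name.
rdb_to_ss() {
    awk '$1 ~ /^[0-9]+$/ && NF == 5 {
             state = "E"; best = $3 + 0
             if ($4 + 0 > best) { state = "H"; best = $4 + 0 }
             if ($5 + 0 > best) { state = "L" }
             printf "%s", state
         }
         END { print "" }' "$1"
}

# Example (not run here): rdb_to_ss abcde.rdb
```

For the ten rows shown above this gives LEEEEEELLL, which agrees with the start of the PHD row in the abcde.phd example.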

Scoring with Rosetta

In addition to functionalities associated with particular modules, the Rosetta binary can be used to score a structure according to the Rosetta energy function using the '-score' command line option. A typical command line to score a single structure is:

rosetta -score -s [pdb_file] -scorefile [score_file]

where [pdb_file] is the structure to be scored and [score_file] is an output file where scoring terms will be written. To score a list of structures, use

rosetta -score -l [list_file] -scorefile [score_file]

where [list_file] contains a list of pdb files, one per line.
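
A list file in this format can be built from a directory of PDB files. The sketch below is one way to do it; make_pdb_list and the directory and file names in the example are hypothetical.

```shell
# Write one PDB file name per line into a list file suitable for
# 'rosetta -score -l'. make_pdb_list is a hypothetical helper name.
make_pdb_list() {
    # $1 = directory containing PDB files, $2 = list file to write
    : > "$2"                          # create or empty the list file
    for pdb in "$1"/*.pdb; do
        if [ -e "$pdb" ]; then        # skip the literal pattern if nothing matches
            printf '%s\n' "$pdb" >> "$2"
        fi
    done
}

# Example (not run here; names are placeholders):
#   make_pdb_list decoys decoys.list
#   rosetta -score -l decoys.list -scorefile decoys.sc
```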

The '-score' mode can also be used to add sidechains to a structure by specifying that a structure should be output:

rosetta -score -s [pdb_file] -scorefile [score_file] -nstruct 1 -fa_output

where "-nstruct 1" indicates that a structure should be output and "-fa_output" indicates that the structure should have fullatom sidechain coordinates. If you want to keep any available sidechain coordinates on the input pdb file, add "-fa_input" to the command line.

File Manipulation

cat_silent.pl: concatenate silentfiles
changeChain.pl: change the chain id of a PDB
compose_score_silent.py: generate a silentfile from a set of PDBs
createLoop.pl: create a dummy structure from a sequence of amino acids
createTemplate.pl: create a homology model template from a FASTA file and a homologous structure
make_coords_file.py: generate .coords format from native pdb (for input to cluster_info_silent.out, see below)
molecule.exe: generate JUFO file and rename ligand atoms (with addhydrogens.inp, mdl2rosetta.inp, and pdb2mdl.inp)
pdb_fasta.pl: generate a FASTA from a PDB
pdb2tag.pl: rename a PDB or set of PDBs to their tag names (as shown in the silentfile)
reconstruct_PDB_by_index: generate PDBs from abinitio-format silentfile
renumberPDBandchains.pl: renumber the residues of a PDB sequentially, starting at 1
renumberPDBatoms.pl: renumber the atoms of a PDB sequentially, starting at 1
silentDock2pdb.pl: generate PDBs from a docking silentfile

Evaluation

TMalign: aligns structures based on CA-CA distances
VMD: X-Windows molecular graphics viewer
getColumn.pl: display silentfile and scorefile columns
gnuplot: graphically display data
histogram.pl: generate a quick histogram from STDIN data

Clustering

cluster.pl: automatic clustering of an abinitio-format silentfile
cluster_info_silent.out: fully configurable silentfile clustering
cluster_pdbs.pl: cluster a set of PDBs
cluster_variation.pl: find per-residue variation within a cluster
make_color_trees.py: make a dendrogram of the clusters
make_new_plot.py: make a contacts plot