Scientific Supercomputing at the NIH

Phred, Phrap/Cross_match/Swat, Consed/Autofinish on Helix


Phred software reads DNA sequencing trace files, calls bases, and assigns a quality value to each called base. The phred quality values have been thoroughly tested for both accuracy and power to discriminate between correct and incorrect base-calls. Phred can use the quality values to perform sequence trimming.
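For orientation, phred quality values follow the standard definition Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The sketch below assumes only a POSIX shell and awk, and converts a few quality values to error probabilities. The function name "phred_q_to_p" is made up for illustration and is not part of phred.

```shell
#!/bin/sh
# Convert a phred quality value Q to the estimated base-call error
# probability P = 10^(-Q/10). The function name is illustrative only.
phred_q_to_p() {
    awk -v q="$1" 'BEGIN { printf "%.6f\n", 10 ^ (-q / 10) }'
}

# Print the error probability for a few common quality values.
for q in 10 20 30 40; do
    printf 'Q%s -> P(error) = %s\n' "$q" "$(phred_q_to_p "$q")"
done
```

A quality value of 20 therefore corresponds to an estimated error rate of 1 in 100, and a value of 30 to 1 in 1000.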

Phrap is a program for assembling shotgun DNA sequence data. Cross_match is a general purpose utility for comparing any two DNA sequence sets using a 'banded' version of swat. Swat is a program for searching one or more DNA or protein query sequences, or a query profile, against a sequence database, using an efficient implementation of the Smith-Waterman or Needleman-Wunsch algorithms with linear (affine) gap penalties.

Consed/Autofinish is a tool for viewing, editing, and finishing sequence assemblies created with phrap. Finishing capabilities include allowing the user to pick primers and templates, suggesting additional sequencing reactions to perform, and facilitating checking the accuracy of the assembly using digest and forward/reverse pair information.

Programs Location / Initiation

To run the programs:

For csh/tcsh users, insert the following lines at the end of your .cshrc file:

setenv CONSED_HOME /usr/local/genome
set path=( /usr/local/genome/bin ${path} )

For bash/ksh/sh users, insert the following at the end of your .bashrc file:

CONSED_HOME=/usr/local/genome
PATH=/usr/local/genome/bin:$PATH
export CONSED_HOME PATH
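After logging in again (or re-reading the startup file), it may be worth checking that the programs are actually found. The following is a minimal sketch for sh/bash users; the function name "check_tools" is hypothetical, and the list of program names is simply the set mentioned on this page.

```shell
#!/bin/sh
# Report which of the named programs can be found on the current PATH.
# check_tools is an illustrative helper, not part of the package.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Returns non-zero if any program is missing, which triggers the hint.
check_tools phred phrap cross_match phd2fasta phredPhrap consed \
    || echo "Add /usr/local/genome/bin to your PATH as described above."
```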

Sample Sessions

Phred/Cross Match Sample Session

First copy the sample files into your own area (replace 'user' below with your helix userID):

# mkdir -p /home/user/consed/sample1
# cd /usr/local/consed
# cp -r 454_newbler align454reads align454reads_answer assembly_view autofinish solexa_example solexa_example_answer polyphred standard /home/user/consed/sample1/
# chmod -R a+w /home/user/consed/sample1/
# chown -R user /home/user/consed/sample1/
% cd /home/user/consed/sample1
% cp -r standard/ test/
% cd test

Delete all the files in phd_dir and edit_dir:

% rm phd_dir/* edit_dir/*
% cd edit_dir

Run phredPhrap by typing:

% phredPhrap

A number of files appear in this directory. Please note: if you intend to use consed, you MUST run this phredPhrap perl script. Otherwise many consed features will not work correctly, including consed's autofinish function, user-defined consensus tags, tagging of ALU and other repeats, and tagging of vector sequence.

Suppose you want to call bases from the chromat files in subdirectory "chromat_dir", use phrap to assemble the contigs, and run consed to edit/examine the contigs. In this case you must ask phred to create "phd" output files, which are required by consed:

% cd /home/user/consed/sample1/test
% phred -id chromat_dir -pd phd_dir

This causes phred to read the chromat files in "chromat_dir" and write the "phd" files to "phd_dir". Next, FASTA files are made from the "phd" files by running the phd2fasta program:

% phd2fasta -id phd_dir -os seqs_fasta -oq seqs_fasta.screen.qual

Then the vector sequence is screened out of the reads in "seqs_fasta" using cross_match:

% cross_match seqs_fasta vector.seq -minmatch 12 -minscore 20 -screen > screen.out

This generates the screened sequence file "seqs_fasta.screen".
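The three manual steps above could be collected into one shell function that stops at the first failure. This is only a sketch for sh/bash users: the function name "screen_reads" and its messages are made up here, the commands and options are copied from the steps above, and it assumes that "chromat_dir" and "vector.seq" are reachable from the current directory.

```shell
#!/bin/sh
# screen_reads: illustrative wrapper (not part of the package) that runs
# phred, phd2fasta and cross_match in order and stops on the first error.
screen_reads() {
    [ -d chromat_dir ] || { echo "chromat_dir not found" >&2; return 1; }
    mkdir -p phd_dir || return 1

    # Call bases and write one phd file per chromat file.
    phred -id chromat_dir -pd phd_dir || return 1

    # Build the FASTA and quality files from the phd files.
    phd2fasta -id phd_dir -os seqs_fasta -oq seqs_fasta.screen.qual || return 1

    # Screen vector sequence; the cross_match report goes to screen.out.
    cross_match seqs_fasta vector.seq -minmatch 12 -minscore 20 -screen \
        > screen.out || return 1

    # Confirm that the screened file exists and is not empty.
    [ -s seqs_fasta.screen ] || { echo "seqs_fasta.screen was not created" >&2; return 1; }
    echo "screened reads written to seqs_fasta.screen"
}

# Usage, from the directory that contains chromat_dir:
#   screen_reads
```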

Phrap/Phred Sample Session

First follow the 'Phred/Cross Match Sample Session' above.

Then run phrap to perform the sequence assembly as follows:

% phrap seqs_fasta.screen -new_ace > phrap.out

As another example, again you want to process the chromat files in subdirectory "chromat_dir", but now you want phred to write the base calls to a FASTA file named "seqs_fasta" and the base quality values to "seqs_fasta.qual". In this case you run phred with the options:

% phred -id chromat_dir -sa seqs_fasta -qa seqs_fasta.qual
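A quick sanity check after this step is to confirm that the FASTA file and the quality file hold the same number of records, since both use '>' header lines. The sketch below is for sh/bash users; the function names "count_records" and "same_record_count" are hypothetical helpers, not part of phred.

```shell
#!/bin/sh
# count_records: print the number of '>' header lines in a file.
count_records() {
    grep -c '^>' "$1"
}

# same_record_count: succeed if both files hold the same number of records.
same_record_count() {
    [ "$(count_records "$1")" = "$(count_records "$2")" ]
}

# Usage, after running phred with -sa and -qa:
#   same_record_count seqs_fasta seqs_fasta.qual && echo "record counts match"
```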

Consed Sample Session

The following demo can be found in README.17.0.txt.

Start an X-windows application.

Copy the sample files into your own area (replace 'user' below with your helix userID) if you have not already done so following the instructions above:

# mkdir -p /home/user/consed/sample1
# cd /usr/local/consed
# cp -r 454_newbler align454reads align454reads_answer assembly_view autofinish solexa_example solexa_example_answer polyphred standard /home/user/consed/sample1/
# chmod -R a+w /home/user/consed/sample1/
# chown -R user /home/user/consed/sample1/

ADDING SOLEXA READS

The programs are located under /usr/local/genome/bin; make sure this directory is in your path, then run:

% cd /home/user/consed/sample1/solexa_example/edit_dir
% fasta2Ace.perl ref.fa
% addSolexaReads.perl ref.ace bustard_files.fof ref.fa

ADDING 454 READS

% cd /home/user/consed/sample1/align454reads/edit_dir
% fasta2Ace.perl reference.fa

Make sure your X-windows application is started, then bring up Consed and double click on 'reference.ace.1':

% consed

Then double click on contig "myreference" to bring up the Aligned Reads Window. Scroll around a little and right click on a read or two to see the trace.

Close all windows to exit consed.

454 READS (NEWBLER ASSEMBLY)

The Newbler Assembler and Consed work together. To see a Newbler assembly:

% cd /home/user/consed/sample1/454_newbler/edit_dir
% consed

Double click on "454Contigs.ace.1" on top.

Double click on "contig00001" to bring up the Aligned Reads Window.

Using the thumb at the bottom, scroll from the far left of the contig all the way to the far right to get an idea of the assembly. (It is a very small one.)

In the Aligned Reads Window, scroll to position 230 and right click on the T in read EBE03TV01CI9BG.1-240 (which is the top read). A menu of selections is displayed. Select 'Display traces for all reads'.

RESTRICTION DIGEST

The sample file 'standard.fasta.screen.ace.1' is under '/home/user/consed/sample1/standard/edit_dir'.

Follow step 174 in README.17.0.txt.

ADD NEW READS

% cd /home/user/consed/sample1/standard/edit_dir
% cp ../chromats_to_add/* ../chromat_dir

Restart consed and open the original ace file standard.fasta.screen.ace.1. If it asks whether you want to apply edits, just say 'no'.

On the Main Window, click on the Add New Reads button. A list of files ending with .fof will appear. These are files that contain lists of chromatograms. Double click on 'reads_to_add.fof' and accept the defaults for the other options in this window. There should be lots of progress output in the xterm from which you started Consed. When it completes, a Reads Added Window will pop up with a report of which reads were added. In this case, it should say that 9 reads were successfully added and list them.
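If you later want to add your own reads, you would need a .fof file of your own. The sketch below is for sh/bash users and rests on an assumption: that a .fof file is a plain text list with one chromatogram file name per line, as the description above suggests. Check the consed documentation before relying on it. The function name "make_fof" and the file name "my_reads.fof" are hypothetical.

```shell
#!/bin/sh
# make_fof: write the names of the files in directory $1, one per line,
# to the file $2. ASSUMPTION: consed accepts a plain one-name-per-line
# list; this layout is inferred, not taken from the consed documentation.
make_fof() {
    ls "$1" > "$2"
}

# Usage, from edit_dir (write the list outside the directory being listed):
#   make_fof ../chromats_to_add my_reads.fof
```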

ASSEMBLY VIEW

Consed can show you a bird's eye view of the Assembly using forward/reverse pair information, sequence match information, read depth, etc. We have a test database which shows its features.

Exit consed and type:

% cd /home/user/consed/sample1/assembly_view/edit_dir
% consed

Double click on "assembly_view.fasta.screen.ace.1"

In the Consed Main Window, click on the button "Assembly View", which is near the upper left corner of the window.

RUNNING CROSSMATCH FOR SEQUENCE MATCHES

Click on 'What to show', then 'Sequence Matches'. The 'Which Sequence Matches to Show In Assembly View' window comes up. Click on the 'Run Crossmatch' button and watch the xterm: several pages of crossmatch output should scroll by. Three orange pairs of curvy lines will then appear in the Assembly View Window.

Documentation

http://www.phrap.org/phredphrapconsed.html
