Velvet on Helix
Velvet is a de novo genomic assembler specially designed for short-read sequencing technologies such as Solexa or 454. It was developed by Daniel Zerbino and Ewan Birney at the EBI. [Velvet website]
Velvet takes in short-read sequences, removes errors, and then produces high-quality unique contigs. When paired-end or long-read information is available, it uses that information to resolve the repeated regions between contigs.
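Velvet is built on de Bruijn graphs (see the paper cited under Documentation). As a toy illustration of the idea only, the sketch below uses the textbook formulation in which every k-mer in a read becomes an edge between its (k-1)-mer prefix and suffix. It is not Velvet's implementation: the function name `de_bruijn_edges` and the example reads are made up for illustration, and error removal, reverse complements and repeat resolution are all omitted.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the sorted (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer links its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}


if __name__ == "__main__":
    # Two made-up overlapping reads; k plays the role of the hash length.
    for node, successors in de_bruijn_edges(["ACGT", "CGTA"], k=3).items():
        print(node, "->", ", ".join(successors))
```

Overlapping reads share k-mers, so their paths merge in the graph; a contig then corresponds to an unbranched path.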
Sample session
User input follows the [user@helix spneu]$ prompt:
[user@helix spneu]$ /usr/local/velvet/velveth . 21 -shortPaired spneu.454.fasta
Reading FastA file spneu.454.fasta;
20859 sequences found
Done
20859 sequences in total.
Writing into readset file: ./Sequences
Done
Writing into roadmap file ./Roadmaps...
Inputting sequences...
Inputting sequence 0 / 20859
Done inputting sequences
Destroying splay table
Splay table destroyed
[user@helix spneu]$ /usr/local/velvet/velvetg . -cov_cutoff 5 -read_trkg yes -amos_file yes
Reading roadmap file ./Roadmaps
20859 roadmaps reads
Creating insertion markers
Ordering insertion markers
Counting preNodes
59595 preNodes counted, creating them now
Adjusting marker info...
Connecting preNodes
Cleaning up memory
Concatenating preGraph
Concatenation...
Renumbering preNodes
Initial preNode count 59595
Destroyed 6995 preNodes
Concatenation over!
Done creating preGraph
Clipping short tips off preGraph
Concatenation...
Renumbering preNodes
Initial preNode count 52600
Destroyed 27660 preNodes
Concatenation over!
16467 tips cut off
24940 nodes left
Writing into pregraph file ./PreGraph...
Reading read set file ./Sequences;
20859 sequences found
Done
Reading pre-graph file ./PreGraph
Graph has 24940 nodes and 20859 sequences
Scanning pre-graph file ./PreGraph for k-mers
226029 kmers found
Threading through reads 0 / 20859
Correcting graph with cutoff 0.200000
Determining eligible starting points
Done listing starting nodes
Initializing todo lists
Done with initilization
Activating arc lookup table
Done activating arc lookup table
1000 nodes visited
2000 nodes visited
3000 nodes visited
4000 nodes visited
5000 nodes visited
6000 nodes visited
7000 nodes visited
8000 nodes visited
9000 nodes visited
10000 nodes visited
11000 nodes visited
12000 nodes visited
13000 nodes visited
14000 nodes visited
15000 nodes visited
16000 nodes visited
17000 nodes visited
18000 nodes visited
19000 nodes visited
20000 nodes visited
21000 nodes visited
22000 nodes visited
23000 nodes visited
24000 nodes visited
Concatenation...
Renumbering nodes
Initial node count 24940
Removed 10269 null nodes
Concatenation over!
Clipping short tips off graph, drastic
Concatenation...
Renumbering nodes
Initial node count 14671
Removed 10111 null nodes
Concatenation over!
4560 nodes left
Writing into graph file ./Graph2...
Concatenation...
Renumbering nodes
Initial node count 4560
Removed 4262 null nodes
Concatenation over!
Clipping short tips off graph, drastic
Concatenation...
Renumbering nodes
Initial node count 298
Removed 51 null nodes
Concatenation over!
247 nodes left
Final graph has 247 nodes and n50 of 965 max 2809
Writing into graph file ./LastGraph...
Writing into stats file ./stats.txt...
Writing into AMOS file ./velvet_asm.afg...
Final graph has 247 nodes and n50 of 965 max 2809
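The session ends by reporting "n50 of 965 max 2809". As a rough guide to what an N50 figure summarizes, here is a minimal sketch of the commonly used definition: the length L such that sequences of length L or longer contain at least half of the total length. The function name `n50` and the example lengths are illustrative, and Velvet's own calculation may differ in detail (for example, in the units it uses for node length).

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first length where the running sum reaches half the total.
        if 2 * running >= total:
            return length
    raise ValueError("no sequence lengths given")


if __name__ == "__main__":
    # Made-up lengths, not taken from the session above.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

A higher N50 for the same data generally indicates a less fragmented assembly, which makes it a convenient figure to compare across runs with different hash lengths or coverage cutoffs.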
Summary of usage
velveth - simple hashing program
Usage:
./velveth directory hash_length {[-file_format][-read_type] filename}
    directory : directory name for output files
    hash_length : odd integer (if even, it will be decremented) <= 31 (if above, will be reduced)
File format options:
    -fasta -fastq -fasta.gz -fastq.gz -eland -gerald
Read type options:
    -short -shortPaired -short2 -shortPaired2 -long -longPaired
Output:
    directory/Roadmaps
    directory/Sequences
    [Both files are picked up by graph, so please leave them there]

velvetg - de Bruijn graph construction, error removal and repeat resolution
Usage:
./velvetg directory [options]
    directory : working directory name
Standard options: | |
-cov_cutoff <floating-point> | removal of low coverage nodes AFTER tour bus (default: no removal) |
-ins_length <integer> | expected distance between two paired end reads (default: no read pairing) |
-read_trkg <yes|no> | tracking of short read positions in assembly (default: no tracking) |
-min_contig_lgth <integer> | minimum contig length exported to contigs.fa file (default: hash length * 2) |
-amos_file <yes|no> | export assembly to AMOS file (default: no export) |
-exp_cov <floating point> | expected coverage of unique regions (default: no long or paired-end read resolution) |
Advanced options: | |
-ins_length2 <integer> | expected distance between two paired-end reads in the second short-read dataset (default: no read pairing) |
-ins_length_long <integer> | expected distance between two long paired-end reads (default: no read pairing) |
-ins_length*_sd <integer> | est. standard deviation of respective dataset (default: 10% of corresponding length) [replace '*' by nothing, '2' or '_long' as necessary] |
-max_branch_length <integer> | maximum length in base pair of bubble (default: 100) |
-max_indel_count <integer> | maximum length difference allowed between the two branches of a bubble (default: 3) |
-max_divergence <floating-point> | maximum divergence rate between two branches in a bubble (default: 0.2) |
-max_gap_count <integer> | maximum number of gaps allowed in the alignment of the two branches of a bubble (default: 3) |
-min_pair_count <integer> | minimum number of paired end connections to justify the scaffolding of two long contigs (default: 10) |
-max_coverage <floating point> | removal of high coverage nodes AFTER tour bus (default: no removal) |
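When several assemblies are run with different settings, it can help to build the command lines programmatically. The sketch below is one possible way to do that, using only the flags listed in the usage summary above. The helper names (`effective_hash_length`, `velveth_args`, `velvetg_args`) are hypothetical, and the hash-length rule simply mirrors what the usage text states (values above 31 are reduced, even values are decremented); velveth applies that rule itself, so the helper only predicts the value it will use.

```python
MAX_HASH_LENGTH = 31  # upper bound stated in the velveth usage text


def effective_hash_length(requested):
    """Cap the requested hash length at 31, then decrement even values."""
    k = min(requested, MAX_HASH_LENGTH)
    if k % 2 == 0:
        k -= 1
    return k


def velveth_args(directory, hash_length, read_type, filename, file_format=None):
    """Build a velveth argument list; file_format is optional, e.g. '-fastq'."""
    args = ["velveth", directory, str(effective_hash_length(hash_length))]
    if file_format:
        args.append(file_format)
    args += [read_type, filename]
    return args


def velvetg_args(directory, **options):
    """Build a velvetg argument list from keyword options such as cov_cutoff=5."""
    args = ["velvetg", directory]
    for name, value in options.items():
        # Booleans map to the yes/no values taken by -read_trkg and -amos_file.
        if isinstance(value, bool):
            value = "yes" if value else "no"
        args += ["-" + name, str(value)]
    return args


if __name__ == "__main__":
    # Reproduces the arguments used in the sample session.
    print(" ".join(velveth_args(".", 21, "-shortPaired", "spneu.454.fasta")))
    print(" ".join(velvetg_args(".", cov_cutoff=5, read_trkg=True, amos_file=True)))
```

The returned lists can be passed to `subprocess.run` after replacing the first element with the full path to the binary (the sample session uses /usr/local/velvet/).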
Output:
directory/contigs.fa : fasta file of contigs longer than twice hash length
directory/stats.txt : stats file (tab-spaced) useful for determining appropriate coverage cutoff
directory/LastGraph : special formatted file with all the information on the final graph
directory/velvet_asm.afg : (if requested) AMOS compatible assembly file
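Since stats.txt is described above as useful for choosing a coverage cutoff, one simple summary to start from is the length-weighted median coverage of the nodes. The sketch below is an assumption-laden illustration, not part of Velvet: it assumes the file has a header row with a node-length column named `lgth` and a coverage column named `short1_cov` (check the header of your own stats.txt and the Velvet manual), and the sample rows are made up.

```python
import csv
import io


def weighted_median_coverage(stats_file, length_col="lgth", cov_col="short1_cov"):
    """Return the coverage value at which half of the total node length is reached."""
    rows = []
    for row in csv.DictReader(stats_file, delimiter="\t"):
        try:
            rows.append((float(row[cov_col]), int(row[length_col])))
        except (TypeError, ValueError):
            continue  # skip rows with missing or non-numeric fields
    if not rows:
        raise ValueError("no usable rows found")
    rows.sort()  # ascending by coverage
    half = sum(length for _, length in rows) / 2
    running = 0
    for coverage, length in rows:
        running += length
        if running >= half:
            return coverage


if __name__ == "__main__":
    # Made-up sample data in the assumed tab-separated layout.
    sample = "ID\tlgth\tshort1_cov\n1\t100\t4.0\n2\t300\t12.0\n3\t50\t30.0\n"
    print(weighted_median_coverage(io.StringIO(sample)))
```

For a real run, open the file with `open("stats.txt", newline="")` and pass the file object in place of the `io.StringIO` sample. Weighting by length keeps the many short, low-coverage nodes from dominating the estimate; nodes well below the result are candidates for removal with -cov_cutoff.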
Documentation
Velvet manual (PDF)
Typing 'velveth' or 'velvetg' with no parameters at the Helix prompt will print a brief description of use.
D.R. Zerbino and E. Birney. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 18: 821-829.