Scientific Supercomputing at the NIH

Velvet on Helix

Velvet is a de novo genomic assembler designed specifically for short-read sequencing technologies such as Solexa or 454. It was developed by Daniel Zerbino and Ewan Birney at the EBI. [Velvet website]

Velvet takes in short-read sequences, removes errors, and then produces high-quality unique contigs. It then uses paired-end read and long-read information, when available, to retrieve the repeated areas between contigs.

Sample session
User input follows the shell prompt ([user@helix spneu]$):

[user@helix spneu]$ /usr/local/velvet/velveth . 21 -shortPaired spneu.454.fasta
Reading FastA file spneu.454.fasta;
20859 sequences found
Done
20859 sequences in total.
Writing into readset file: ./Sequences
Done
Writing into roadmap file ./Roadmaps...
Inputting sequences...
Inputting sequence 0 / 20859
Done inputting sequences
Destroying splay table
Splay table destroyed

[user@helix spneu]$ /usr/local/velvet/velvetg . -cov_cutoff 5 -read_trkg yes -amos_file yes 
Reading roadmap file ./Roadmaps
20859 roadmaps reads
Creating insertion markers
Ordering insertion markers
Counting preNodes
59595 preNodes counted, creating them now
Adjusting marker info...
Connecting preNodes
Cleaning up memory
Concatenating preGraph
Concatenation...
Renumbering preNodes
Initial preNode count 59595
Destroyed 6995 preNodes
Concatenation over!
Done creating preGraph
Clipping short tips off preGraph
Concatenation...
Renumbering preNodes
Initial preNode count 52600
Destroyed 27660 preNodes
Concatenation over!
16467 tips cut off
24940 nodes left
Writing into pregraph file ./PreGraph...
Reading read set file ./Sequences;
20859 sequences found
Done
Reading pre-graph file ./PreGraph
Graph has 24940 nodes and 20859 sequences
Scanning pre-graph file ./PreGraph for k-mers
226029 kmers found
Threading through reads 0 / 20859
Correcting graph with cutoff 0.200000
Determining eligible starting points
Done listing starting nodes
Initializing todo lists
Done with initilization
Activating arc lookup table
Done activating arc lookup table
1000 nodes visited
2000 nodes visited
3000 nodes visited
4000 nodes visited
5000 nodes visited
6000 nodes visited
7000 nodes visited
8000 nodes visited
9000 nodes visited
10000 nodes visited
11000 nodes visited
12000 nodes visited
13000 nodes visited
14000 nodes visited
15000 nodes visited
16000 nodes visited
17000 nodes visited
18000 nodes visited
19000 nodes visited
20000 nodes visited
21000 nodes visited
22000 nodes visited
23000 nodes visited
24000 nodes visited
Concatenation...
Renumbering nodes
Initial node count 24940
Removed 10269 null nodes
Concatenation over!
Clipping short tips off graph, drastic
Concatenation...
Renumbering nodes
Initial node count 14671
Removed 10111 null nodes
Concatenation over!
4560 nodes left
Writing into graph file ./Graph2...
Concatenation...
Renumbering nodes
Initial node count 4560
Removed 4262 null nodes
Concatenation over!
Clipping short tips off graph, drastic
Concatenation...
Renumbering nodes
Initial node count 298
Removed 51 null nodes
Concatenation over!
247 nodes left
Final graph has 247 nodes and n50 of 965 max 2809
Writing into graph file ./LastGraph...
Writing into stats file ./stats.txt...
Writing into AMOS file ./velvet_asm.afg...
Final graph has 247 nodes and n50 of 965 max 2809
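
The last log line reports an n50 for the final graph. As a rough cross-check, N50 can be computed for any FASTA file of contigs: it is the length L such that sequences of length L or more together hold at least half of all bases. The sketch below is an illustration, not Velvet's own code. The function name n50_from_fasta is hypothetical, and the value it gives for contigs.fa may differ from the log, since Velvet measures node lengths in k-mers and exports only contigs above the minimum length.

```shell
#!/bin/sh
# n50_from_fasta FILE: print the N50 of the sequences in a FASTA file.
# Hypothetical helper for illustration; not part of Velvet.
n50_from_fasta() {
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }  # header: flush previous record
        { gsub(/[ \t\r]/, ""); len += length($0) }       # sequence line: add its bases
        END { if (len > 0) print len }                   # flush the last record
    ' "$1" |
    sort -rn |
    awk '
        { lens[NR] = $1; total += $1 }
        END {
            if (NR == 0) exit 1                          # no sequences found
            for (i = 1; i <= NR; i++) {
                running += lens[i]
                # first length at which the running sum reaches half the total
                if (2 * running >= total) { print lens[i]; exit }
            }
        }
    '
}

# Run on the assembly output if it is present in the current directory.
if [ -f ./contigs.fa ]; then
    n50_from_fasta ./contigs.fa
fi
```

Sequences are sorted longest first, so the first length at which the running sum reaches half of the total is the N50.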

Summary of usage

velveth - simple hashing program

Usage:
./velveth directory hash_length {[-file_format][-read_type] filename}

        directory               : directory name for output files
        hash_length             : odd integer (if even, it will be decremented) <= 31 
                                 (if above, will be reduced)

File format options:
        -fasta
        -fastq
        -fasta.gz
        -fastq.gz
        -eland
        -gerald

Read type options:
        -short
        -shortPaired
        -short2
        -shortPaired2
        -long
        -longPaired

Output:
        directory/Roadmaps
        directory/Sequences
                [Both files are picked up by graph, so please leave them there]
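
The hash_length rule in the usage text (values above 31 are reduced, even values are decremented) can be checked before a run. The helper below is a minimal sketch of that rule as stated on this page. The name normalize_hash_length is hypothetical, and the limit of 31 is taken from the usage text, so it may not match a differently built velveth.

```shell
#!/bin/sh
# normalize_hash_length N: print the hash length that the usage text
# says velveth would use for a requested value N.
# Hypothetical helper; assumes N is a positive integer.
normalize_hash_length() {
    k=$1
    max=31                                   # limit quoted in the usage text
    if [ "$k" -gt "$max" ]; then k=$max; fi  # above the limit: reduce to it
    if [ $((k % 2)) -eq 0 ]; then            # even: decrement to the next odd value
        k=$((k - 1))
    fi
    echo "$k"
}

normalize_hash_length 22
normalize_hash_length 40
```

Checking the value up front keeps directory names such as asm_k22 from disagreeing with the hash length actually used.
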

velvetg - de Bruijn graph construction, error removal and repeat resolution

Usage:
./velvetg directory [options]

        directory                       : working directory name


Standard options:
        -cov_cutoff <floating-point>     : removal of low coverage nodes AFTER tour bus (default: no removal)
        -ins_length <integer>            : expected distance between two paired end reads (default: no read pairing)
        -read_trkg <yes|no>              : tracking of short read positions in assembly (default: no tracking)
        -min_contig_lgth <integer>       : minimum contig length exported to contigs.fa file (default: hash length * 2)
        -amos_file <yes|no>              : export assembly to AMOS file (default: no export)
        -exp_cov <floating-point>        : expected coverage of unique regions (default: no long or paired-end read resolution)

Advanced options:
        -ins_length2 <integer>           : expected distance between two paired-end reads in the second short-read dataset (default: no read pairing)
        -ins_length_long <integer>       : expected distance between two long paired-end reads (default: no read pairing)
        -ins_length*_sd <integer>        : est. standard deviation of respective dataset (default: 10% of corresponding length) [replace '*' by nothing, '2' or '_long' as necessary]
        -max_branch_length <integer>     : maximum length in base pair of bubble (default: 100)
        -max_indel_count <integer>       : maximum length difference allowed between the two branches of a bubble (default: 3)
        -max_divergence <floating-point> : maximum divergence rate between two branches in a bubble (default: 0.2)
        -max_gap_count <integer>         : maximum number of gaps allowed in the alignment of the two branches of a bubble (default: 3)
        -min_pair_count <integer>        : minimum number of paired end connections to justify the scaffolding of two long contigs (default: 10)
        -max_coverage <floating-point>   : removal of high coverage nodes AFTER tour bus (default: no removal)
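
The options above combine with velveth into a two-step run. The script below is a sketch under stated assumptions: the input file reads.fa, the output directory asm_k21, and the values given to -exp_cov, -ins_length and -min_contig_lgth are placeholders, not recommendations. The run and assemble helpers are hypothetical names. By default the script only prints the commands (DRY_RUN=1), so it can be reviewed before anything is executed.

```shell
#!/bin/sh
# Two-step Velvet run: velveth (hashing), then velvetg (graph and contigs).
# File names, directory name and numeric option values are placeholders.
VELVET_DIR=${VELVET_DIR:-/usr/local/velvet}  # install path from the sample session
DRY_RUN=${DRY_RUN:-1}                        # 1 = print commands only

# run CMD ARGS...: print the command, and execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != 1 ]; then "$@"; fi
}

# assemble OUTDIR K READS: hash paired short reads, then build the graph.
assemble() {
    outdir=$1; k=$2; reads=$3
    run "$VELVET_DIR/velveth" "$outdir" "$k" -fasta -shortPaired "$reads" &&
    run "$VELVET_DIR/velvetg" "$outdir" -cov_cutoff 5 -exp_cov 20 \
        -ins_length 400 -min_contig_lgth 100
}

assemble asm_k21 21 reads.fa
```

Setting DRY_RUN=0 in the environment executes the two commands instead of printing them. Because -ins_length and -exp_cov are given, velvetg is asked to use read pairing, which the option list above leaves off by default.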

Output:

    directory/contigs.fa        : fasta file of contigs longer than twice hash length
    directory/stats.txt         : stats file (tab-spaced) useful for determining 
                                               appropriate coverage cutoff
    directory/LastGraph         : special formatted file with all the information on 
                                               the final graph
    directory/velvet_asm.afg    : (if requested) AMOS compatible assembly file
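
Since stats.txt is described as useful for choosing a coverage cutoff, one way to inspect it is a length-weighted histogram of node coverage. The sketch below assumes that stats.txt is tab-separated with a header row naming the columns lgth and short1_cov. Those column names are an assumption to check against the header of your own file, and coverage_histogram is a hypothetical helper name.

```shell
#!/bin/sh
# coverage_histogram STATS: print "coverage_bin<TAB>total_length" lines,
# one per integer coverage bin, sorted by coverage.
# Assumes a header row containing the column names "lgth" and "short1_cov".
coverage_histogram() {
    awk -F '\t' '
        NR == 1 {
            # locate the two columns by name instead of by fixed position
            for (i = 1; i <= NF; i++) {
                if ($i == "lgth") len_col = i
                if ($i == "short1_cov") cov_col = i
            }
            if (!len_col || !cov_col) {
                print "expected columns not found" > "/dev/stderr"
                bad = 1
                exit 1
            }
            next
        }
        $len_col > 0 {
            # weight each node by its length so long nodes dominate the histogram
            weight[int($cov_col)] += $len_col
        }
        END {
            if (bad) exit 1
            for (bin in weight) print bin "\t" weight[bin]
        }
    ' "$1" | sort -n
}

# Run on the assembly output if it is present in the current directory.
if [ -f ./stats.txt ]; then
    coverage_histogram ./stats.txt
fi
```

In such a histogram, a -cov_cutoff value would typically be picked between a low-coverage group of short nodes and the main coverage peak. Where that gap lies depends on the dataset.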

Documentation

Velvet manual (PDF)

Typing 'velveth' or 'velvetg' with no parameters at the Helix prompt will print a brief description of use.

D.R. Zerbino and E. Birney. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 18: 821-829.