HCV Database
HCV sequence database

Format Converter Explanation

Input formats recognized by format converter:

GenBank, GenBank Raw (sequence only from a GenBank flat file), EMBL, Table, Fasta, Mase (= IG, Intelligenetics), NEXUS interleaved, NEXUS sequential, MEGA interleaved, MEGA sequential, Stockholm, Clustal, BLAST, RSF, Phylip interleaved, Phylip sequential, MSF, GCG, GDE, Raw, SLX, Pretty-print, and MacVector.

For descriptions of some common sequence formats, see Common Sequence Formats.

Output formats producible by format converter:

The same formats as above, except that GenBank, EMBL, MacVector, and BLAST are not supported.

Notes on sequence names

The "Raw" format consists of pure sequence, either nucleotides or one-letter amino acids.

ACATGTGCGCGCGATTATCTATCGATGCTACGTA
Raw input carries no sequence names, so when this sequence is converted to a non-raw format it is given the name "seq1". If Raw input consists of multiple lines, each line is interpreted as a separate sequence. Thus, the input
ACATGTGCGCGCGATTATCTATCGATGCTACGTA
GCATGTGCACGCGATTATCTACCGATGCTACTTA
would produce the following fasta output:
>seq1
ACATGTGCGCGCGATTATCTATCGATGCTACGTA
>seq2
GCATGTGCACGCGATTATCTACCGATGCTACTTA
Therefore, if you are submitting a single raw sequence, make sure it is all on one line; a sequence wrapped over several lines will be split into several sequences.
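As a rough illustration of the Raw rule described above, the Python sketch below turns Raw text into Fasta by treating each non-empty line as one sequence and numbering the names seq1, seq2, and so on. The function name raw_to_fasta is hypothetical, and skipping blank lines is an assumption; this is not the format converter's actual code.

```python
def raw_to_fasta(raw_text: str) -> str:
    """Convert Raw input (one unnamed sequence per line) to Fasta.

    Hypothetical sketch: each non-empty line becomes its own record,
    named seq1, seq2, ... in input order. Blank lines are skipped
    (an assumption, not documented behavior of the converter).
    """
    lines = (line.strip() for line in raw_text.splitlines())
    sequences = [line for line in lines if line]
    return "".join(f">seq{i}\n{seq}\n" for i, seq in enumerate(sequences, start=1))


example = """ACATGTGCGCGCGATTATCTATCGATGCTACGTA
GCATGTGCACGCGATTATCTACCGATGCTACTTA"""
print(raw_to_fasta(example), end="")
```

Note that feeding this sketch one sequence wrapped over two lines yields two records, seq1 and seq2, which is exactly why a single Raw sequence should be kept on one line.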

Phylip files must begin with a line that looks like

3  78  i
which gives the number of sequences in the file (3), the number of characters in each sequence (78), and then the letter "i" or "s", indicating whether the file is "interleaved" or "sequential", respectively. The format converter requires this i or s letter.

The format converter deals with only two essential data items: the sequence and the sequence name. Thus, when a complex format such as NEXUS is converted to a simpler format such as Table, all associated information except the sequence names and sequences is lost. For example, converting a NEXUS file like:
#NEXUS
Begin data;
	Dimensions ntax=3 nchar=79;
	Format datatype=dna gap=-;
	Matrix
4axED43xco GGAGGCCCTACCTCAAGTAGTGACGCCCTACCTCCCGTTGGCTGTTTCCTCTTGCGTAGAACGCTACTTTCGGGCAACC
2bxMD2b2x1 CGCTGTTGATCACCAAATCGGAGGGCACCTA-----GGAACACAGCTCCTCATGGATCGAGAGTACTTTCTAACCGTGA
2bxMD2b9x1 CGCTGCCAAATACCGAGTCGGAAGGCATCTACGGTTGAGACACGGCTCCCCATGAACCGAGGGTATTTCCTAACCGTGG
;
End;
to fasta format would produce the following file:
>4axED43xco
GGAGGCCCTACCTCAAGTAGTGACGCCCTACCTCCCGTTGGCTGTTTCCTCTTGCGTAGAACGCTACTTTCGGGCAACC
>2bxMD2b2x1
CGCTGTTGATCACCAAATCGGAGGGCACCTA-----GGAACACAGCTCCTCATGGATCGAGAGTACTTTCTAACCGTGA
>2bxMD2b9x1
CGCTGCCAAATACCGAGTCGGAAGGCATCTACGGTTGAGACACGGCTCCCCATGAACCGAGGGTATTTCCTAACCGTGG
The data type (dna), the number of taxa, and the rest of the Dimensions and Format information are not represented in the fasta file; only the names and sequences are kept.
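The name-and-sequence-only model can be sketched in Python as follows. This is a hypothetical, minimal reader for a simple sequential NEXUS Matrix like the one above (assumed: one taxon per line, no quoted names, comments, or interleaving), plus a fasta writer and a helper that rebuilds a Phylip first line of the kind described earlier. The functions parse_nexus_matrix, to_fasta, and phylip_header are illustrative names, not the format converter's implementation.

```python
def parse_nexus_matrix(text: str) -> list[tuple[str, str]]:
    """Return (name, sequence) pairs from a simple sequential NEXUS Matrix.

    Hypothetical sketch. Assumes one taxon per line after the "Matrix"
    keyword, name and sequence separated by whitespace, and a ";" closing
    the block. All other commands (Dimensions, Format, ...) are ignored,
    so their information is dropped, as with the converter.
    """
    records = []
    in_matrix = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_matrix:
            # Skip everything until the line that is just the Matrix keyword.
            in_matrix = stripped.lower() == "matrix"
            continue
        if not stripped:
            continue
        done = stripped.endswith(";")  # ";" closes the Matrix command
        stripped = stripped.rstrip(";").strip()
        if stripped:
            name, seq = stripped.split(None, 1)
            records.append((name, "".join(seq.split())))
        if done:
            break
    return records


def to_fasta(records: list[tuple[str, str]]) -> str:
    """Write (name, sequence) pairs as fasta, one sequence line per record."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def phylip_header(records: list[tuple[str, str]], mode: str) -> str:
    """Build a first line like "3  78  i": count, length, then i or s."""
    if mode not in ("i", "s"):
        raise ValueError("mode must be 'i' (interleaved) or 's' (sequential)")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    return f"{len(records)}  {lengths.pop()}  {mode}"


NEXUS_EXAMPLE = """#NEXUS
Begin data;
\tDimensions ntax=3 nchar=79;
\tFormat datatype=dna gap=-;
\tMatrix
4axED43xco GGAGGCCCTACCTCAAGTAGTGACGCCCTACCTCCCGTTGGCTGTTTCCTCTTGCGTAGAACGCTACTTTCGGGCAACC
2bxMD2b2x1 CGCTGTTGATCACCAAATCGGAGGGCACCTA-----GGAACACAGCTCCTCATGGATCGAGAGTACTTTCTAACCGTGA
2bxMD2b9x1 CGCTGCCAAATACCGAGTCGGAAGGCATCTACGGTTGAGACACGGCTCCCCATGAACCGAGGGTATTTCCTAACCGTGG
;
End;
"""

records = parse_nexus_matrix(NEXUS_EXAMPLE)
print(to_fasta(records), end="")
print(phylip_header(records, "s"))
```

Because only names and sequences survive, a Phylip header in this sketch has to be recomputed from them (sequence count and common length), and the i or s letter must be supplied by the caller.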

Questions or comments? Contact us at hcv-info@lanl.gov