GenBank, GenBank Raw (sequence only from a GenBank flat file), EMBL, Table, Fasta, Mase (= IG, Intelligenetics), NEXUS interleaved, NEXUS sequential, MEGA interleaved, MEGA sequential, Stockholm, Clustal, BLAST, RSF, Phylip interleaved, Phylip sequential, MSF, GCG, GDE, Raw, SLX, Pretty-print, and MacVector.
For descriptions of some common sequence formats, see Common Sequence Formats.
The same formats as above, except that GenBank, EMBL, MacVector, and BLAST are not supported.
The "Raw" format consists of pure sequence, either nucleotides or one-letter amino acids.
    ACATGTGCGCGCGATTATCTATCGATGCTACGTA

When this sequence is converted to a non-raw format it will be given the name "seq1". If Raw input consists of multiple lines, each line is interpreted as a separate sequence. Thus, the input

    ACATGTGCGCGCGATTATCTATCGATGCTACGTA
    GCATGTGCACGCGATTATCTACCGATGCTACTTA

would produce the following fasta output:

    >seq1
    ACATGTGCGCGCGATTATCTATCGATGCTACGTA
    >seq2
    GCATGTGCACGCGATTATCTACCGATGCTACTTA

Therefore, if you are submitting a single raw sequence, be sure it is on a single line.
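The line-per-sequence rule described above can be illustrated with a short Python sketch. This is not the converter's own code; the function name `raw_to_fasta` and the decision to skip blank lines are assumptions made for illustration only.

```python
def raw_to_fasta(raw_text, prefix="seq"):
    """Turn Raw input (one sequence per line) into fasta text.

    Each non-empty line becomes its own record, named seq1, seq2, ...
    Skipping blank lines is an assumption; the converter's handling
    of blank lines is not documented here.
    """
    lines = [line.strip() for line in raw_text.splitlines() if line.strip()]
    records = []
    for index, sequence in enumerate(lines, start=1):
        records.append(f">{prefix}{index}\n{sequence}\n")
    return "".join(records)


if __name__ == "__main__":
    raw = (
        "ACATGTGCGCGCGATTATCTATCGATGCTACGTA\n"
        "GCATGTGCACGCGATTATCTACCGATGCTACTTA\n"
    )
    print(raw_to_fasta(raw), end="")
```

The sketch makes the pitfall visible: a single sequence that is wrapped over several lines would come out as several separately named records.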
Phylip files must begin with a line that looks like

    3 78 i

that shows the number of sequences in the file (3), the number of characters in each sequence (78), and then the letter "i" or "s", which indicates whether the file is "interleaved" or "sequential" respectively. The format converter requires the i or s letters.

The format converter program deals with only two essential data items: the sequence and the sequence name. Thus, a complicated file format such as Nexus, when converted to a simpler format such as table, will lose all the associated information except the sequence name and the sequence. Converting a Nexus file like:

    #NEXUS
    Begin data;
    Dimensions ntax=3 nchar=79;
    Format datatype=dna gap=-;
    Matrix
    4axED43xco GGAGGCCCTACCTCAAGTAGTGACGCCCTACCTCCCGTTGGCTGTTTCCTCTTGCGTAGAACGCTACTTTCGGGCAACC
    2bxMD2b2x1 CGCTGTTGATCACCAAATCGGAGGGCACCTA-----GGAACACAGCTCCTCATGGATCGAGAGTACTTTCTAACCGTGA
    2bxMD2b9x1 CGCTGCCAAATACCGAGTCGGAAGGCATCTACGGTTGAGACACGGCTCCCCATGAACCGAGGGTATTTCCTAACCGTGG
    ;
    End;

to fasta format would produce the following file:

    >4axED43xco
    GGAGGCCCTACCTCAAGTAGTGACGCCCTACCTCCCGTTGGCTGTTTCCTCTTGCGTAGAACGCTACTTTCGGGCAACC
    >2bxMD2b2x1
    CGCTGTTGATCACCAAATCGGAGGGCACCTA-----GGAACACAGCTCCTCATGGATCGAGAGTACTTTCTAACCGTGA
    >2bxMD2b9x1
    CGCTGCCAAATACCGAGTCGGAAGGCATCTACGGTTGAGACACGGCTCCCCATGAACCGAGGGTATTTCCTAACCGTGG

The datatype (dna), number of taxa, etc. are not represented in the fasta file, only the names and sequences.
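To make the loss of information concrete, here is a minimal Python sketch that keeps only names and sequences from a simple Nexus file. It is an illustration, not the converter's implementation. The function name `nexus_matrix_to_fasta` is hypothetical, and the sketch assumes a sequential (non-interleaved) Matrix block with one "name sequence" pair per line and the word Matrix on a line of its own.

```python
def nexus_matrix_to_fasta(nexus_text):
    """Keep only names and sequences from a simple sequential Nexus file.

    Assumptions (for illustration only): the Matrix keyword sits on its
    own line, each following line holds one name and one sequence, and a
    ';' ends the matrix. Everything else (Dimensions, Format, ...) is
    ignored, which is exactly the information that conversion discards.
    """
    records = []
    in_matrix = False
    for line in nexus_text.splitlines():
        stripped = line.strip()
        if not in_matrix:
            # Look for the start of the Matrix block, case-insensitively
            if stripped.lower().rstrip(";") == "matrix":
                in_matrix = True
            continue
        if stripped.startswith(";"):
            break  # end of the matrix
        parts = stripped.split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed row
        name, sequence = parts
        finished = sequence.endswith(";")
        # Drop a trailing ';' and any spaces inside the sequence
        sequence = sequence.rstrip(";").replace(" ", "")
        records.append(f">{name}\n{sequence}\n")
        if finished:
            break
    return "".join(records)


if __name__ == "__main__":
    example = """#NEXUS
Begin data;
Dimensions ntax=2 nchar=8;
Format datatype=dna gap=-;
Matrix
taxA ACGT--AC
taxB ACGTTTAC
;
End;
"""
    print(nexus_matrix_to_fasta(example), end="")
```

Note that the Dimensions and Format lines are never read into the output, so a fasta file produced this way cannot be used to recover the datatype or gap character.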