Scientific Supercomputing at the NIH

Sequence Format Converters on Helix
There are several programs available to convert nucleotide and protein sequences from one format to another.

How to use

The EMBOSS 'seqret' and 'seqretsplit' tools are available on both Helix and Biowulf. Typically, users will be reformatting sequences on Helix. If a large number of sequences needs to be reformatted as part of a Biowulf batch job, the EMBOSS commands can be inserted into a batch script.

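Sample session with seqret on the command line, converting a GenBank sequence into Swiss-Prot format. User input is the text typed after each helix% prompt and after the colon at each seqret prompt. Prefixing the output file name with swissprot:: selects the output format.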

helix% emboss                         (initializes EMBOSS)
[...]
helix% seqret 
Reads and writes (returns) sequences
Input (gapped) sequence(s): a00006.gb_pat
output sequence(s) [a00006.fasta]: swissprot::a00006.swiss
helix% more a00006.swiss
ID   A00006     standard; DNA; UNC; 26 BP.
SQ   Sequence 26 BP; 5 A; 10 C; 8 G; 3 T; 0 other;
     CAGGCGCTCG ATCGATCGCG CCAACG                                            26
//
helix%

Sample session with seqret to convert a GCG-format sequence into FASTA format. Pressing Enter at the output prompt accepts the default shown in brackets, nuc.fasta.

helix% seqret
Reads and writes (returns) sequences
Input (gapped) sequence(s): nuc.gcg
output sequence(s) [nuc.fasta]: 
helix%
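
For the Biowulf batch case mentioned above, nobody is present to answer seqret's prompts, so the input and output have to be given on the command line. The following is a minimal sketch rather than a tested site recipe. It relies on the standard EMBOSS qualifiers -sequence, -outseq and -auto (suppress prompts). The directory argument, the .gb extension, the DRY_RUN variable and the helper names run and convert_all are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: convert every *.gb file in a directory to FASTA without prompts.
# The directory layout and the .gb extension are illustrative assumptions.
#
# Prompt-free equivalents of the two interactive sessions above would be:
#   seqret -sequence a00006.gb_pat -outseq swissprot::a00006.swiss -auto
#   seqret -sequence nuc.gcg -outseq fasta::nuc.fasta -auto

# Run a command, or only print it when DRY_RUN=1 or seqret is not on the PATH.
run() {
    if [ "${DRY_RUN:-0}" = 1 ] || ! command -v seqret >/dev/null 2>&1; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Convert each GenBank file in the given directory to FASTA.
# The "fasta::" prefix on the output name selects the output format.
convert_all() {
    local dir="$1" f
    for f in "$dir"/*.gb; do
        [ -e "$f" ] || continue    # the glob matched nothing
        run seqret -sequence "$f" -outseq "fasta::${f%.*}.fasta" -auto
    done
}
```

Calling DRY_RUN=1 convert_all seqs prints the commands that would be run, which is a cheap check before submitting the job. In a real batch script EMBOSS would first need to be initialized, as in the interactive session.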

Documentation