{susanc,maoj}@helix.nih.gov
staff@helix.nih.gov
Helix Systems, CIT, NIH
Jan 24, 2007
----------------------------------------------------------------------
Welcome to the WISCONSIN PACKAGE
Version 10.3-UNIX
Installed on irix64
Copyright (c) 1982 - 2001, Accelrys Inc.
A wholly owned subsidiary of Pharmacopeia, Inc.
All rights reserved.
Published research assisted by this software should cite:
Wisconsin Package Version 10.2, Accelrys Inc., San Diego, CA
----------------------------------------------------------------------
GCG | EMBOSS
NHGRI NICHD Codon.nih.gov at NIMH NCI, Frederick Helix | NCI, Frederick |
How to access | GCG | EMBOSS
Web | GCG-Lite (http://helixweb.nih.gov/emboss-lite). No username or password required. Open to all NIH machines. | Now renamed to EMBOSS-Lite; will transition to using EMBOSS programs behind the scenes. No username or password required. Open to all NIH machines.
Web | SeqWeb (shut down in Dec 05). Required Helix account and login/password. Open to all NIH machines. | EMBOSS full-function web interface at http://helixweb.nih.gov/emboss. No username or password required. Open to all NIH machines.
Command-line | On Helix & Nimbus (Helix account required). | On Helix, Nimbus, Doublehelix and the Biowulf cluster (Helix account required).
To run EMBOSS on the Helix/Nimbus/Doublehelix command-line:
If you know the accession number of your sequence, you can find it in EMBOSS.
If you don't know the accession number, search on the NCBI website (http://www.ncbi.nlm.nih.gov)
Demo: searching for 'fibronectin' sequences.
GCG-Lite: http://molbio.info.nih.gov/molbio/gcglite.
Finding sequences via NCBI Entrez: http://www.ncbi.nlm.nih.gov/
EMBOSS accepts the following sequence formats:
- EMBL
- Genbank
- SwissProt
- PIR
- GCG
- MSF
- Clustal
- Plain, raw
- and many more.
GCG: one sequence per GCG file
Input: only first sequence in a file is read.
Output: into multiple files, one per sequence
EMBOSS: one or more sequences per file
Input: all sequences are read and processed
Output: all sequences into one file, unless you use -ossingle.
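The difference in input handling can be illustrated with a short Python sketch. It is not part of either package: `read_fasta`, `gcg_style` and `emboss_style` are hypothetical helpers, and the example sequences are made up. The sketch only mimics the behaviour described above (first sequence only versus every sequence).

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (identifier, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            parts = line[1:].split()
            # The identifier is the first word after '>'.
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def gcg_style(records):
    """Mimic GCG input: only the first sequence in the file is used."""
    return records[:1]


def emboss_style(records):
    """Mimic EMBOSS input: every sequence in the file is processed."""
    return list(records)


# Made-up example data with two sequences in one file.
example = """>seq1 first entry
ACGTACGT
ACGT
>seq2 second entry
TTGGCCAA
"""

records = read_fasta(example)
print([name for name, _ in gcg_style(records)])     # ['seq1']
print([name for name, _ in emboss_style(records)])  # ['seq1', 'seq2']
```

A file that worked as a single-sequence input under GCG may therefore produce several results under EMBOSS if it contains more than one entry.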
Demo: Running an EMBOSS program with a GCG-format sequence.
GCG: fetch Genbank:Ax700501
GCG: map ax700501.gb_pat
EMBOSS: remap ax700501.gb_pat
All GCG-format sequences should work transparently in EMBOSS!
GCG Databases available:
- GenBank Release 156.0 (10/06)
- GenPept Release 156.0 (10/06)
- PROSITE Release 20.3 (01/07)
- Restriction Enzymes (REBASE) 701 (01/07)
- Pfam Release 21 (11/06)
- SWISS-PROT Release 51.4 + updates to 09/Jan/07
- GP_New FROZEN on 23/Dec/06 (117,658 entries)
- GB_New FROZEN on 21/Dec/06 (3,210,047 entries)
EMBOSS Databases available:
- genbank Release 157 (18/Dec/06)
- genpept Release 157 (26/Dec/06)
- est Release 157 (18/Dec/06)
- refseqaa Release 21 (11/Jan/07)
- refseqnt Release 21 (11/Jan/07)
- PROSITE Release 20.3 (09/Jan/07)
- Restriction Enzymes (REBASE) 701 (30/Dec/07)
- Transfac Release 10.4 (15/Dec/06)
- prints Release 38_0 (21/Sep/05)
- uniprot Release 9.4 (09/Jan/07)
- gpnew 21/Jan/07, 215603 entries since 26/Dec/06 rel 157
- gbnew 21/Jan/07, 1585034 entries since 18/Dec/06 rel 157
How to identify sequences in a database:
GCG: db:accession (e.g. GB_VI:HIM010487 or SWISSPROT:CCNT2_HUMAN)
EMBOSS: db:accession (e.g. genbank:AJ010487 or uniprot:CCNT2_HUMAN)
Note that the database names are different, and only accession numbers
are indexed in EMBOSS.
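The renaming of database prefixes can be sketched in Python. This is a minimal illustration, not a tool provided on Helix: `split_spec`, `to_emboss` and the mapping table are hypothetical, and the table contains only database names that appear in this document. Mapping GCG subsets such as gb_vi onto the whole EMBOSS genbank database is an assumption. The identifier is passed through unchanged, so the sketch does not convert a GenBank locus name into the accession number that EMBOSS needs.

```python
# Illustrative mapping only: the entries come from the examples in this
# document, not from a complete list of either package's database names.
GCG_TO_EMBOSS_DB = {
    "gb_vi": "genbank",      # assumption: GCG subset -> whole EMBOSS database
    "gb_ba": "genbank",      # assumption: GCG subset -> whole EMBOSS database
    "swissprot": "uniprot",
}


def split_spec(spec):
    """Split a 'db:identifier' specification into (db, identifier)."""
    db, sep, ident = spec.partition(":")
    if not sep or not db or not ident:
        raise ValueError(f"expected 'db:identifier', got {spec!r}")
    return db, ident


def to_emboss(spec):
    """Replace the GCG database name with its EMBOSS counterpart.

    The identifier itself is left untouched.
    """
    db, ident = split_spec(spec)
    try:
        emboss_db = GCG_TO_EMBOSS_DB[db.lower()]
    except KeyError:
        raise ValueError(f"no EMBOSS equivalent known for {db!r}") from None
    return f"{emboss_db}:{ident}"


print(to_emboss("SWISSPROT:CCNT2_HUMAN"))  # uniprot:CCNT2_HUMAN
```

For a GenBank entry, the accession number (AJ010487 in the example above) still has to be looked up separately, for instance via NCBI Entrez.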
How to use a database sequence in a program:
GCG: translate GB_VI:HIM010487
EMBOSS: transeq genbank:AJ010487
Demo:
GCG: fetch swissprot:cram_craab
EMBOSS: seqret uniprot:cram_craab (in default Fasta format)
EMBOSS: seqret -outseq=gb::cram_craab.gb uniprot:cram_craab (in
Genbank format)
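The way the output format and file name combine in the seqret call can be sketched as follows. `seqret_command` is a hypothetical helper written for this handout; it only assembles the command line shown in the demo and does not run EMBOSS.

```python
import shlex


def seqret_command(usa, outfile=None, fmt=None):
    """Build the argument list for a seqret call without running it."""
    if fmt is not None and outfile is None:
        raise ValueError("an output format needs an output file name")
    args = ["seqret"]
    if outfile is not None:
        # 'format::filename' selects the output format; a bare file name
        # leaves seqret on its default (Fasta) format.
        target = f"{fmt}::{outfile}" if fmt else outfile
        args.append(f"-outseq={target}")
    args.append(usa)
    return args


cmd = seqret_command("uniprot:cram_craab", outfile="cram_craab.gb", fmt="gb")
print(shlex.join(cmd))  # seqret -outseq=gb::cram_craab.gb uniprot:cram_craab
```

On a machine where EMBOSS is installed, the list could be passed to `subprocess.run(cmd, check=True)` to execute the command.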
GCG is able to access subsets of databases (e.g. gb_ba, gb_vi), or combinations of databases (GB+ = Genbank + EST). This is more complex in EMBOSS.
Old GCG 'figure' files will not be readable by EMBOSS programs.
Equivalent EMBOSS programs can produce PostScript or PNG files, or X Window graphics output (other output types are also available). There is no EMBOSS equivalent of the GCG 'setplot' command; the output type is simply specified on the command line.
Sample EMBOSS session for the 'pepwheel' program (the equivalent to GCG's 'helicalwheel').
% pepwheel
Shows protein sequences as helices
Input protein sequence: cram_craab.swissprot
Graph type [x11]: ps
Created pepwheel.ps
% ps2pdf pepwheel.ps pepwheel.pdf
Figure: Postscript file produced by EMBOSS pepwheel.
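The same pepwheel run can be given entirely on the command line instead of answering the prompts. The Python sketch below only builds that command. `pepwheel_command` is a hypothetical helper, the set of graph types is limited to those named in this document, and the `-sequence` and `-graph` qualifiers are assumed to match the installed EMBOSS version.

```python
import shlex

# Graph types mentioned in this document; EMBOSS supports others as well.
KNOWN_GRAPH_TYPES = {"ps", "png", "x11"}


def pepwheel_command(sequence, graph="ps"):
    """Build a pepwheel command line that names the graph type up front."""
    if graph not in KNOWN_GRAPH_TYPES:
        raise ValueError(f"graph type {graph!r} is not covered by this sketch")
    return ["pepwheel", "-sequence", sequence, "-graph", graph]


cmd = pepwheel_command("cram_craab.swissprot", graph="ps")
print(shlex.join(cmd))  # pepwheel -sequence cram_craab.swissprot -graph ps
```

Choosing `png` or `x11` in place of `ps` changes only the last argument, which is why no separate 'setplot' step is needed.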
Replaced by:
- GCG-Lite (name change!) running EMBOSS programs behind the
scenes.
- EMBOSS full function web interface (http://helixweb.nih.gov/emboss)
Demo of GCG-Lite running EMBOSS-Lite.
Demo of same program in EMBOSS web interface.
To find command-line options:
Answers:
More help:
Email the Helix staff at staff@helix.nih.gov
Call the Helix staff at 301-594-6248 (the NIH Helpdesk - ask for Helix
staff regarding a GCG or EMBOSS question).