EMBOSS on Helix
EMBOSS (the European Molecular Biology Open Software Suite) is an open-source software analysis package developed specifically for the needs of the molecular biology user community. EMBOSS is composed of over 100 high-quality applications.
EMBOSS website.
EMBOSS can read and write sequences in all the common sequence formats, such as EMBL, Genbank, Fasta, SwissProt, PIR, GCG, MSF, Clustal and raw. Thus, when using EMBOSS, sequences do not have to be converted from one format to another.
Some of the areas covered by EMBOSS programs:
- Sequence Alignment
- Rapid database searching with sequence patterns
- Protein motif identification, including domain analysis
- Nucleotide sequence pattern analysis, for example to identify CpG islands or repeats
- Codon usage analysis for small genomes
- Rapid identification of sequence patterns in large scale sequence sets
- Presentation tools for publication
- And much more
Version
When you type 'emboss' to initialize EMBOSS, a header is printed that lists the EMBOSS version, the available databases, and the update status of each database.
How to Use
- For short, infrequent jobs, use the EMBOSS web interface on Helixweb.
- For longer jobs or larger numbers of sequences, use EMBOSS on Helix. Type 'emboss' at the command prompt (once per session) to initialize EMBOSS, then the name of the EMBOSS application:
  helix% emboss
  helix% application name (e.g. seqret)
- For large-scale usage (1000s of sequences), EMBOSS is also available on Biowulf.
A more detailed sample session is at the end of this page.
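For scripted use over many sequence pairs, the interactive prompts can be avoided by giving every value as a command-line qualifier. The following is a minimal sketch in Python, not a Helix-specific recipe. It assumes the script is started from a session in which 'emboss' has already been run, so that needle is on the PATH. The qualifiers -asequence and -bsequence are the ones shown in the needle report header on this page; -gapopen, -gapextend, -outfile and -auto are standard needle qualifiers. The output file name maize_rice.needle is hypothetical.

```python
import os
import shutil
import subprocess


def needle_command(aseq, bseq, outfile, gapopen=10.0, gapextend=0.5):
    """Return an argv list that runs needle without interactive prompts."""
    return [
        "needle",
        "-asequence", aseq,
        "-bsequence", bseq,
        "-gapopen", str(gapopen),
        "-gapextend", str(gapextend),
        "-outfile", outfile,
        "-auto",  # do not prompt; qualifiers not given keep their defaults
    ]


# Input names are taken from the sample session; the output name is made up.
cmd = needle_command("maize_hb.fas", "rice_hb.fas", "maize_rice.needle")
print(" ".join(cmd))

# Run only if needle is installed and both input files are present.
inputs_present = all(os.path.exists(path) for path in (cmd[2], cmd[4]))
if shutil.which("needle") and inputs_present:
    subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces, and makes it easy to loop over many pairs of input files.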
Note: For sequence editing, we recommend the Seaview editor. RSF multiple sequence alignment files (from the GCG program) can be converted to MSF files and viewed/edited via Seaview. See the Seaview page for more information.
Documentation
- EMBOSS online documentation -- documentation for all EMBOSS programs.
- EMBOSS program groups by function.
- The online tutorial maintained by David Martin at the EMBOSS sourceforge site.
- Equivalent programs in EMBOSS and GCG, to help users migrating from GCG to EMBOSS.
- List and update status of all EMBOSS databases on the Helix Systems.
- Type 'wossname' to search for programs by keyword (e.g. 'wossname restriction' to search for programs related to restriction enzymes).
- Type 'showdb' to display available databases
- Type 'tfm' to display a program's help documentation (e.g. 'tfm digest')
- Type 'program -h' for help on command-line qualifiers (e.g. 'pepcoil -h')
Sample Session
(User input follows each helix% prompt):

helix% emboss
************************************************************************
                     Welcome to EMBOSS 4.0.0
************************************************************************
Databases available:
genbank Release 167 (19/Aug/08)
genpept Release 167 (27/Aug/08)
est Release 167 (19/Aug/08)
refseqaa Release 30 (11/Jul/08)
refseqnt Release 30 (11/Jul/08)
PROSITE Release 20.36 (22/Jul/08)
Restriction Enzymes (REBASE) 809 (29/Aug/08)
Transfac Release 11.4 (14/Dec/07)
prints Release 38_1 (24/Oct/07)
uniprot Release 14.0 (22/Jul/08)
allnt including genbank,est,refseqnt,gbnew
allaa including genpept,uniprot,refseqaa,gpnew
gpnew 01/Sep/08, 61516 entries since 19/Aug/08 rel 167
gbnew 31/Aug/08, 403077 entries since 19/Aug/08 rel 167

Type 'wossname keyword' to find a program
Type 'showdb' to display available databases
Type 'tfm programname' to display the program help
Type 'programname -help' to list command-line options

EMBOSS Web Interface at NIH: http://helixweb.nih.gov/emboss/
HELP! Helix Staff: 301-594-6248 or email: staff@helix.nih.gov
*********************************************************************

helix% needle maize_hb.fas rice_hb.fas
Needleman-Wunsch global alignment.
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output alignment [af291052.needle]:

helix% more af291052.needle
########################################
# Program: needle
# Rundate: Mon Jan 08 2007 15:59:43
# Commandline: needle
#    [-asequence] maize_hb.fas
#    [-bsequence] rice_hb.fas
# Align_format: srspair
# Report_file: af291052.needle
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: AF291052.1
# 2: OSU76029
# Matrix: EDNAFULL
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 958
# Identity:     607/958 (63.4%)
# Similarity:   609/958 (63.6%)
# Gaps:         207/958 (21.6%)
# Score: 2034.5
#
#
#=======================================

AF291052.1         1 ATGGCACTCGCGGAGG---CCGACGACGGCGCGGTGGTCTTCGGCGAGGA     47
                     |||||.||||.|||||   .|.|.|.||..||||||..||||.|||||||
OSU76029           1 ATGGCTCTCGTGGAGGATAACAATGCCGTAGCGGTGAGCTTCAGCGAGGA     50

AF291052.1        48 GCAGGAGGCGCTGGTGCTCAAGTCGTGGGCCGTCATGAAGAAGGACGCCG     97
                     ||||||||||||||||||||||||.|||||..||.||||||||||..|||
OSU76029          51 GCAGGAGGCGCTGGTGCTCAAGTCATGGGCGATCTTGAAGAAGGATTCCG    100

AF291052.1        98 CCAACCTGGGCCTCCGCTTCTTCCTCAAGTAAGTACGTTTCCGTGCTACA    147
                     ||||..|.|.|||||||||||||.|.|||||.||||  .|.||||.|
OSU76029         101 CCAATATTGCCCTCCGCTTCTTCTTGAAGTATGTAC--ATGCGTGTT---    145

AF291052.1       148 CACTGCC-----------TGCG----CACGTGCGCTTGGGTT------GC    176
                      |||.||           ||||    ||   |.|.|||||||      ||
OSU76029         146 -ACTACCATTTCTCTTTTTGCGGAATCA---GAGATTGGGTTTGTGAAGC    191

AF291052.1       177 ACCTGCACCGGCGGCCATCGAGC-----------CTGCTCCTTGACTAAC    215
                     |  |..|         ||.||||           |||.|.|.||..|
OSU76029         192 A--TTAA---------ATTGAGCAATGCATTTCGCTGATACATGTGT---    227

[...]

#---------------------------------------
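When many alignments are produced, the summary fields in the header of a needle report (Identity, Similarity, Gaps, Score) are usually what gets collected afterwards. The following is a minimal sketch of pulling them out with Python. It is not part of EMBOSS; the function name parse_needle_summary is hypothetical, and the sketch assumes the srspair report layout shown in the sample session, with the sample values embedded for illustration.

```python
import re

# Header fields copied from the sample report in the session above.
SAMPLE_REPORT = """\
# Aligned_sequences: 2
# 1: AF291052.1
# 2: OSU76029
# Matrix: EDNAFULL
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 958
# Identity:     607/958 (63.4%)
# Similarity:   609/958 (63.6%)
# Gaps:         207/958 (21.6%)
# Score: 2034.5
"""

# Fields reported as "count/length (percent)".
COUNT_FIELD = re.compile(r"^#\s*(Identity|Similarity|Gaps):\s*(\d+)/(\d+)", re.M)
# The alignment score is a single number and may be negative.
SCORE_FIELD = re.compile(r"^#\s*Score:\s*(-?[\d.]+)", re.M)


def parse_needle_summary(text):
    """Extract identity, similarity, gap counts and score from a report."""
    summary = {}
    for name, count, length in COUNT_FIELD.findall(text):
        summary[name.lower()] = (int(count), int(length))
    score = SCORE_FIELD.search(text)
    if score:
        summary["score"] = float(score.group(1))
    return summary


summary = parse_needle_summary(SAMPLE_REPORT)
identical, length = summary["identity"]
print(f"{identical}/{length} identical = {100 * identical / length:.1f}%")
```

To use it on a real report, read the file first, for example parse_needle_summary(open("af291052.needle").read()). A report that contains several alignments would need the text split per alignment before parsing, which this sketch does not attempt.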