Scientific Supercomputing at the NIH

AMOS: A Modular Open-Source Genome Assembler on Helix

AMOS is not itself an assembler but a software infrastructure for developing assembly tools. It does, however, provide two assemblers: AMOScmp, a comparative assembler, and Minimus, a basic assembler for small datasets. With a little programming, users can also combine AMOS components into their own shotgun assembler, customized for the specific characteristics of their data.

AMOS is modular in nature so that new contributions can be easily inserted into an existing assembly pipeline. This modular design will foster the development of new assembly algorithms and allow the AMOS project to continually grow and improve in hopes of eventually becoming a widely accepted and deployed assembly infrastructure. In this sense, AMOS is both a design philosophy and a software system.

AMOS development is a collaboration between the University of Maryland, The Institute for Genomic Research, the Karolinska Institute, and Woods Hole. For detailed information, see http://amos.sourceforge.net/

Programs Location on Helix

The AMOS package contains more than 200 programs. They are located under:

/usr/local/amos/amos-2.0.8/bin

Sample Session

If you use the AMOS programs frequently, it may be convenient to add this directory to your path. For a single session:

% setenv PATH /usr/local/amos/amos-2.0.8/bin:$PATH (csh or tcsh)

$ PATH=/usr/local/amos/amos-2.0.8/bin:$PATH; export PATH (bash)

To add this directory to your path at login time, add the appropriate line above to your ~/.cshrc or ~/.bashrc file.
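
Adding the plain assignment to ~/.bashrc prepends the directory again every time the file is read, for example in nested shells. A minimal sketch for bash or sh users that avoids the duplicates is shown here; the helper name prepend_path is hypothetical and not part of AMOS or Helix.

```shell
# Sketch: prepend the AMOS bin directory to PATH only if it is not already there.
# AMOS_BIN is the directory named on this page; prepend_path is a hypothetical helper.
AMOS_BIN=/usr/local/amos/amos-2.0.8/bin

prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;        # otherwise put the directory first
    esac
    export PATH
}

prepend_path "$AMOS_BIN"
```

Calling prepend_path a second time with the same directory leaves PATH unchanged.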

The following example shows how to use the 'minimus' program in the AMOS package. First, copy the files from /usr/local/amos/amos-2.0.8/test/minimus/influenza-A/ into your data directory (replace 'user' with your own user ID):

% cp -rp /usr/local/amos/amos-2.0.8/test/minimus/influenza-A/ /data/user/amos/
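
The copy step can also be wrapped so that the destination directory is created first if it does not exist. This is only a sketch for bash or sh: stage_example is a hypothetical helper, and the usage comment assumes that your data directory is /data/$USER, which may not match your account.

```shell
# Sketch: copy an example directory into a working directory, creating it if needed.
# stage_example is a hypothetical helper, not an AMOS command.
stage_example() {
    # $1 = directory to copy, $2 = parent directory that receives the copy
    mkdir -p "$2" || return 1
    cp -rp "$1" "$2"/ || return 1
    # Print the location of the new copy so the caller can cd into it.
    echo "$2/$(basename "$1")"
}

# Usage on Helix (assumption: your data directory is /data/$USER):
# cd "$(stage_example /usr/local/amos/amos-2.0.8/test/minimus/influenza-A /data/$USER/amos)"
```

The function prints the path of the copied directory, so the remaining commands in the session can be run from there.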

In the influenza-A directory, there is a set of Trace Archive data with the names `influenza-A.seq' and `influenza-A.qual' which contain the sequence information for a small assembly task. To run the minimus pipeline and generate the default output, type the following:

% tarchive2amos -o influenza-A influenza-A.seq
Collecting file information
Processing the files
doing fragments
putting it together
done
% minimus -D TGT=influenza-A.afg influenza-A
The log file is: influenza-A.runAmos.log
Doing step 10: Building AMOS bank
Doing step 20: Running overlap
Doing step 30: Running contigger
Doing step 40: Running consensus
Doing step 50: Outputting contigs
Doing step 60: Converting to FastA file

This generates the default output files `influenza-A.contig' and `influenza-A.fasta'. An ACE-format assembly file can then be generated with:

% bank-report -b influenza-A.bnk/ CTG > influenza-A.ctg
START DATE: Wed Dec 10 14:20:00 2008
Bank is: influenza-A.bnk/
0% 100%
CTG ..................................................
Objects reported: 8
END DATE: Wed Dec 10 14:20:00 2008
% amos2ace influenza-A.afg influenza-A.ctg

The output file is influenza-A.ace.
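
After the session finishes, it can be useful to confirm that every file named above was actually written. The following bash/sh sketch does that check and counts the records in the FASTA output. check_outputs and count_fasta_records are hypothetical helpers, and the list of extensions is taken from the file names that appear in this session.

```shell
# Sketch: confirm that the files named in the session exist and are non-empty.
# check_outputs is a hypothetical helper; the extensions are the ones this page mentions.
check_outputs() {
    prefix=$1
    missing=0
    for ext in afg contig fasta ctg ace; do
        if [ -s "$prefix.$ext" ]; then
            echo "ok       $prefix.$ext"
        else
            echo "missing  $prefix.$ext"
            missing=1
        fi
    done
    # Non-zero exit status if any file was absent or empty.
    return "$missing"
}

# Count FASTA records: every record starts with a header line beginning with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Usage, from the directory holding the results:
# check_outputs influenza-A && count_fasta_records influenza-A.fasta
```

A non-zero exit status from check_outputs means that at least one expected file is absent or empty, in which case influenza-A.runAmos.log is the first place to look.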

Documentation

http://amos.sourceforge.net/