JAligner

JAligner is an open source Java implementation of the Smith-Waterman algorithm with Gotoh's improvement for biological local pairwise sequence alignment using the affine gap penalty model.

By: Ahmed Moustafa (ahmed@users.sf.net)
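For reference, the Gotoh recurrences behind the affine gap penalty model can be written as follows. This is the textbook formulation, not a transcription of JAligner's source. It assumes that a gap of length k costs o + (k-1)e, where o and e are the open and extend gap penalties. H is the main similarity score matrix, E and F are the two auxiliary gap matrices, a_i and b_j are the residues of the vertical sequence (length m) and the horizontal sequence (length n), and s is the score from the substitution matrix.

```latex
\begin{aligned}
E_{i,j} &= \max\{\,E_{i,j-1} - e,\; H_{i,j-1} - o\,\} \\
F_{i,j} &= \max\{\,F_{i-1,j} - e,\; H_{i-1,j} - o\,\} \\
H_{i,j} &= \max\{\,0,\; H_{i-1,j-1} + s(a_i, b_j),\; E_{i,j},\; F_{i,j}\,\} \\
&\text{with } H_{i,0} = H_{0,j} = 0,\quad E_{i,0} = F_{0,j} = -\infty,\quad 1 \le i \le m,\ 1 \le j \le n
\end{aligned}
```

Row i depends only on row i-1 of H and F and on row i of E. The best local score is the maximum H_{i,j} over all cells.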

Features

  • The space complexity of the dynamic programming over the main similarity score matrix and the two auxiliary gap matrices is reduced from O(m×n) to O(n), where m and n are the lengths of the vertical and horizontal sequences respectively. This is done by replacing the original two-dimensional arrays of size m×n with single-dimensional arrays of size n.

  • The two-dimensional m×n array that holds the traceback directions (diagonal, left, up and stop) is mapped into a single-dimensional array of size m×n. This speeds up memory allocation because the Java Virtual Machine (JVM) allocates one single-dimensional array of m×n bytes (a primitive data type). Otherwise it would have to allocate an array of m objects, each of which is itself an array of n bytes.

  • In addition to the 70 bundled scoring matrices, which were taken from the NCBI site, JAligner supports user-defined scoring matrices.

  • JAligner can be used through a user-friendly Graphical User Interface (GUI), a simple command line syntax, or a reusable Application Programming Interface (API).
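The two memory techniques above can be illustrated with a minimal Java sketch. This is not JAligner's source, and several details are assumptions:

- The class GotohSketch, its align method and the STOP/DIAGONAL/LEFT/UP constants are hypothetical.
- Scoring uses a fixed match/mismatch pair instead of a substitution matrix.
- A gap of length k is assumed to cost open + (k-1) × extend.

The H and F scores live in float arrays of size n+1, which are overwritten row by row. E is a single running value. The direction of cell (i, j) is written to index (i-1)×n + (j-1) of one byte[] of length m×n. A full affine-gap traceback also needs to know how long each gap is, and this sketch leaves that out.

```java
import java.util.Arrays;

// Hypothetical sketch, not JAligner's source: Smith-Waterman-Gotoh local alignment
// scoring with O(n) score arrays and a traceback flattened into one byte[] of m*n cells.
// Assumes a gap of length k costs open + (k - 1) * extend, and a simple
// match/mismatch score in place of a substitution matrix such as BLOSUM62.
class GotohSketch {
    static final byte STOP = 0, DIAGONAL = 1, LEFT = 2, UP = 3;

    // Returns the best local score of a (vertical, length m) against b (horizontal, length n).
    // trace must have length m * n; cell (i, j), 1-based, is stored at (i - 1) * n + (j - 1).
    static float align(String a, String b, float match, float mismatch,
                       float open, float extend, byte[] trace) {
        int m = a.length(), n = b.length();
        float[] h = new float[n + 1]; // H of the previous row, overwritten in place with the current row
        float[] f = new float[n + 1]; // F (vertical gaps) per column, carried from row to row
        Arrays.fill(f, Float.NEGATIVE_INFINITY);
        float best = 0f;
        for (int i = 1; i <= m; i++) {
            float diag = h[0];                 // H[i-1][0], always 0
            float e = Float.NEGATIVE_INFINITY; // E (horizontal gaps) for the current row
            for (int j = 1; j <= n; j++) {
                e = Math.max(e - extend, h[j - 1] - open);   // h[j-1] already holds H[i][j-1]
                f[j] = Math.max(f[j] - extend, h[j] - open); // h[j] still holds H[i-1][j]
                float d = diag + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                diag = h[j];                                 // H[i-1][j], the diagonal for column j+1
                float score = 0f;
                byte dir = STOP;
                if (d > score) { score = d; dir = DIAGONAL; }
                if (e > score) { score = e; dir = LEFT; }
                if (f[j] > score) { score = f[j]; dir = UP; }
                h[j] = score;
                trace[(i - 1) * n + (j - 1)] = dir; // flattened index of cell (i, j)
                best = Math.max(best, score);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String a = "ACGTACGT", b = "ACGTTACGT";
        byte[] trace = new byte[a.length() * b.length()]; // one byte per cell, one allocation
        float best = align(a, b, 2f, -1f, 2f, 1f, trace);
        System.out.println("best local score = " + best); // prints 14.0 (8 matches, one gap)
    }
}
```

Allocating the traceback as `new byte[m * n]` is a single allocation. By contrast, `new byte[m][n]` creates m separate row arrays plus the outer array.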

Usage

There are several ways to align a pair of sequences using JAligner:

  • Command line

    java -jar jaligner.jar <s1> <s2> <matrix> <open> <extend>

    where:

    • s1: path to a file containing input sequence #1.
    • s2: path to a file containing input sequence #2.
    • matrix: name of a scoring matrix, or path to a file containing a user-defined scoring matrix.
    • open: open gap penalty.
    • extend: extend gap penalty.

    Example:

    java -jar jaligner.jar s1.fa s2.fa BLOSUM62 10.0 0.5

    To load a user-defined scoring matrix from the file system, the path to the matrix file must include at least one file separator. The presence of a file separator tells JAligner to load the scoring matrix from the file system instead of looking it up in jaligner.jar.

    Example:

    java -jar jaligner.jar s1.fa s2.fa ./matrix.txt 10.0 0.5

    The layout of a user-defined scoring matrix file is expected to be the same as the layout of the standard scoring matrices:

    • optional comment lines (a comment line starts with a number sign "#"),
    • a header line listing the letters of the alphabet of the two sequences, and
    • a line for each letter in the alphabet where each line starts with that letter followed by the substitution scores for the corresponding letters in the header line.
  • Java Network Launch Protocol (JNLP)

    In general, JNLP-based applications require Java Web Start (JWS) to be installed on the client machine. Fortunately, JWS has been bundled with the Java 2 Platform, Standard Edition (J2SE) since J2SE 1.4.

    Assuming JWS is installed, JAligner can be launched by opening the XML deployment descriptor jaligner.jnlp at <http://jaligner.sourceforge.net/jaligner.jnlp> in a web browser. It can also be launched from the command line with the javaws executable, which is located in the javaws directory under the installation (root) directory of the Java Runtime Environment (JRE).

    Example:

    javaws http://jaligner.sourceforge.net/jaligner.jnlp

    jaligner.jnlp requests full permissions because the application needs access to:

    • the system clipboard for editing (cut and paste) the input sequences,
    • the file system for loading and storing the input sequences and output alignments, and
    • the JVM properties: user.home, file.separator and line.separator.

    Because jaligner.jar is signed with a self-signed certificate, JWS displays a warning once the JAR file has been downloaded. The warning states that the application requests full permissions and that the signing certificate could not be verified. To dismiss the warning and start the application, click the "Start" button in the warning window.

  • Desktop

    The command line to start JAligner as a desktop GUI application is:

    java -jar jaligner.jar

    Downloadable installers (built with ej-technologies' install4j) are also available for Linux, UNIX, Mac OS X and Windows.

  • Application Programming Interface (API)

    The class SmithWatermanGotoh provides the public static method align, which can be called programmatically to align two sequences.
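The user-defined scoring matrix layout and the file separator rule described under Command line can be sketched in Java as follows. This is an illustration under assumptions, not JAligner's loader:

- MatrixFileSketch, isFilePath and parse are hypothetical names.
- Scores are stored in a map keyed by the letter pair.
- Blank lines are skipped as a convenience.
- The separator test uses java.io.File.separator, so it does not cover whether JAligner also treats "/" as a separator on Windows.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not JAligner's loader: parses the scoring matrix layout
// (comment lines, header line, one row per letter) and applies the file separator rule.
class MatrixFileSketch {
    // Assumed check: an argument containing the platform file separator is a path;
    // anything else (e.g. BLOSUM62) would be looked up among the bundled matrices.
    static boolean isFilePath(String matrixArg) {
        return matrixArg.contains(File.separator);
    }

    // Returns scores keyed by the two-letter pair, e.g. "AC" -> score of A against C.
    static Map<String, Float> parse(BufferedReader in) throws IOException {
        Map<String, Float> scores = new HashMap<>();
        String[] header = null;
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            String[] tokens = line.split("\\s+");
            if (header == null) {
                header = tokens; // first non-comment line lists the alphabet
                continue;
            }
            if (tokens.length != header.length + 1) {
                throw new IOException("Row '" + tokens[0] + "' does not match the header");
            }
            for (int k = 0; k < header.length; k++) {
                scores.put(tokens[0] + header[k], Float.parseFloat(tokens[k + 1]));
            }
        }
        return scores;
    }

    public static void main(String[] args) throws IOException {
        String text = "# toy matrix\n   A  C\nA  1 -1\nC -1  1\n";
        Map<String, Float> m = parse(new BufferedReader(new StringReader(text)));
        System.out.println(m.get("AC"));            // prints -1.0
        System.out.println(isFilePath("BLOSUM62")); // prints false
    }
}
```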

Notes

  • By default, the JVM uses a memory allocation pool with an initial size of 2MB and a maximum size of 64MB (the defaults vary between JVM versions). When the memory required to align large sequences exceeds the available space, the JVM throws an OutOfMemoryError. In such cases, start the JVM with a suitable heap size using the -Xms (initial size) and -Xmx (maximum size) options.

    Example:

    java -Xms128m -Xmx512m -jar jaligner.jar

  • Compiling the source code requires an implementation of the Java Network Launch Protocol (JNLP) API on the compilation classpath. Java Web Start's javaws.jar provides such an implementation.
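To judge in advance whether an alignment will fit in the heap, a rough estimate can count only the m×n byte traceback array described under Features, since the score arrays are O(n). The sketch below is hypothetical: MemoryEstimateSketch and its methods are illustrative names, and it ignores any other memory JAligner may use. It also checks a separate limit, which is that a single Java array cannot hold more than Integer.MAX_VALUE elements.

```java
// Hypothetical sketch: estimates the memory of a flattened m*n byte traceback
// (one byte per cell, as described under Features) and compares it with the heap limit.
class MemoryEstimateSketch {
    // Bytes needed by the traceback alone; long arithmetic avoids int overflow.
    static long tracebackBytes(long m, long n) {
        return m * n;
    }

    // A single Java array holds at most Integer.MAX_VALUE elements
    // (in practice a few fewer on common JVMs).
    static boolean fitsInOneArray(long m, long n) {
        return m * n <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long m = 20_000, n = 30_000;
        long need = tracebackBytes(m, n);
        long max = Runtime.getRuntime().maxMemory(); // reflects -Xmx, or the JVM default
        System.out.printf("traceback needs ~%d MB, max heap is %d MB%n", need >> 20, max >> 20);
        if (!fitsInOneArray(m, n)) {
            System.out.println("too large for a single byte[]");
        } else if (need > max) {
            System.out.println("expect OutOfMemoryError; raise -Xmx");
        }
    }
}
```

For example, two sequences of 20,000 and 30,000 residues need about 600 million bytes for the traceback alone. That is more than the 512MB allowed by -Xmx512m in the example above.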

Licenses

If you are using JAligner in a published work or product, please cite:

Ahmed Moustafa, JAligner: Open source Java implementation of Smith-Waterman, http://jaligner.sourceforge.net (the date accessed).


Acknowledgments

I deeply appreciate everyone who has contributed questions, comments or suggestions regarding JAligner. Every piece of feedback has been helpful, and I have learned from each one. I would like to express my special thanks to:

  • ej-technologies: providing a free license for install4j (May 2005),
  • Bram Minnaert: detecting a bug in the initialization of the auxiliary matrices (October 2004), fixing the traceback logic and providing modules for testing the produced alignments against the alignment scores (March 2005),
  • Hector Gonzalez: detecting a bug in the initialization of the traceback matrix (March 2004),
  • Andreas Doms: detecting a bug in the traceback stopping condition and suggesting a fix that also improved performance (February 2004),
  • Ryan Golhar: recommending changing the traceback from recursion to iteration to avoid a stack overflow problem (August 2003), and
  • Tim Carver: feedback on the GUI layout and alignment format (July 2003).

Last modified: 2007/06/02 14:58:58

Hosted by SourceForge.Net