Scientific Software Guide

Version:
  • Amber16-mpi,
  • Amber16-gpu-mpi,
  • Amber18-gpu-mpi
Description:

"Amber" refers to two things: a set of molecular mechanical force fields for the simulation of biomolecules (which are in the public domain, and are used in a variety of simulation programs); and a package of molecular simulation programs which includes source code and demos

Compilation Notes:

Compiled using Intel compiler and its MKL math libraries (Parallel Studio 2017). Serial, parallel (INTELMPI), and cuda (8.0) versions are available. AmberTools is distributed in source code format, and must be compiled in order to be used. You will need C, C++ and Fortran90 compilers.

Load the Module:
module load amber/16-cuda-mpi
module load amber/16-gen
module load amber/18-gen
Research Area:
Biology, Chemistry, Material Science, Physics
Additional License Details:

UNT (site) proprietary

Please contact hpc-admin@unt.edu for access to Amber

Citation Information:

When citing Amber14 or AmberTools15 please use the following: D.A. Case, J.T. Berryman, R.M. Betz, D.S. Cerutti, T.E. Cheatham, III, T.A. Darden, R.E. Duke, T.J. Giese, H. Gohlke, A.W. Goetz, N. Homeyer, S. Izadi, P. Janowski, J. Kaus, A. Kovalenko, T.S. Lee, S. LeGrand, P. Li, T. Luchko, R. Luo, B. Madej, K.M. Merz, G. Monard, P. Needham, H. Nguyen, H.T. Nguyen, I. Omelyan, A. Onufriev, D.R. Roe, A. Roitberg, R. Salomon-Ferrer, C.L. Simmerling, W. Smith, J. Swails, R.C. Walker, J. Wang, R.M. Wolf, X. Wu, D.M. York and P.A. Kollman (2015), AMBER 2015, University of California, San Francisco.

Version:
  • 5.2
Description:

Anaconda is a distribution of Python and R languages for data science and machine learning. 

Version:
  • 7.950.1
Description:
  • Armadillo is a high quality linear algebra library (matrix maths) for the C++ language, aiming towards a good balance between speed and ease of use 
  • Provides high-level syntax (API) deliberately similar to Matlab 
  • Useful for algorithm development directly in C++, or quick conversion of research code into production environments (eg. software & hardware products) 
  • Can be used for machine learning, pattern recognition, computer vision, signal processing, bioinformatics, statistics, finance, etc 
  • Provides efficient classes for vectors, matrices and cubes (1st, 2nd and 3rd order tensors), as well as 200+ associated functions; integer, floating point and complex numbers are supported 
  • Various matrix decompositions are provided through integration with LAPACK, or one of its high performance drop-in replacements (e.g. multi-threaded Intel MKL, or OpenBLAS)
  • A sophisticated expression evaluator (based on template meta-programming) automatically combines several operations to increase speed and efficiency 
  • Can automatically use OpenMP multi-threading (parallelisation) to speed up computationally expensive operations
Research Area:
Biology, Computer Science, Health Science, Mathematics, Statistics, Linux Library
Version:
  • 2.4.1
Description:

BamTools is a project that provides both a C++ API and a command-line toolkit for reading, writing, and manipulating BAM (genome alignment) files.

Research Area:
Biology, Linux Utility
License type:
GNU General public license (open source)
Version:
  • 0.9.0,
  • 0.14.0
Description:

Tool for automation of building and testing software. 

Version:
  • 1.4.1,
  • 1.4
Description:

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.

Research Area:
Linux Utility
Version:
  • 2.6.0,
  • 1.8.4
Description:

 

BEAST1

BEAST is a cross-platform program for Bayesian analysis of molecular sequences using MCMC. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability. We include a simple-to-use user-interface program for setting up standard analyses and a suite of programs for analysing the results.

BEAST 2

BEAST 2 is a cross-platform program for Bayesian phylogenetic analysis of molecular sequences. It estimates rooted, time-measured phylogenies using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST 2 uses Markov chain Monte Carlo (MCMC) to average over tree space, so that each tree is weighted proportional to its posterior probability. BEAST 2 includes a graphical user-interface for setting up standard analyses and a suite of programs for analysing the results.
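
The statement that MCMC weights each tree in proportion to its posterior probability can be illustrated with a toy Metropolis sampler. This is only a sketch under stated assumptions, not BEAST code: the three "topologies" and their unnormalised weights are hypothetical, the function name `metropolis` is made up for this example, and real analyses propose changes to trees and model parameters rather than picking from a short list.

```python
import random

def metropolis(weights, n_steps, seed=0):
    """Visit state indices with frequency proportional to `weights`."""
    rng = random.Random(seed)
    n = len(weights)
    state = 0
    visits = [0] * n
    for _ in range(n_steps):
        # Symmetric proposal: pick any state uniformly at random.
        proposal = rng.randrange(n)
        # Accept with probability min(1, w_proposal / w_current).
        if rng.random() < weights[proposal] / weights[state]:
            state = proposal
        visits[state] += 1
    return [v / n_steps for v in visits]

# Hypothetical unnormalised posterior weights for three tree topologies.
weights = [1.0, 3.0, 6.0]
freqs = metropolis(weights, n_steps=200_000)
print([round(f, 2) for f in freqs])
```

The visit frequencies should come out close to 0.1, 0.3 and 0.6, that is, each weight divided by the total, even though the sampler never computes that total.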

Research Area:
Biology
Version:
  • 2.6.0
Description:

The Basic Local Alignment Search Tool, BLAST, finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.

Sequence similarity searching is one of the more important bioinformatics activities and often provides the first evidence for the function of a newly sequenced gene or piece of sequence. BLAST is probably the most popular similarity search tool. The National Center for Biotechnology Information first introduced BLAST in 1989. The NCBI has continued to maintain and update BLAST since the first version. In 2009, the NCBI introduced a new version of the stand-alone BLAST applications, BLAST+.

The BLAST+ applications have a number of improvements that allow faster searches as well as more flexibility in output formats and in the search input. These improvements include: splitting of longer queries so as to reduce the memory usage and to take advantage of modern CPU architectures; use of a database index to dramatically speed up the search; the ability to save a “search strategy” that can be used later to start a new search; and greater flexibility in the formatting of tabular results.

Research Area:
Biology
Version:
  • 1.63.0
Description:

Boost provides free peer-reviewed portable C++ source libraries.

We emphasize libraries that work well with the C++ Standard Library. Boost libraries are intended to be widely useful, and usable across a broad spectrum of applications. The Boost license encourages both commercial and non-commercial use.

We aim to establish "existing practice" and provide reference implementations so that Boost libraries are suitable for eventual standardization. Ten Boost libraries are included in the C++ Standards Committee's Library Technical Report (TR1) and in the new C++11 Standard. C++11 also includes several more Boost libraries in addition to those from TR1. More Boost libraries are proposed for standardization in C++17.

Compilation Notes:

Compiled using gcc.

Research Area:
Biology, Linux Library
License type:
GNU General public license (open source)
Version:
  • 2.3.2 (default) and
  • 2.3.1
Description:

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters to relatively long, e.g. mammalian, genomes. Bowtie 2 indexes the genome with an FM Index, based on the Burrows-Wheeler Transform or BWT, to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 gigabytes of RAM. Bowtie 2 supports gapped, local, and paired-end alignment modes. Multiple processors can be used simultaneously to achieve greater alignment speed. Bowtie 2 outputs alignments in SAM format, enabling interoperation with a large number of other tools, e.g., SAMtools, GATK, that use SAM. Bowtie 2 is distributed under the GPLv3 license, and it runs on the command line under Windows, Mac OS X and Linux.
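
As a rough illustration of the indexing idea mentioned above, the following Python sketch builds a Burrows-Wheeler Transform by sorting rotations and counts pattern occurrences with FM-index-style backward search. It is a toy, not Bowtie 2's implementation: the names `bwt` and `count_occurrences` are hypothetical, the `$` sentinel is assumed not to occur in the text, and a real index stores sampled occurrence tables instead of rescanning the string.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler Transform via sorted rotations (fine for short strings)."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def count_occurrences(bwt_str, pattern):
    """Count occurrences of `pattern` in the original text by backward search."""
    # first[c] = number of characters in the text that sort before c.
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    def occ(ch, i):
        # Occurrences of ch in bwt_str[:i]; a real FM index precomputes this.
        return bwt_str[:i].count(ch)

    lo, hi = 0, len(bwt_str)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ(ch, lo)
        hi = first[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo

transformed = bwt("banana")
print(transformed)                            # annb$aa
print(count_occurrences(transformed, "ana"))  # 2
```

The search touches the pattern one character at a time from right to left, so its cost depends on the pattern length rather than on the genome length, which is why this family of indexes suits short-read alignment.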

Bowtie 2 is often the first step in pipelines for comparative genomics, including for variation calling, ChIP-seq, RNA-seq, BS-seq. Bowtie 2 and Bowtie, also called "Bowtie 1" here, are also tightly integrated into some tools, including 

  • TopHat: a fast splice junction mapper for RNA-seq reads, 
  • Cufflinks: a tool for transcriptome assembly and isoform quantitation from RNA-seq reads, 
  • Crossbow: a cloud-enabled software tool for analyzing resequencing data, and 
  • Myrna: a cloud-enabled software tool for aligning RNA-seq reads and measuring differential gene expression.
Compilation Notes:

Binary installation.

Bowtie is a prerequisite for tophat.

Load the Module:
module load bowtie2
Research Area:
Biology
License type:
GNU General public license (open source)
Citation Information:

Bowtie 2 requests that you cite this paper for published research.

BWA
Version:
  • 0.7.17
Description:

BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are designed for longer sequences ranging from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads.

Research Area:
Biology
Version:
  • 1.0.6
Description:

Bzip2 is a freely available, patent free (see below), high-quality data compressor. It typically compresses files to within 10 to 15 percent of the best available techniques (the PPM family of statistical compressors), while being around twice as fast at compression and six times faster at decompression.

Because it compresses well, it packs more into your overfull disk drives, distribution CDs, backup tapes, USB sticks, etc. And/or it reduces your customer download times, long distance network traffic, etc. It's not the world's fastest compressor, but it's still fast enough to be very useful.

Because it's open-source (BSD-style license), and, as far as I know, patent-free. (To the best of my knowledge. I can't afford to do a full patent search, so I can't guarantee this. Caveat emptor). So you can use it for whatever you like. Naturally, the source code is part of the distribution.

Because it supports (limited) recovery from media errors. If you are trying to restore compressed data from a backup tape or disk, and that data contains some errors, bzip2 may still be able to decompress those parts of the file which are undamaged.

Because you already know how to use it. bzip2's command line flags are similar to those of GNU Gzip, so if you know how to use gzip, you know how to use bzip2.

Because it's very portable. It should run on any 32 or 64-bit machine with an ANSI C compiler. The distribution should compile unmodified on Unix and Win32 systems. Earlier versions have been ported with little difficulty to a large number of weird and wonderful systems.

Research Area:
Biology
Version:
  • 1.14.10
Description:

Cairo is a 2-D graphics library with support for multiple output devices. Currently supported output targets include the X Window System (via both Xlib and XCB), Quartz, Win32, image buffers, PostScript, PDF, and SVG file output. Experimental backends include OpenGL, BeOS, OS/2, and DirectFB.

Cairo is designed to produce consistent output on all output media while taking advantage of display hardware acceleration when available (e.g., through the X Render Extension).

The Cairo API provides operations similar to the drawing operators of PostScript and PDF. Operations in Cairo include stroking and filling cubic Bézier splines, transforming and compositing translucent images, and antialiased text rendering. All drawing operations can be transformed by any affine transformation (scale, rotation, shear, etc.).

Cairo is implemented as a library written in the C programming language, but bindings are available for several different programming languages.

Research Area:
Biology
License type:
GNU General public license (open source)
Version:
  • 4.6.6-2016-0711
Description:

CD-HIT is very fast and can handle extremely large databases. CD-HIT helps to significantly reduce the computational and manual efforts in many sequence analysis tasks and aids in understanding the data structure and correcting the bias within a dataset.

Research Area:
Biology
Version:
  • 3.8.0
Description:

CMake is an open-source, cross-platform family of tools designed to build, test and package software. CMake is used to control the software compilation process using simple platform and compiler independent configuration files, and generate native makefiles and workspaces that can be used in the compiler environment of your choice. The suite of CMake tools was created by Kitware in response to the need for a powerful, cross-platform build environment for open-source projects such as ITK and VTK.

Compilation Notes:

Installed using gcc.

Manual Submission:

CMake is a utility. Just load the module.

Research Area:
Linux Utility
Version:
  • 2.2.1
Description:
Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols.
Compilation Notes:

Compiled using the GCC suite.

Research Area:
Biology
Version:
  • 7.54.1
Description:

Curl is used in command lines or scripts to transfer data. It is also used in cars, television sets, routers, printers, audio equipment, mobile phones, tablets, settop boxes, media players and is the internet transfer backbone for thousands of software applications affecting billions of humans daily.

Research Area:
Biology
License type:
GNU General public license (open source)
Version:
  • 2.2.16
Description:

dDocent’s purpose is to be a standalone laboratory protocol and analysis pipeline for double digest Restriction site Associated DNA (ddRAD) sequencing (the pipeline should also work with ezRAD).  The laboratory protocol largely follows Peterson et al. (2012), but is focused down to specifically what has worked best for us in the Gold lab.

Compilation Notes:

Self-retrieving and installing with gcc.

Research Area:
Biology
Version:
  • 4.08,
  • class-1.10
Description:

DL_POLY is a general purpose classical molecular dynamics simulation software developed at Daresbury Laboratory by I.T. Todorov and W. Smith.

Research Area:
Chemistry
Version:
  • 2017
Description:

Command-line tools for processing biological sequencing data. Barcode demultiplexing, adapter trimming, etc. Primarily written to support an Illumina based pipeline, but should work with any FASTQs.

  • fastq-mcf - Scans a sequence file for adapters, and, based on a log-scaled threshold, determines a set of clipping parameters and performs clipping. Also does skewing detection and quality filtering.
  • fastq-multx - Demultiplexes a fastq. Capable of auto-determining barcode IDs based on a master set of fields. Keeps multiple reads in-sync during demultiplexing. Can verify that the reads are in-sync as well, and fail if they're not.
  • fastq-join - Similar to audy's stitch program, but in C, more efficient and supports some automatic benchmarking and tuning. It uses the same "squared distance for anchored alignment" as other tools.
  • varcall - Takes a pileup and calculates variants in a more easily parameterized manner than some other tools.
Research Area:
Biology
Version:
  • 3.3.3
Description:
  • Eigen is versatile.
    • It supports all matrix sizes, from small fixed-size matrices to arbitrarily large dense matrices, and even sparse matrices.
    • It supports all standard numeric types, including std::complex, integers, and is easily extensible to custom numeric types.
    • It supports various matrix decompositions and geometry features.
    • Its ecosystem of unsupported modules provides many specialized features such as non-linear optimization, matrix functions, a polynomial solver, FFT, and much more.
  • Eigen is fast.
    • Expression templates allow Eigen to intelligently remove temporaries and enable lazy evaluation, when that is appropriate.
    • Explicit vectorization is performed for SSE 2/3/4, AVX, FMA, AVX512, ARM NEON (32-bit and 64-bit), PowerPC AltiVec/VSX (32-bit and 64-bit) instruction sets, and now S390x SIMD (ZVector) with graceful fallback to non-vectorized code.
    • Fixed-size matrices are fully optimized: dynamic memory allocation is avoided, and the loops are unrolled when that makes sense.
    • For large matrices, special attention is paid to cache-friendliness.
  • Eigen is reliable.
    • Algorithms are carefully selected for reliability. Reliability trade-offs are clearly documented and extremely safe decompositions are available.
    • Eigen is thoroughly tested through its own test suite (over 500 executables), the standard BLAS test suite, and parts of the LAPACK test suite.
  • Eigen is elegant.
    • The API is extremely clean and expressive while feeling natural to C++ programmers, thanks to expression templates.
    • Implementing an algorithm on top of Eigen feels like just copying pseudocode.
  • Eigen has good compiler support as we run our test suite against many compilers to guarantee reliability and work around any compiler bugs. Eigen also is standard C++98 and maintains very reasonable compilation times.
Research Area:
Biology
Version:
  • 6.0
Description:

Quantum Espresso is an integrated suite of open-source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves and pseudopotentials.

Research Area:
Material Science
Version:
  • 0.0.13
Description:

The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.

Next-Generation sequencing machines usually produce FASTA or FASTQ files, containing multiple short-reads sequences (possibly with quality information).

The main processing of such FASTA/FASTQ files is mapping (aka aligning) the sequences to reference genomes or other databases using specialized programs. Examples of such mapping programs are Blat, SHRiMP, LastZ, MAQ, and many others.

However, it is sometimes more productive to preprocess the FASTA/FASTQ files before mapping the sequences to the genome - manipulating the sequences to produce better mapping results.

The FASTX-Toolkit tools perform some of these preprocessing tasks.

Research Area:
Biology
Version:
  • 3.3.6 (default)
  • 2.1.5
Description:

FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST). We believe that FFTW, which is free software, should become the FFT library of choice for most applications.
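
For readers unfamiliar with the transform itself, the sketch below evaluates the DFT definition directly in Python. It is an assumption-laden illustration only: the function name `dft` is hypothetical, the code uses the plain O(n²) definition rather than the fast algorithms FFTW implements, and it does not call FFTW at all.

```python
import cmath

def dft(x):
    """Direct evaluation of X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# One cycle of a cosine sampled at four points.
signal = [1.0, 0.0, -1.0, 0.0]
spectrum = dft(signal)
print([round(abs(c), 6) for c in spectrum])  # magnitudes close to [0, 2, 0, 2]
```

The energy lands in bins 1 and 3, the positive and negative frequency of the single cosine cycle, which is the kind of result a library such as FFTW returns far faster for large inputs.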

Compilation Notes:

Intel 13, icc/ifort compiled with openmpi/intel/mlx1.6.5 support. Intel 14, icc/ifort compiled with openmpi/intel/14/mlx/1.6.5 support. (this is the recommended version).

Load the Module:
module load fftw
Research Area:
Linux Library
Version:
  • Oct. 30, 2017,
  • Aug. 18, 2016
Description:

The General Atomic and Molecular Electronic Structure System (GAMESS) is a general ab initio quantum chemistry package.

GAMESS is a program for ab initio molecular quantum chemistry. Briefly, GAMESS can compute SCF wavefunctions ranging from RHF, ROHF, UHF, GVB, and MCSCF. Correlation corrections to these SCF wavefunctions include Configuration Interaction, second order perturbation Theory, and Coupled-Cluster approaches, as well as the Density Functional Theory approximation. Excited states can be computed by CI, EOM, or TD-DFT procedures. Nuclear gradients are available, for automatic geometry optimization, transition state searches, or reaction path following. Computation of the energy hessian permits prediction of vibrational frequencies, with IR or Raman intensities. Solvent effects may be modeled by the discrete Effective Fragment potentials, or continuum models such as the Polarizable Continuum Model. Numerous relativistic computations are available, including infinite order two component scalar corrections, with various spin-orbit coupling options. The Fragment Molecular Orbital method permits use of many of these sophisticated treatments to be used on very large systems, by dividing the computation into small fragments. Nuclear wavefunctions can also be computed, in VSCF, or with explicit treatment of nuclear orbitals by the NEO code.

A variety of molecular properties, ranging from simple dipole moments to frequency dependent hyperpolarizabilities may be computed. Many basis sets are stored internally, together with effective core potentials or model core potentials, so that essentially the entire periodic table can be considered.

Most computations can be performed using direct techniques, or in parallel on appropriate hardware. Graphics programs, particularly the MacMolplt program (for Macintosh, Windows, or Linux desktops), are available for viewing of the final results, and the Avogadro program can assist with preparation of inputs.

Research Area:
Chemistry
Version:
  • 3.80
Description:

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Research Area:
Biology
Version:
  • g16-RevA.03-ax2,
  • g09-RevD,
  • g09-RevA
Description:

Summary

Gaussian 09 is the latest version of the Gaussian® series of electronic structure programs, used by chemists, chemical engineers, biochemists, physicists and other scientists worldwide. Starting from the fundamental laws of quantum mechanics, Gaussian 09 predicts the energies, molecular structures, vibrational frequencies and molecular properties of molecules and reactions in a wide variety of chemical environments. Gaussian 09’s models can be applied to both stable species and compounds which are difficult or impossible to observe experimentally (e.g., short-lived intermediates and transition structures).

Gaussian 09 provides the most advanced modeling capabilities available today, and it includes many new features and enhancements which significantly expand the range of problems and systems which can be studied. With Gaussian 09, you can model larger systems and more complex problems than ever before, even on modest computer hardware.

Compiled by:
Charles Peterson
Compilation Notes:

Gaussian runs in shared memory on a single host.

Load the Module:
module load gaussian/g09-revA
Research Area:
Chemistry
Additional License Details:

The Gaussian license is owned by UNT and UNT Chemistry. Please email hpc-admin@unt.edu with questions about getting access to Gaussian.

Citation Information:

Citation Information can be found here: http://www.gaussian.com/g_tech/g_ur/m_citation.htm

Version:
  • 2.25
Description:

The GNU C Library project provides the core libraries for the GNU system and GNU/Linux systems, as well as many other systems that use Linux as the kernel. These libraries provide critical APIs including ISO C11, POSIX.1-2008, BSD, OS-specific APIs and more. These APIs include such foundational facilities as open, read, write, malloc, printf, getaddrinfo, dlopen, pthread_create, crypt, login, exit and more.

The GNU C Library is designed to be a backward compatible, portable, and high-performance ISO C library. It aims to follow all relevant standards including ISO C11, POSIX.1-2008, and IEEE 754-2008.

Research Area:
Biology
License type:
GNU General public license (open source)
Version:
  • 5.0.6
Description:

Gnuplot is a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS and many other platforms. The source code is copyrighted but freely distributed (i.e., you don't have to pay for it). It was originally created to allow scientists and students to visualize mathematical functions and data interactively, but has grown to support many non-interactive uses such as web scripting. It is also used as a plotting engine by third-party applications like Octave. Gnuplot has been supported and under active development since 1986.

Research Area:
Linux Library
Go
Version:
  • 1.10.3
Description:

Go is a programming language developed by Google.

Version:
  • 5.1.4-fftw
Description:

GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
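
To make "simulate the Newtonian equations of motion" concrete, here is a minimal velocity Verlet integrator for a single particle on a spring. It is a sketch under stated assumptions, not GROMACS code: the function name `velocity_verlet`, the harmonic force and the unit mass, spring constant and time step are all chosen for illustration, and production codes add neighbour lists, constraints, thermostats and parallel decomposition.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * d2x/dt2 = force(x) with the velocity Verlet scheme."""
    a = force(x) / mass
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass           # force at the new position
        v += 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
    return x, v

k = 1.0  # hypothetical spring constant
x_end, v_end = velocity_verlet(
    x=1.0, v=0.0, force=lambda x: -k * x, mass=1.0, dt=0.01, n_steps=1000
)
energy = 0.5 * v_end ** 2 + 0.5 * k * x_end ** 2
print(round(x_end, 3), round(math.cos(10.0), 3), round(energy, 4))
```

After 1000 steps of 0.01 the position should track the analytic solution cos(t) at t = 10 closely, and the total energy should stay near its initial value of 0.5, which is the property that makes this family of integrators popular in molecular dynamics.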

Compilation Notes:

Makefile was made with cmake (2.8) and using gfortran.

Research Area:
Biology, Chemistry
GSL
Version:
  • 2.4
Description:

The GNU Scientific Library is a numerical library for C and C++ programmers. It is free software under the GNU General Public License.

The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are more than 1,000 functions in total with an extensive test suite.

Research Area:
Biology
License type:
GNU General public license (open source)
Version:
  • 4.5
Description:
  • Automation - gulp is a toolkit that helps you automate painful or time-consuming tasks in your development workflow.
  • Platform-agnostic - Integrations are built into all major IDEs and people are using gulp with PHP, .NET, Node.js, Java, and other platforms.
  • Strong Ecosystem - Use npm modules to do anything you want + over 2000 curated plugins for streaming file transformations
  • Simple - By providing only a minimal API surface, gulp is easy to learn and simple to use
Research Area:
Biology
Version:
  • 2.8.0
Version:
  • 1.8.18-gnu,
  • 1.8.18-intel
Description:

HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.

Research Area:
Linux Library
Version:
  • 2.0.5 (default),
  • 0.1.6
Description:

HISAT is a fast and sensitive spliced alignment program for mapping RNA-seq reads. In addition to one global FM index that represents a whole genome, HISAT uses a large set of small FM indexes that collectively cover the whole genome (each index represents a genomic region of ~64,000 bp and ~48,000 indexes are needed to cover the human genome). These small indexes (called local indexes) combined with several alignment strategies enable effective alignment of RNA-seq reads, in particular, reads spanning multiple exons. The memory footprint of HISAT is relatively low (~4.3GB for the human genome). We have developed HISAT based on the Bowtie2 implementation to handle most of the operations on the FM index. 
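
The two figures quoted for the local indexes can be cross-checked with a line of arithmetic. The snippet below is only a sanity check on the numbers in the description. It ignores any overlap between neighbouring regions, and the variable names are made up for this example.

```python
region_bp = 64_000        # approximate size of one genomic region, from the text
n_local_indexes = 48_000  # approximate number of local indexes, from the text

covered_bp = region_bp * n_local_indexes
print(f"{covered_bp:,} bp")  # 3,072,000,000 bp
```

Roughly 3.07 billion base pairs is in line with the commonly cited size of the human genome, so the two approximate figures are consistent with each other.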

Research Area:
Biology
Version:
  • 1.6 (default),
  • 1.4.1,
  • 1.4
Description:

HTSlib is an implementation of a unified C library for accessing common file formats, such as SAM, CRAM and VCF, used for high-throughput sequencing data, and is the core library used by samtools and bcftools. HTSlib only depends on zlib. It is known to be compatible with gcc, g++ and clang.

HTSlib implements a generalized BAM index, with file extension .csi (coordinate-sorted index). The HTSlib file reader first looks for the new index and then for the old if the new index is absent.

Research Area:
Biology
ICU
Version:
  • 4c-59
Description:

ICU is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.

ICU is released under a nonrestrictive open source license that is suitable for use with both commercial software and with other open source or free software.

Research Area:
Biology
Version:
  • intel-tbb-oss/ia32/2017_20161128oss
  • intel-ode/1.0.0
  • PS2017-t2
  • PS2017-17.0.4-legacy
  • PS2017-17.0.4-compute
  • PS2017-17.0.4
  • PS2017
  • mkl/PS2017
  • compilers/17.0
Description:

Intel Fortran Compiler, also known as IFORT, is a group of Fortran compilers from Intel.

Research Area:
Linux Utility, Linux Library
License type:
Other type of license
Additional License Details:

UNT License

Version:
  • 4.2.0
Description:

JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation, not wholly unlike BUGS. JAGS was written with three aims in mind:

  • To have a cross-platform engine for the BUGS language
  • To be extensible, allowing users to write their own functions, distributions and samplers.
  • To be a platform for experimentation with ideas in Bayesian modelling

JAGS is licensed under the GNU General Public License.

Compiled by:
Scott Yockel
Compilation Notes:

Also available within the R package rjags by loading the library as follows:

> library(rjags)

Then you should see:
Loading required package: coda
Loading required package: lattice
Linked to JAGS 3.4.0
Loaded modules: basemod,bugs

Load the Module:
module load jags
Research Area:
Education, Statistics
Version:
  • 1.8
Description:

Java is at the heart of our digital lifestyle. It's the platform for launching careers, exploring human-to-digital interfaces, architecting the world's best applications, and unlocking innovation everywhere—from garages to global organizations.

Research Area:
Computer Science
Version:
  • 0.43.1
Description:

Kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment. On benchmarks with standard RNA-Seq data, kallisto can quantify 30 million human reads in less than 3 minutes on a Mac desktop computer using only the read sequences and a transcriptome index that itself takes less than 10 minutes to build. Pseudoalignment of reads preserves the key information needed for quantification, and kallisto is therefore not only fast, but also as accurate as existing quantification tools. In fact, because the pseudoalignment procedure is robust to errors in the reads, in many benchmarks kallisto significantly outperforms existing tools.
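
The idea of pseudoalignment, finding which targets a read is compatible with without aligning it, can be sketched with k-mer sets. This is a simplified illustration and not kallisto's implementation: the function names, the two short transcripts and the choice of k = 4 are hypothetical, and kallisto works on a transcriptome de Bruijn graph with much longer k-mers.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it."""
    index = {}
    for name, seq in transcripts.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)
    return index

def pseudoalign(read, index, k):
    """Intersect the transcript sets of every indexed k-mer in the read."""
    compatible = None
    for km in kmers(read, k):
        hits = index.get(km)
        if hits is None:
            continue  # k-mer not in the index (for example a read error): skip it
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()

# Hypothetical transcripts sharing a common prefix.
transcripts = {"tx1": "ACGTACGGA", "tx2": "ACGTACCTT"}
index = build_index(transcripts, k=4)
print(sorted(pseudoalign("GTACGG", index, k=4)))  # ['tx1']
print(sorted(pseudoalign("ACGTA", index, k=4)))   # ['tx1', 'tx2']
```

A read from the shared prefix stays compatible with both transcripts, while a read crossing into the tx1-specific region narrows the set to one. No base-by-base alignment is computed at any point.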

Research Area:
Biology
Version:
  • 2.0.3
Description:

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.

Use Keras if you need a deep learning library that:

  • Allows for easy and fast prototyping (through user friendliness, modularity, and extensibility).
  • Supports both convolutional networks and recurrent networks, as well as combinations of the two.
  • Runs seamlessly on CPU and GPU.
Research Area:
Computer Science
Version:
  • 0.10
Description:

Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts by other bioinformatics software to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.

In its fastest mode of operation, for a simulated metagenome of 100 bp reads, Kraken processed over 4 million reads per minute on a single core, over 900 times faster than Megablast and over 11 times faster than the abundance estimation program MetaPhlAn. Kraken's accuracy is comparable with Megablast, with slightly lower sensitivity and very high precision.

Kraken is written in C++ and Perl, and is designed for use with the Linux operating system. We have also successfully compiled and run it under the Mac OS.

Research Area:
Biology
Version:
  • August 11 2017,
  • November 17 2016
Description:

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator.

Compiled by:
Shivraj Karewar
Research Area:
Chemistry, Material Science
Citation Information:

Citation Information can be found here: http://lammps.sandia.gov/cite.html

Version:
  • 3.7.0
Description:

LAPACK is written in Fortran 90 and provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. The associated matrix factorizations (LU, Cholesky, QR, SVD, Schur, generalized Schur) are also provided, as are related computations such as reordering of the Schur factorizations and estimating condition numbers. Dense and banded matrices are handled, but not general sparse matrices. In all areas, similar functionality is provided for real and complex matrices, in both single and double precision.

Research Area:
Biology
Version:
  • 2.4.6
Description:

GNU libtool is a generic library support script. Libtool hides the complexity of using shared libraries behind a consistent, portable interface.

Research Area:
Biology
Version:
  • 10.0
Description:

Mathematica is a computational software program used in many scientific, engineering, mathematical and computing fields. It was conceived by Stephen Wolfram and is developed by Wolfram Research of Champaign, Illinois.

Load the Module:
module load mathematica/10.0
Automated Submission:
/cm/shared/talon3/run -p mathematica -q general -c 1 -n 1 -i test.m
Manual Submission:

#!/bin/bash

#SBATCH -J mathematica.job

#SBATCH -c 1 ##Run in serial

#SBATCH -q general

#SBATCH -C r420


hostname
date
prgname=math

## Setting up scratch ##

STORAGE_DIR="/storage/scratch2/$USER/${SLURM_JOB_ID}"
export  STORAGE_DIR

mkdir -pv $STORAGE_DIR

cd $STORAGE_DIR

###Copy input file (test.m) to scratch directory

cp $SLURM_SUBMIT_DIR/test.m $STORAGE_DIR
pwd

echo "The command $prgname is located at: "
which $prgname

### Run program where test.m is the input file and test.out is the created output file
time $prgname -noprompt < test.m > test.out
env
##  These files will be copied back to your directory  ##
cp -a $STORAGE_DIR $SLURM_SUBMIT_DIR

echo -n ">> Job finished @ "
date
echo ">> Output can be found in: $SLURM_SUBMIT_DIR"
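
The staging pattern in the script above (create a per-job scratch directory, copy the input in, run there, copy the results back) can be factored into two small functions. This is a sketch only: `stage_in` and `stage_out` are hypothetical names, and the demo uses temporary directories in place of /storage/scratch2 and $SLURM_SUBMIT_DIR so that it can be tried outside a job.

```shell
#!/bin/bash
# Hypothetical helpers mirroring the scratch handling in the job script.

# stage_in SCRATCH_DIR FILE...: create the scratch directory and copy inputs.
stage_in() {
    local scratch="$1"; shift
    mkdir -p "$scratch" && cp "$@" "$scratch"/
}

# stage_out SCRATCH_DIR DEST_DIR: copy the whole scratch directory back,
# preserving attributes (the same 'cp -a' used in the job script).
stage_out() {
    cp -a "$1" "$2"
}

# Demo with temporary directories standing in for the real paths.
submit_dir=$(mktemp -d)
scratch_root=$(mktemp -d)
job_id=12345                      # placeholder for ${SLURM_JOB_ID}
echo 'Print[1 + 1]' > "$submit_dir/test.m"

stage_in "$scratch_root/$job_id" "$submit_dir/test.m"
# Stand-in for the real program run inside the scratch directory.
( cd "$scratch_root/$job_id" && echo "output placeholder" > test.out )
stage_out "$scratch_root/$job_id" "$submit_dir"

ls "$submit_dir/$job_id"
```

As in the original script, the results end up in a subdirectory of the submit directory named after the job ID.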
 

Research Area:
Mathematics, Statistics
Version:
  • R2014b,
  • R2016b,
  • R2018b
Description:

MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications. The language, tools, and built-in math functions enable you to explore multiple approaches and reach a solution faster than with spreadsheets or traditional programming languages, such as C/C++ or Java™.

Compiled by:
Charles Peterson
Compilation Notes:

The Linux distribution of MATLAB is installed. It includes the Distributed Computing Server, which enables parallel execution of code.

Load the Module:
module add matlab/R2016a
Research Area:
Mathematics, Statistics
Version:
  • September 15 2014
Description:

Genotype imputation is a key step in the analysis of genome-wide association studies. Upcoming very large reference panels, such as those from The 1000 Genomes Project and the Haplotype Consortium, will improve imputation quality of rare and less common variants, but will also increase the computational burden. Here, we demonstrate how the application of software engineering techniques can help to keep imputation broadly accessible. Overall, these improvements speed up imputation by an order of magnitude compared with our previous implementation.

Compiled by:
Garrett Crowe
Research Area:
Biology
Version:
  • 1.6.0
Description:

mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST. By efficiently utilizing distributed computational resources through database fragmentation, query segmentation, intelligent scheduling, and parallel I/O, mpiBLAST improves NCBI BLAST performance by several orders of magnitude while scaling to hundreds of processors. mpiBLAST is also portable across many different platforms and operating systems. Lastly, a renewed focus and consolidation of the many codebases has positioned mpiBLAST to continue to be of high utility to the bioinformatics community. 

Research Area:
Biology
License type:
GNU General public license (open source)
Version:
  • 2.12-cuda
  • 2.12
Description:

NAMD, recipient of a 2002 Gordon Bell Award and a 2012 Sidney Fernbach Award, is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR. NAMD is distributed free of charge with source code. You can build NAMD yourself or download binaries for a wide variety of platforms. Our tutorials show you how to use NAMD and VMD for biomolecular modeling.

Research Area:
Biology
License type:
GNU General public license (open source)
Version:
  • 6.0
Description:

The ncurses (new curses) library is a free software emulation of curses in System V Release 4.0 (SVr4), and more. It uses terminfo format, supports pads and color and multiple highlights and forms characters and function-key mapping, and has all the other SVr4-curses enhancements over BSD curses. SVr4 curses became the basis of X/Open Curses.

Research Area:
Linux Utility
License type:
GNU General public license (open source)
Version:
  • 6.6-gcc
  • 6.6-3
  • 6.6-2
  • 6.6
Description:

NWChem is actively developed by a consortium of developers and maintained by the EMSL located at the Pacific Northwest National Laboratory (PNNL) in Washington State. The code is distributed as open-source under the terms of the Educational Community License version 2.0

NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters.

NWChem software can handle

  • Biomolecules, nanostructures, and solid-state
  • From quantum to classical, and all combinations
  • Ground and excited-states
  • Gaussian basis functions or plane-waves
  • Scaling from one to thousands of processors
  • Properties and relativistic effects
Compiled by:
Charles Peterson, John Pearson
Research Area:
Chemistry
Citation Information:

M. Valiev, E.J. Bylaska, N. Govind, K. Kowalski, T.P. Straatsma, H.J.J. van Dam, D. Wang, J. Nieplocha, E. Apra, T.L. Windus, W.A. de Jong, "NWChem: a comprehensive and scalable open-source solution for large scale molecular simulations" Comput. Phys. Commun. 181, 1477 (2010)

Version:
  • 0.2.8
Description:

Oases is a de novo transcriptome assembler designed to produce transcripts from short read sequencing technologies, such as Illumina, SOLiD, or 454 in the absence of any genomic assembly. It was developed by Marcel Schulz (MPI for Molecular Genomics) and Daniel Zerbino (previously at the European Bioinformatics Institute (EMBL-EBI), now at UC Santa Cruz).

Oases uploads a preliminary assembly produced by Velvet, and clusters the contigs into small groups, called loci. It then exploits the paired-end read and long read information, when available, to construct transcript isoforms.

Compiled by:
Garrett Crowe
Compilation Notes:

Requires the velvet module.

Research Area:
Biology
License type:
GNU General public license (open source)
Citation Information:

M.H. Schulz, D.R. Zerbino, M. Vingron and Ewan Birney. Oases: Robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics, 2012. DOI: 10.1093/bioinformatics/bts094.

Version:
  • 0.2.19
Description:

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

Research Area:
Biology
Version:
  • 3.3
Description:

OpenCV (Open Source Computer Vision Library) is released under a BSD license and hence it’s free for both academic and commercial use. It has C++, C, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. OpenCV was designed for computational efficiency and with a strong focus on real-time applications. Written in optimized C/C++, the library can take advantage of multi-core processing. Enabled with OpenCL, it can take advantage of the hardware acceleration of the underlying heterogeneous compute platform.

Adopted all around the world, OpenCV has a user community of more than 47 thousand people and an estimated number of downloads exceeding 14 million. Usage ranges from interactive art to mine inspection, stitching maps on the web, and advanced robotics.

Research Area:
Computer Science
Version:
  • 5.0
Description:

OpenFOAM is a C++ toolbox for developing numerical solvers for continuum mechanics problems.

Version:
  • gcc/2.1.0
  • intel/2.1.0
  • intel/2.1.0-cuda
  • gcc/1.10.6
  • pgi/1.10.2
Description:

The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers.

Compiled by:
Kayyuru Geyani
Research Area:
Linux Library
License type:
GNU General public license (open source)
Version:
  • 2.9.0
Description:

OVITO is a scientific visualization and analysis software for atomistic and particle simulation data. It helps scientists gain better insights into materials phenomena and physical processes.

OVITO is being developed by Alexander Stukowski at Darmstadt University of Technology, Germany. The program is Open Source and freely available for all major platforms. It has served in a growing number of computational simulation studies as a useful tool to analyze, understand, and illustrate simulation results.

Research Area:
Chemistry, Computer Science
Version:
  • 5.3.0
Description:

ParaView is an open-source, multi-platform data analysis and visualization application. ParaView users can quickly build visualizations to analyze their data using qualitative and quantitative techniques. The data exploration can be done interactively in 3D or programmatically using ParaView’s batch processing capabilities.

ParaView was developed to analyze extremely large datasets using distributed memory computing resources. It can be run on supercomputers to analyze datasets of petascale size as well as on laptops for smaller data. It has become an integral tool in many national laboratories, universities and industry, and has won several awards related to high performance computation.

Research Area:
Linux Utility
License type:
GNU General public license (open source)
Version:
  • 5.24.1
Description:

Perl: Practical Extraction and Report Language.

Compiled by:
Charlie Peterson
Research Area:
Linux Utility
License type:
GNU General public license (open source)
Version:
  • 2.10
Description:

Picard is a set of command line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. These file formats are defined in the Hts-specs repository. See especially the SAM specification and the VCF specification.

Research Area:
Computer Science
License type:
GNU General public license (open source)
Version:
  • 1.07
Description:

PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype or CNV calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results.

PLINK (one syllable) is being developed by Shaun Purcell whilst at the Center for Human Genetic Research (CHGR), Massachusetts General Hospital (MGH), and the Broad Institute of Harvard & MIT, with the support of others.  

Research Area:
Biology
License type:
GNU General public license (open source)
Citation Information:

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ & Sham PC (2007) PLINK: a toolset for whole-genome association and population-based linkage analysis. American Journal of Human Genetics, 81.

Version:
  • 1.0.0
Description:

Psi4 is an open-source suite of ab initio quantum chemistry programs designed for efficient, high-accuracy simulations of a variety of molecular properties. We can routinely perform computations with more than 2,500 basis functions running serially or in parallel.

Research Area:
Chemistry
Version:
  • 2.3.1
Description:

pssh is a program for executing ssh in parallel on a number of hosts. It provides features such as sending input to all of the processes, passing a password to ssh, saving output to files, and timing out.

The PSSH_NODENUM and PSSH_HOST environment variables are sent to the remote host. The PSSH_NODENUM variable is assigned a unique number for each ssh connection, starting with 0 and counting up. The PSSH_HOST variable is assigned the name of the host as specified in the hosts list.
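
One possible use of PSSH_NODENUM is to let each host pick its own share of a work list. The sketch below assigns items round-robin. It assumes the total number of hosts is supplied separately: the variable `NUM_HOSTS` and the function `items_for_this_node` are hypothetical and not part of pssh.

```shell
#!/bin/bash
# Hypothetical helper: print the items this host should handle, given
# PSSH_NODENUM (set by pssh, starting at 0) and NUM_HOSTS (an assumption:
# the host count is not provided here, so the caller must supply it).
items_for_this_node() {
    local i=0
    for item in "$@"; do
        # Item i goes to the host whose number equals i modulo NUM_HOSTS.
        if [ $(( i % NUM_HOSTS )) -eq "$PSSH_NODENUM" ]; then
            echo "$item"
        fi
        i=$(( i + 1 ))
    done
}

# Simulate the second of three hosts (pssh would set PSSH_NODENUM itself).
PSSH_NODENUM=1
NUM_HOSTS=3
items_for_this_node a.dat b.dat c.dat d.dat e.dat
```

With these values the function prints b.dat and e.dat, the second and fifth items in the list.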

Research Area:
Linux Utility
License type:
GNU General public license (open source)
Version:
  • 2.4
Description:

pyMPI is a project integrating the Message Passing Interface (MPI) into the Python interpreter. It is being developed primarily by researchers at Lawrence Livermore National Laboratory.

Research Area:
Biology
Version:
  • 3.6.0-2
  • 3.6.0 (default)
  • 2.7.14-pyCUDA
  • 2.7.13
Description:

Python is an interpreted, interactive, object-oriented programming language that combines remarkable power with very clear syntax. For an introduction to programming in Python you are referred to the Python Tutorial. The Python Library Reference documents built-in and standard types, constants, functions and modules.

Research Area:
Linux Utility
License type:
GNU General public license (open source)
R
Version:
  • 3.4.1
  • 3.3.2
  • R-devel
Description:

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

FYI: For more information about R, visit the Data and Statistical Analysis Index of Articles in Research Matters.

Compiled by:
Scott Yockel
Research Area:
Education
Version:
  • 8.2.10
Research Area:
Biology
Version:
  • 7.0
Description:

The GNU Readline library provides a set of functions for use by applications that allow users to edit command lines as they are typed in. Both Emacs and vi editing modes are available. The Readline library includes additional functions to maintain a list of previously-entered command lines, to recall and perhaps reedit those lines, and perform csh-like history expansion on previous commands.

The history facilities are also placed into a separate library, the History library, as part of the build process. The History library may be used without Readline in applications which desire its capabilities.

Readline is free software, distributed under the terms of the GNU General Public License, version 3. This means that if you want to use Readline in a program that you release or distribute to anyone, the program must be free software and have a GPL-compatible license.

If you would like advice on making your license GPL-compatible, contact licensing@gnu.org.

Research Area:
Linux Utility
License type:
GNU General public license (open source)
Version:
  • 7.6
Description:

SageMath is a free open-source mathematics software system licensed under the GPL. It builds on top of many existing open-source packages: NumPy, SciPy, matplotlib, SymPy, Maxima, GAP, FLINT, R and many more. Access their combined power through a common, Python-based language or directly via interfaces or wrappers.

Mission: Creating a viable free open source alternative to Magma, Maple, Mathematica and Matlab.

Load the Module:
module load sagemath/7.6
Research Area:
Mathematics
License type:
GNU General public license (open source)
Version:
  • 1.6 (default)
  • 1.4.1
  • 1.4
Description:

SAM (Sequence Alignment/Map) is a generic format for storing large nucleotide sequence alignments.

Compilation Notes:

Built with the standard './configure', 'make', and 'make install' commands.

Load the Module:
module add samtools
Research Area:
Biology
License type:
GNU General public license (open source)
Version:
  • 1.0
Description:

SeqPrep is a program to merge paired end Illumina reads that are overlapping into a single longer read. It may also just be used for its adapter trimming feature without doing any paired end overlap. When an adapter sequence is present, that means that the two reads must overlap (in most cases) so they are forcefully merged. When reads do not have adapter sequence they must be treated with care when doing the merging, so a much more specific approach is taken. The default parameters were chosen with specificity in mind, so that they could be run on libraries where very few reads are expected to overlap. It is always safest though to save the overlapping procedure for libraries where you have some prior knowledge that a significant portion of the reads will have some overlap.

Research Area:
Biology
License type:
GNU General public license (open source)
Version:
  • 0.18.1
Description:
  • Simple and efficient tools for data mining and data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable - BSD license
Research Area:
Linux Utility
License type:
GNU General public license (open source)
Version:
  • 1.03
Description:

SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads. It creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.

Compilation Notes:

This code is NOT compiled with MPI, and should only be used in parallel on a SINGLE node, via a threaded model.

Research Area:
Biology
Version:
  • 3.10.1
Description:

SPADES is an assembly toolkit containing various assembly pipelines.

SPADES works with Illumina or IonTorrent reads, and is capable of providing hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. You can also provide additional contigs that will be used as long reads.

SPADES supports paired-end reads, mate-pairs and unpaired reads. It can take as input several paired-end and mate-pair libraries simultaneously. SPADES was initially designed for small genomes. It was tested on bacterial (both single-cell MDA and standard isolates), fungal and other small genomes. SPADES is not intended for larger genomes, such as mammalian size genomes. For such purposes, you use SPADES at your own risk.

Research Area:
Linux Utility
License type:
GNU General public license (open source)
Version:
  • 2.3.0
Description:

SPARK is a programming language for the development of high integrity software. 

Version:
  • 1.46
Description:

Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.

Research Area:
Biology
Version:
  • 2.5.3a
Description:

The STAR program at Massachusetts Institute of Technology seeks to bridge the divide between scientific research and the classroom. Understanding and applying research methods in the classroom setting can be challenging due to time constraints and the need for advanced equipment and facilities. The multidisciplinary STAR team collaborates with faculty from MIT and other educational institutions to design software exploring core scientific research concepts. The goal of STAR is to develop innovative and intuitive teaching tools for classroom use.

Research Area:
Computer Science
Version:
  • 1.3.3b
Description:

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. In order to identify differentially expressed genes between experiments, StringTie's output can be processed by specialized software like Ballgown, Cuffdiff or other programs (DESeq2, edgeR, etc.).

Research Area:
Biology
License type:
GNU General public license (open source)
Version:
  • 1.12b
Description:

Szip compression software, providing lossless compression of scientific data, has been provided with HDF software products as of HDF5 Release 1.6.0 and HDF4 Release 2.0.

Szip is an implementation of the extended-Rice lossless compression algorithm. The Consultative Committee on Space Data Systems (CCSDS) has adopted the extended-Rice algorithm for international standards for space applications[1,6,7]. Szip is reported to provide fast and effective compression, specifically for the EOS data generated by the NASA Earth Observatory System (EOS)[1]. It was originally developed at University of New Mexico (UNM) and integrated with HDF4 by UNM researchers and developers.

Research Area:
Biology
Version:
  • 1.2.1-gpu
  • 1.2.1
Description:

TensorFlow™ is an open-source software library for numerical computation using data-flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.

Research Area:
Computer Science
License type:
GNU General public license (open source)
Version:
  • 2.1.1
Description:

TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. 

Compiled by:
Ali Siavosh-Haghighi
Compilation Notes:

Prerequisites: bowtie and samtools.
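
Because TopHat depends on bowtie and samtools, a job script can fail early if either is missing from the PATH. The following is a minimal sketch, assuming the prerequisite modules expose executables named `bowtie` and `samtools`. The function name `missing_commands` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print each named command that is NOT on the PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Executable names are assumptions; adjust them to what the modules provide.
missing=$(missing_commands bowtie samtools)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing >&2
else
    echo "All TopHat prerequisites found."
fi
```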

Research Area:
Biology
Version:
  • 2.4.0 Released 02/05/2017
Description:

Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads.

Compiled by:
Garrett Crowe
Compilation Notes:

Trinity will run in SMP threaded model, i.e., parallel on a single node.
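
Since Trinity runs threaded on a single node, the thread count passed to Trinity should match the cores requested from the scheduler. The sketch below builds (but does not run) a command line from SLURM_CPUS_PER_TASK, falling back to 1 outside a job. The function name `trinity_command`, the read file names and the memory value are placeholders, and the option names should be checked against `Trinity --help` for the installed version.

```shell
#!/bin/bash
# Hypothetical helper: print a Trinity command whose --CPU value follows
# the cores allocated by SLURM (1 if SLURM_CPUS_PER_TASK is unset).
# Usage: trinity_command LEFT_READS RIGHT_READS MAX_MEMORY
trinity_command() {
    local left="$1" right="$2" mem="$3"
    local threads="${SLURM_CPUS_PER_TASK:-1}"
    echo "Trinity --seqType fq --left $left --right $right --max_memory $mem --CPU $threads"
}

# Placeholder inputs; in a real job SLURM sets SLURM_CPUS_PER_TASK.
SLURM_CPUS_PER_TASK=8
trinity_command reads_1.fq reads_2.fq 20G
```

Requesting a single node in the batch header, for example with `#SBATCH -N 1` together with `#SBATCH -c 8`, would keep all threads on one machine.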

Load the Module:
module add trinity
Research Area:
Biology
License type:
GNU General public license (open source)
Version:
  • 5.4.4,
  • 5.4.1,
  • 5.4.1-vtst,
  • 4.6-vtst
Description:

The Vienna Ab-initio Simulation Package, better known as VASP, is a package for performing ab initio quantum mechanical molecular dynamics using either Vanderbilt pseudopotentials, or the projector augmented wave method, and a plane wave basis set.

Compiled by:
Shivraj Karewar
Research Area:
Chemistry
License type:
Individual license
Additional License Details:

The following groups have a license: jd0198, trc0020.

Version:
  • 1.2.10
Description:

Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.

Velvet currently takes in short read sequences, removes errors then produces high quality unique contigs. It then uses paired-end read and long read information, when available, to retrieve the repeated areas between contigs.

Research Area:
Biology
License type:
GNU General public license (open source)
Version:
  • 2.12.2
Description:

VisIt is an Open Source, interactive, scalable, visualization, animation and analysis tool. From Unix, Windows or Mac workstations, users can interactively visualize and analyze data ranging in scale from small (<10^1 core) desktop-sized projects to large (>10^5 core) leadership-class computing facility simulation campaigns. Users can quickly generate visualizations, animate them through time, manipulate them with a variety of operators and mathematical expressions, and save the resulting images and animations for presentations. VisIt contains a rich set of visualization features to enable users to view a wide variety of data including scalar and vector fields defined on two- and three-dimensional (2D and 3D) structured, adaptive and unstructured meshes. Owing to its customizable plugin design, VisIt is capable of visualizing data from over 120 different scientific data formats.

Research Area:
Computer Science
License type:
GNU General public license (open source)
VMD
Version:
  • 1.9.3
Description:

VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. VMD supports computers running MacOS X, Unix, or Windows, is distributed free of charge, and includes source code.

Research Area:
Biology
Version:
  • 4.5
Description:

Warp is an extensively developed open-source particle-in-cell code designed to simulate charged particle beams with high space-charge intensity. The name "Warp" stems from the code's ability to simulate Warped (bent) Cartesian meshes. This bent-mesh capability allows the code to efficiently simulate space-charge effects in bent accelerator lattices (resolution can be placed where needed) associated with rings and beam transfer lines with dipole bends. The code is set up around the interactive python interpreter with dynamically loaded compiled code modules.

Warp has a hierarchy of multi-species models ranging from full 3D, transverse slice x-y (including pz), and axisymmetric r-z (including ptheta), as well as simple envelope models useful for problem setup. Warp can operate in a boosted-frame mode. A broad variety of particle movers and field solvers are available. Particle movers include leap-frog models as well as gyro-kinetic models. Electrostatic and electromagnetic field solvers are included. Electrostatic field solvers include FFT, multi-grid, and multigrid with mesh refinement (both static and dynamic) options. The field solvers work with bent (x-plane) meshes. A variety of conducting structures can be loaded on the grid with subgrid resolution and various boundary conditions can be employed. Particles can be scraped consistently with conducting structures and secondary particles (for e-cloud modeling) emitted. Electromagnetic field solvers are available in 3D, transverse x-y, and r-z packages.

Warp is also a plasma code and has been used for electron cloud and plasma modeling. Limited scattering and inelastic collision models are available, as well as models for particle interactions with surfaces.

Automated Submission:
#!/bin/bash
#
######################################################################
# warpbatch.sh - given a directory of .py warp files, generates batch
# submission specific to /Talon2/ Univa Grid Engine.
######################################################################
#
# local vars
email="insert user unt email address between the quote marks"
jobnum=0
jobs=`ls *.py`

# create batch file for each .py warp file
for f in $jobs
do
  # create batch script
  echo -e "#!/bin/bash\n#\n#$ -cwd\n#$ -P acad\n#$ -q serial.q\n#$ -V\n#$ -m e\n#$ -M $email\n">warp_job.$jobnum.qsub

  # update batch script with job file staging
  echo "STORAGE_DIR=\"/storage/scratch2/\$USER/\$JOB_ID\"" >> warp_job.$jobnum.qsub
  echo "mkdir -pv \$STORAGE_DIR" >> warp_job.$jobnum.qsub
  echo "cp \$SGE_CWD_PATH/$f \$STORAGE_DIR" >> warp_job.$jobnum.qsub

  # update batch script with warp run info
  echo "time python $f > warp_job.$jobnum.out" >> warp_job.$jobnum.qsub
  echo "cp warp_job.$jobnum.out \$SGE_CWD_PATH" >> warp_job.$jobnum.qsub

  # submit it!
  submit=`qsub warp_job.$jobnum.qsub`

  # increment job number
  jobnum=`expr $jobnum + 1`
done
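
The script above targets the Univa Grid Engine on Talon2. For a SLURM system, a comparable generator might look like the sketch below. It is an untested adaptation, not a site-provided script: the `-c` and `-q` options are copied from the Mathematica example on this page, the function name `make_warp_job` and the demo file names are hypothetical, and no job is submitted, so the generated files can be inspected first.

```shell
#!/bin/bash
# Hypothetical helper: write a SLURM batch file for one Warp .py input
# into the current directory and print the name of the file created.
# Usage: make_warp_job INPUT.py JOBNUM  -> creates warp_job.JOBNUM.sbatch
make_warp_job() {
    local input="$1" jobnum="$2"
    local script="warp_job.$jobnum.sbatch"
    # Unquoted heredoc: $input and $jobnum expand now, while the escaped
    # \$ variables are left for the job itself to expand at run time.
    cat > "$script" <<EOF
#!/bin/bash
#SBATCH -J warp_job.$jobnum
#SBATCH -c 1
#SBATCH -q general

STORAGE_DIR="/storage/scratch2/\$USER/\$SLURM_JOB_ID"
mkdir -pv "\$STORAGE_DIR"
cp "\$SLURM_SUBMIT_DIR/$input" "\$STORAGE_DIR"
cd "\$STORAGE_DIR"
time python $input > warp_job.$jobnum.out
cp warp_job.$jobnum.out "\$SLURM_SUBMIT_DIR"
EOF
    echo "$script"
}

# Demo in a temporary directory with two empty placeholder inputs.
demo_dir=$(mktemp -d)
touch "$demo_dir/beam_a.py" "$demo_dir/beam_b.py"
(
    cd "$demo_dir" || exit 1
    jobnum=0
    for f in *.py; do
        make_warp_job "$f" "$jobnum"
        jobnum=$(( jobnum + 1 ))
    done
)
```

Each generated file could then be submitted with `sbatch warp_job.N.sbatch` once its contents have been checked against the local queue configuration.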
Research Area:
Physics
Version:
  • 1.2.11
Description:

A massively spiffy yet delicately unobtrusive compression library.

Research Area:
Linux Utility
License type:
GNU General public license (open source)