SRA-Toolkit on Biowulf


The NCBI SRA SDK provides loading and dumping tools, along with their supporting libraries, for building new SRA runs and accessing existing ones.

The required environment variables must be set before the tools are used. The easiest way to set them is with the module commands, as in the example below.

$ module avail sratoolkit
------------------------ /usr/local/Modules/3.2.9/modulefiles----------------------------
sratoolkit/2.0rc5         sratoolkit/2.1.10         sratoolkit/2.1.15         sratoolkit/2.1.16         sratoolkit/2.1.7          sratoolkit/2.2.2b(default)


$ module load sratoolkit

$ module list
Currently Loaded Modulefiles:
  1) sratoolkit/2.2.2b

$ module unload sratoolkit

$ module load sratoolkit/2.1.10

$ module show sratoolkit
-------------------------------------------------------------------
/usr/local/Modules/3.2.9/modulefiles/sratoolkit/2.2.2b:

module-whatis    Sets up sra toolkit 2.2.2b and decription 2.2.4
prepend-path     PATH /usr/local/apps/sratoolkit/2.2.2b/bin
prepend-path     PATH /usr/local/apps/sratoolkit/decryption.2.2.4/bin
-----------------------------------------------------------------
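Before submitting a long job, it can help to confirm that 'module load' actually put the tools on your PATH. The sketch below assumes a bash shell. 'check_tools' is a hypothetical helper written for this page, not part of the SRA Toolkit or the modules system; it uses the shell builtin 'command -v' and reports each command it cannot find.

```shell
#!/bin/bash
# check_tools: hypothetical helper, not part of the SRA Toolkit.
# Prints "missing: NAME" to stderr for each requested command that is
# not on PATH, and returns non-zero if any of them are missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use after 'module load sratoolkit':
#   check_tools fastq-dump sam-dump || exit 1
```

Running the check right after 'module load' means a wrong module name (such as a misspelled version) shows up immediately rather than partway through a job.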

Submitting a single batch job

1. Create a script file containing lines similar to those below. Before running it, change the directory path to the location of your input files.

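#!/bin/bash
# This file is YourOwnFileName
#
# PBS directives: -N sets the job name, -m be sends mail when the job
# begins and ends, -k oe keeps the job's standard output and error files.
#PBS -N yourownfilename
#PBS -m be
#PBS -k oe

# Put the SRA Toolkit programs on the PATH
module load sratoolkit

# Change to the directory that holds the input file
cd /data/$USER/somewhereWithInputFile
# Dump the run as FASTQ
fastq-dump some.csra
# Dump the run as SAM, redirecting standard output to a file
sam-dump some.csra > my_sam.sam
....
....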

2. Submit the script using the 'qsub' command on Biowulf:

$ qsub -l nodes=1 /data/$USER/theScriptFileAbove
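In the job script from step 1, a failure in one dump command does not stop the next one, and nothing records which step failed. One possible approach, sketched below under the assumption that the script runs under bash, is a small wrapper ('run_step', a hypothetical name, not part of PBS or the SRA Toolkit) that logs each command with a timestamp to standard error, which ends up in the job's error file, and passes the command's exit status back to the script.

```shell
#!/bin/bash
# run_step: hypothetical helper for batch scripts.
# Logs the command to stderr with a timestamp, runs it, logs a failure
# message if it exits non-zero, and returns the command's exit status.
run_step() {
    echo "$(date '+%F %T') running: $*" >&2
    "$@"
    local status=$?
    if [ "$status" -ne 0 ]; then
        echo "$(date '+%F %T') FAILED ($status): $*" >&2
    fi
    return "$status"
}

# Example inside the job script:
#   run_step fastq-dump some.csra || exit 1
#   run_step sam-dump some.csra > my_sam.sam || exit 1
```

Because the log lines go to standard error, redirecting standard output (as with sam-dump above) still captures only the tool's own output.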

Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/$USER/cmdfile). Here is a sample file:

 
module load sratoolkit; fastq-dump --aligned --table PRIMARY_ALIGNMENT -O /data/$USER/mydir /data/$USER/somewhereWithInputFile/some.csra
module load sratoolkit; fastq-dump --aligned --table SEQUENCE -O /data/$USER/mydir2 /data/$USER/somewhereWithInputFile/some.csra
[....]

This swarm command file can be submitted with:

$ swarm -f cmdfile

Submitted this way, swarm uses the default of 1 GB of memory per process, where a process is a single line of the command file.
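When there are many runs to convert, the command file can be generated rather than typed. The sketch below is one way to do this in bash; 'make_swarm_file' is a hypothetical helper, and the one-line-per-file layout simply mirrors the sample command file above. It uses bash's printf %q so that paths containing spaces or other special characters are quoted, since each line of the file is run as a shell command.

```shell
#!/bin/bash
# make_swarm_file: hypothetical helper, not part of swarm or the SRA Toolkit.
# Prints one swarm command line per input file, so each run is dumped
# by its own swarm process.
#   $1   output directory passed to fastq-dump -O
#   $2+  input run files
make_swarm_file() {
    local outdir="$1"
    shift
    local f
    for f in "$@"; do
        # %q quotes each path for the shell that runs the line
        printf 'module load sratoolkit; fastq-dump -O %q %q\n' "$outdir" "$f"
    done
}

# Example:
#   make_swarm_file /data/$USER/mydir /data/$USER/runs/*.sra > /data/$USER/cmdfile
#   swarm -f /data/$USER/cmdfile
```

Each generated line becomes one swarm process, so with the default settings each conversion gets 1 GB of memory.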

For more information on running swarm, see the swarm documentation (swarm.html).

Running an interactive job

Users sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node; instead, allocate an interactive node as shown below and run the job there.

$ qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready 
$ cd /data/$USER/myruns
$ module load sratoolkit
$ cd /data/$USER/run1
$ fastq-dump .... 
# illumina-dump....
$ exit
qsub: job 2236960.biobos completed
$
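Inside an interactive session, several runs in one directory can be dumped in sequence with a simple loop. The sketch below is a hypothetical bash function, 'dump_runs', not part of the SRA Toolkit. It assumes the runs are stored as *.sra files (change the pattern for files such as some.csra) and reads the command to run from a DUMP_CMD variable that defaults to fastq-dump, so setting DUMP_CMD=echo gives a dry run.

```shell
#!/bin/bash
# dump_runs: hypothetical helper for an interactive session.
# Runs a dump command on every *.sra file in a directory, one at a time,
# and stops at the first failure. DUMP_CMD (a single command name)
# defaults to fastq-dump; DUMP_CMD=echo only prints the file names.
dump_runs() {
    local dir="$1" f
    local cmd="${DUMP_CMD:-fastq-dump}"
    for f in "$dir"/*.sra; do
        # With no matching files the glob stays literal, so skip it
        [ -e "$f" ] || continue
        "$cmd" "$f" || return 1
    done
}

# Example, on the interactive node after 'module load sratoolkit':
#   DUMP_CMD=echo dump_runs /data/$USER/run1   # dry run
#   dump_runs /data/$USER/run1
```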

To request a specific type of node (e.g. one with 8 GB of memory), specify it on the qsub command line. For example:

$ qsub -I -l nodes=1:g8

Documentation

SRA toolkit documentation at NCBI