The NCBI SRA SDK generates loading and dumping tools with their respective libraries for building new and accessing existing runs.
The environment variables must be set correctly before the tools can be used. The easiest way to do this is with the module commands, as in the example below.
$ module avail sratoolkit

------------------------ /usr/local/Modules/3.2.9/modulefiles ----------------------------
sratoolkit/2.0rc5  sratoolkit/2.1.10  sratoolkit/2.1.15  sratoolkit/2.1.16  sratoolkit/2.1.7  sratoolkit/2.2.2b(default)

$ module load sratoolkit

$ module list
Currently Loaded Modulefiles:
  1) sratoolkit/2.2.2b

$ module unload sratoolkit

$ module load sratoolkit/2.1.10

$ module show sratoolkit
-------------------------------------------------------------------
/usr/local/Modules/3.2.9/modulefiles/sratoolkit/2.2.2b:

module-whatis   Sets up sra toolkit 2.2.2b and decryption 2.2.4
prepend-path    PATH /usr/local/apps/sratoolkit/2.2.2b/bin
prepend-path    PATH /usr/local/apps/sratoolkit/decryption.2.2.4/bin
-------------------------------------------------------------------
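After loading a module, it can be useful to confirm that the toolkit executables are actually reachable on PATH. The following is a minimal sketch of such a check; check_tools is a hypothetical helper name chosen for illustration and is not part of the SRA toolkit or the modules system.

```shell
# Sketch: report whether each named executable can be found on PATH.
# check_tools is a hypothetical helper, not part of the SRA toolkit.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Return 0 only when every tool was found
    return "$missing"
}
```

For example, running check_tools fastq-dump sam-dump after module load sratoolkit should list both tools under /usr/local/apps/sratoolkit/ if the module was loaded as shown above.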
1. Create a script file containing lines similar to those below. Modify the paths to match your own setup before running.
#!/bin/bash
# This file is YourOwnFileName
#
#PBS -N yourownfilename
#PBS -m be
#PBS -k oe

module load sratoolkit
cd /data/user/somewhereWithInputFile
fastq-dump some.csra
sam-dump some.csra > my_sam.sam
....
....
2. Submit the script using the 'qsub' command on Biowulf.
Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.
Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file:
module load sratoolkit; fastq-dump --aligned --table PRIMARY_ALIGNMENT -O /data/$USER/mydir
module load sratoolkit; fastq-dump --aligned --table SEQUENCE -O /data/$USER/mydir2
[....]
This swarm command file can be submitted with:
$ swarm -f cmdfile
For more information regarding running swarm, see swarm.html
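When there are many runs to convert, the swarm command file can be generated rather than typed by hand. The sketch below writes one fastq-dump line per .csra file found in a directory. The function name make_swarm_file and its three arguments are hypothetical and chosen only for illustration; it also assumes the paths contain no spaces, since each line is later interpreted by the shell.

```shell
# Sketch: write a swarm command file with one fastq-dump line per run.
# make_swarm_file is a hypothetical helper; arguments are:
#   $1 = directory holding the .csra files
#   $2 = output directory passed to fastq-dump -O
#   $3 = swarm command file to (over)write
make_swarm_file() {
    local run_dir=$1 out_dir=$2 cmdfile=$3 run
    : > "$cmdfile"                      # start with an empty command file
    for run in "$run_dir"/*.csra; do
        [ -e "$run" ] || continue       # skip when the glob matches nothing
        # Each line loads the module, then converts a single run
        printf 'module load sratoolkit; fastq-dump -O %s %s\n' \
            "$out_dir" "$run" >> "$cmdfile"
    done
}
```

For example, make_swarm_file /data/$USER/myruns /data/$USER/fastq cmdfile would produce a file that can then be submitted with swarm -f cmdfile.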
Users may occasionally need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.
$ qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

$ cd /data/$USER/myruns
$ module load sratoolkit
$ cd /data/$USER/run1
$ fastq-dump ....
# illumina-dump....
$ exit
qsub: job 2236960.biobos completed
$
If you want a specific type of node (e.g. one with 8 GB of memory), you can request it on the qsub command line. For example:
$ qsub -I -l nodes=1:g8
SRA toolkit documentation at NCBI