Accurate mapping of RNA-seq reads for splice junction discovery.
MapSplice was developed at the University of Kentucky Bioinformatics Lab.
The required environment variables must be set first. The easiest way to do this is with the module commands, e.g. 'module load mapsplice', as in the example below.
biowulf% module avail mapsplice

-------------------- /usr/local/Modules/3.2.9/modulefiles ---------------------
mapsplice/1.15.2

biowulf% module load mapsplice
biowulf% module list
Currently Loaded Modulefiles:
  1) mapsplice/1.15.2
1. You will either need to copy the MapSplice configuration file and edit it for your own needs, or put all the options on the command line. There are several sample configuration files in /usr/local/mapsplice/. Copy one of them and edit the appropriate sections to define the input files, reference genome, bowtie indexes etc.
cp /usr/local/mapsplice/paired.cfg /data/user/mydir
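The exact option names vary between MapSplice releases, so the fragment below is only a sketch of the kind of settings a paired-end .cfg defines; the field names shown are illustrative assumptions, not authoritative (the only setting confirmed in this document is 'threads'). Always compare against the sample files shipped in /usr/local/mapsplice/.

```
# Illustrative fragment only -- field names are assumptions; check the shipped sample .cfg
reads_file = /data/user/mydir/sample_1.fastq,/data/user/mydir/sample_2.fastq
chromosome_files_directory = /fdb/genome/hg19/
Bowtieidx = /usr/local/bowtie-indexes/hg19
output_dir = /data/user/mydir/output/
threads = 8
```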
2. Create a batch script file similar to the one below:
#!/bin/bash
# This file is YourOwnFileName
#
#PBS -N yourownfilename
#PBS -m be
#PBS -k oe

module load mapsplice
cd /data/user/mydir
python $MSBIN/mapsplice_segments.py Run1.cfg
3. Submit the script using the 'qsub' command on Biowulf, e.g.
qsub -l nodes=1 /data/user/mydir/YourOwnFileName
Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.
Set up a swarm command file (eg /data/username/cmdfile). Here is a sample file:
cd /data/user/mydir1; python $MSBIN/mapsplice_segments.py Run1.cfg
cd /data/user/mydir1; python $MSBIN/mapsplice_segments.py Run2.cfg
cd /data/user/mydir1; python $MSBIN/mapsplice_segments.py Run3.cfg
[...]
Submit this job with
swarm -f cmdfile --module mapsplice
By default, each line of the command file above runs on one core of a node and may use up to 1 GB of memory. The Bowtie stage of MapSplice can run multi-threaded. If you specify more than one thread for Bowtie (with '-X #' or '--threads #' on the command line, or 'threads=#' in the .cfg file), you must tell swarm how many threads each command will use via swarm's '-t #' flag. For example, if you set '--threads 8', submit the swarm with:
swarm -t 8 -f cmdfile --module mapsplice
If each command requires more than 1 GB of memory, you must tell swarm the amount of memory required using the '-g #' flag to swarm. For example, if each mapsplice command (a single line in the file above) requires 10 GB of memory and you are running with 8 threads, you would submit the swarm with:
swarm -g 10 -t 8 -f cmdfile --module mapsplice
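To see why both flags matter, here is a back-of-the-envelope sketch (plain Python, not swarm's actual packing algorithm) of how many commands fit on one node when each command needs a fixed number of threads and a fixed amount of memory; the node sizes used below are assumptions for illustration.

```python
# Back-of-the-envelope packing estimate -- illustrative only, not swarm's real scheduler.

def commands_per_node(node_cores, node_gb, threads_per_cmd, gb_per_cmd):
    """How many commands fit on one node, limited by cores and by memory."""
    by_cores = node_cores // threads_per_cmd
    by_memory = int(node_gb // gb_per_cmd)
    return min(by_cores, by_memory)

# e.g. 'swarm -g 10 -t 8' on a hypothetical 32-core, 24 GB node:
print(commands_per_node(32, 24, 8, 10))  # memory is the limit here: 2
```

With '-g 10 -t 8', memory (24 GB // 10 GB = 2) constrains packing before cores (32 // 8 = 4) do, which is why swarm must know both numbers to avoid oversubscribing a node.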
For more information regarding running swarm, see swarm.html
Users may occasionally need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below, and run the interactive job there.
[user@biowulf]$ qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready
[user@p4]$ cd /data/user/myruns
[user@p4]$ module load mapsplice
[user@p4]$ cd /data/userID/mapsplice/run1
[user@p4]$ python $MSBIN/mapsplice_segments.py -Q fq -o output_path -u file1.fastq -c /fdb/genome/hg19/chr_all.fa -b /usr/local/bowtie-indexes --threads 4 -L 18 2>output.log
[user@p4]$ exit
qsub: job 2236960.biobos completed
[user@biowulf]$
You may add a node property to the qsub command to request a specific type of interactive node. For example, if you need a node with 24 GB of memory to run a job interactively, do this: