Biowulf at the NIH
FastQC on Biowulf

FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.

The main functions of FastQC are

FastQC is developed by Simon Andrews, Babraham Bioinformatics.

 

Make sure X Windows is running when you connect to helix if you plan to use the FastQC graphical interface.

The environment must be set up before FastQC can be run. The easiest way to do this is with the module commands, as in the example below.

$ module avail fastqc
-------------------- /usr/local/Modules/3.2.9/modulefiles --------------------------------------
fastqc/0.10.0 fastqc/0.10.1(default) fastqc/0.9

$ module load fastqc

$ module list
Currently Loaded Modulefiles:
1) fastqc/0.10.1

$ module unload fastqc

$ module load fastqc/0.9

$ module list
Currently Loaded Modulefiles:
1) fastqc/0.9

$ module show fastqc
-------------------------------------------------------------------
/usr/local/Modules/3.2.9/modulefiles/fastqc/0.10.1:

module-whatis Sets up fastqc 0.10.1
prepend-path PATH /usr/local/apps/fastqc/0.10.1
-------------------------------------------------------------------
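
Since loading the module works by prepending the FastQC directory to PATH, a quick way to confirm that it took effect is to check whether the fastqc command can be found. The following is a minimal sketch; have_cmd is a hypothetical helper name, not part of the modules system or FastQC.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the named command is reachable on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd fastqc; then
    echo "fastqc found at: $(command -v fastqc)"
else
    echo "fastqc is not on PATH; try 'module load fastqc' first"
fi
```

The same check can be placed at the top of a batch file so that a job stops early if the module was not loaded.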

Running a batch job

Users can submit a FastQC job to a compute node through the batch system.

First create a batch file along the following lines:

#!/bin/bash
# This file is runfastqc
#
#PBS -N fastqc
#PBS -m be
#PBS -k oe

module load fastqc
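# Replace output_dir, contaminant_file and the seqfile names with real paths.
# The -f option (fastq, bam or sam) is optional; list one or more sequence files.
fastqc -o output_dir -f fastq -c contaminant_file seqfile1 seqfile2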

Then submit this batch file to the cluster:

[user@biowulf]$ qsub -l nodes=1:g8 /data/$USER/fastqc/runfastqc

You can change the node property (g8 in this case) in the qsub command to request a different type of node. For example, to run the job on a node with 4 GB of memory:

[user@biowulf]$ qsub -l nodes=1:g4 runfastqc
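
The two submissions above differ only in the node property, so the command line can be assembled from that property and the batch file name. The sketch below prints the command for review instead of submitting it. build_qsub_cmd is a hypothetical helper, and the accepted list is limited to the properties that appear on this page (g4, g8, g24); the cluster may offer others.

```shell
#!/bin/bash
# Hypothetical helper: build_qsub_cmd is not a Biowulf or PBS command.
# Usage: build_qsub_cmd NODE_PROPERTY BATCH_FILE
build_qsub_cmd() {
    prop=$1
    script=$2
    case "$prop" in
        g4|g8|g24) ;;   # assumption: only the properties shown on this page
        *) echo "unrecognized node property: $prop" >&2; return 1 ;;
    esac
    printf 'qsub -l nodes=1:%s %s\n' "$prop" "$script"
}

# Print the command rather than running it.
build_qsub_cmd g8 runfastqc
```

Once the printed line looks right, it can be pasted at the Biowulf prompt.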

 

Running an interactive job

Some jobs need to be run interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the job there.

[user@biowulf]$ qsub -I -l nodes=1:g24:c16
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

[user@pXXX]$ module load fastqc
[user@pXXX]$ cd /data/$USER/fastqc/run1
[user@pXXX]$ fastqc -o output_dir [-f fastq|bam|sam] -c contaminant_file seqfile1 .. seqfileN
[user@pXXX]$ exit
qsub: job 2236960.biobos completed
[user@biowulf]$

You can change the node property in the qsub command to request a different type of node. For example, to run an interactive job on a node with 8 GB of memory:

[user@biowulf]$ qsub -I -l nodes=1:g8
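
Before starting fastqc in an interactive session, it can save time to check that the output directory exists and that every sequence file is readable. The sketch below assumes that FastQC expects the directory given to -o to exist already, which is worth confirming with the help text of the installed version. prepare_run is a hypothetical helper, and the names in the usage comment are the placeholders used above.

```shell
#!/bin/bash
# Hypothetical pre-flight helper; prepare_run is not part of FastQC.
# Usage: prepare_run OUTPUT_DIR SEQFILE...
prepare_run() {
    outdir=$1
    shift
    # Create the output directory if it does not exist yet.
    mkdir -p "$outdir" || return 1
    missing=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f" >&2
            missing=1
        fi
    done
    # Return non-zero if any input file was unreadable.
    return "$missing"
}

# Example with placeholder names: run fastqc only if the check passes.
# prepare_run output_dir seqfile1 seqfile2 && fastqc -o output_dir seqfile1 seqfile2
```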

 

Documentation

http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/