FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high-throughput sequencing pipelines. It provides a modular set of analyses which you can use to get a quick impression of whether your data has any problems you should be aware of before doing any further analysis.
The main functions of FastQC are:
- Import of data from BAM, SAM or FastQ files (any variant)
- Providing a quick overview to tell you in which areas there may be problems
- Summary graphs and tables to quickly assess your data
- Export of results to an HTML based permanent report
- Offline operation to allow automated generation of reports without running the interactive application
FastQC is developed by Simon Andrews, Babraham Bioinformatics.
Make sure X-Windows is running when you connect to Helix.
The environment variables need to be set properly first. The easiest way to do this is with the module commands, as in the example below.
$ module avail fastqc
-------------------- /usr/local/Modules/3.2.9/modulefiles --------------------
fastqc/0.10.0  fastqc/0.10.1(default)  fastqc/0.9

$ module load fastqc
$ module list
Currently Loaded Modulefiles:
  1) fastqc/0.10.1

$ module unload fastqc
$ module load fastqc/0.9
$ module list
Currently Loaded Modulefiles:
  1) fastqc/0.9

$ module show fastqc
-------------------------------------------------------------------
/usr/local/Modules/3.2.9/modulefiles/fastqc/0.10.1:

module-whatis    Sets up fastqc 0.10.1
prepend-path     PATH /usr/local/apps/fastqc/0.10.1
-------------------------------------------------------------------
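Since loading the module works by prepending the FastQC directory to PATH, one quick way to confirm that the environment is set up is to check whether the shell can find the executable. The following is a minimal sketch; the helper name check_tool is hypothetical and not part of the module system or of FastQC.

```shell
#!/bin/bash
# check_tool is a hypothetical helper: report whether a command is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH; try 'module load $1'"
        return 1
    fi
}

# Report on fastqc without aborting the script if it is missing.
check_tool fastqc || true
```

If the check reports that fastqc is not found, loading the module as shown above and running the check again should show the path added by the module.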
Users can submit a FastQC job to a node through the batch system.
First, create a batch file along the following lines:
#!/bin/bash
# This file is runfastqc
#
#PBS -N fastqc
#PBS -m be
#PBS -k oe
module load fastqc
fastqc -o output_dir [-f fastq|bam|sam] -c contaminant_file seqfile1 .. seqfileN
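The fastqc command above writes its reports to the directory given with -o, and FastQC expects that directory to exist already rather than creating it. A minimal sketch of preparing the directory and assembling the command line is shown here; the helper name fastqc_cmd, the temporary directory, and the sample file names are placeholders chosen for illustration, and the command is only printed, not executed.

```shell
#!/bin/bash
# fastqc_cmd is a hypothetical helper: print a fastqc command line
# for an output directory followed by any number of sequence files.
fastqc_cmd() {
    local outdir=$1
    shift
    printf 'fastqc -o %s' "$outdir"
    local f
    for f in "$@"; do
        printf ' %s' "$f"
    done
    printf '\n'
}

# Create the output directory first, since fastqc will not create it.
# A temporary location is used here only so the sketch runs anywhere.
outdir="$(mktemp -d)/fastqc_out"
mkdir -p "$outdir"

# Placeholder input names; substitute your own sequence files.
cmd=$(fastqc_cmd "$outdir" sample1.fastq sample2.fastq)
echo "$cmd"
```

In a real batch file the mkdir -p line would go before the fastqc call, with the output directory placed somewhere under your own data area.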
Then submit this batch file to the cluster:
You may change the node property (g8 in this case) in the qsub command to request a different type of node. For example, if you need a node with 4 GB of memory to run the job:
Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready
[user@pXXX]$ module load fastqc
[user@pXXX]$ cd /data/$USER/fastqc/run1
[user@pXXX]$ fastqc -o output_dir [-f fastq|bam|sam] -c contaminant_file seqfile1 .. seqfileN
[user@pXXX]$ exit
qsub: job 2236960.biobos completed
[user@biowulf]$
You may change the node property in the qsub command to request a different type of node. For example, if you need a node with 8 GB of memory to run the job interactively, do this: