Meme & Mast on Biowulf
Meme discovers motifs (highly conserved regions) in groups of related DNA or protein sequences, and Mast searches sequence databases using those motifs. Meme & Mast were developed at UCSD and Purdue. Meme/Mast website.
Meme is CPU-intensive for large numbers of sequences or long sequences. Short jobs are most easily run on Helix; larger datasets call for a parallel run on Biowulf.
How to run Meme on Biowulf
Your input should be a file of sequences in fasta format. In the example below, the file is 'mini-drosoph.seqs'. Determine the number of characters in the file with 'wc -c filename' and use that figure for the 'maxsize' parameter. Set up a batch script along the lines of the one below:
------- this file is called meme.batch-----------------
#!/bin/csh
#PBS -N Meme
#PBS -m be
#PBS -j oe
# make the MPICH tools available
setenv PATH /usr/local/mpich/bin:$PATH
cd /data/user/mydir/
# np is passed in on the qsub command line with -v
mpirun -machinefile $PBS_NODEFILE -np $np /usr/local/meme/bin/meme_p \
    mini-drosoph.seqs -dir /usr/local/meme/ -maxsize 500000 -text > mini-drosoph.meme
# run Mast on the motifs that Meme found
mast mini-drosoph.meme -text
--------------------------------------------------------
Submit this script using
qsub -v np=32 -l nodes=16 meme.batch
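The 'wc -c' step described above can be wrapped in a small helper so that the figure for 'maxsize' is read straight from the input file. This is a minimal sketch in plain sh, not part of Meme; the function name fasta_maxsize is hypothetical.

```shell
#!/bin/sh
# Sketch: report the character count of a fasta file for use as -maxsize.
# fasta_maxsize is a hypothetical helper name, not a Meme command.
fasta_maxsize() {
    # Reading from stdin keeps the file name out of wc's output;
    # tr removes the padding that some wc implementations add.
    wc -c < "$1" | tr -d ' '
}

# When a file name is given on the command line, print its size.
if [ "$#" -ge 1 ]; then
    echo "use -maxsize $(fasta_maxsize "$1") in meme.batch"
fi
```

Saved under a name of your choice and run with mini-drosoph.seqs as its argument, it prints the value to put in place of 500000 in the batch script.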
Meme scales well, and large Meme jobs (maxsize ~500,000) can be submitted on up to 128 processors.
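To try other processor counts, the qsub line can be generated rather than typed. The sketch below only prints the command and does not submit anything. It assumes 2 processors per node, which is inferred from the np=32, nodes=16 example and may not hold for every node type. The function name meme_qsub_line is hypothetical.

```shell
#!/bin/sh
# Sketch: print (not run) a qsub line for a given processor count.
# Assumption: 2 processors per node, inferred from np=32 / nodes=16.
# meme_qsub_line is a hypothetical helper name.
meme_qsub_line() {
    np=$1
    # Accept only even counts from 2 up to the 128 processors quoted above.
    if [ "$np" -lt 2 ] || [ "$np" -gt 128 ] || [ $((np % 2)) -ne 0 ]; then
        echo "np must be an even number between 2 and 128" >&2
        return 1
    fi
    echo "qsub -v np=$np -l nodes=$((np / 2)) meme.batch"
}
```

For example, 'meme_qsub_line 64' prints a submission line requesting 32 nodes, which can be checked and then pasted at the prompt.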
Documentation
- Type 'meme' or 'mast' with no parameters on the command line to see a list of all available options and more information.
- Meme documentation at the SDSC website.
- Mast documentation at the SDSC website.