HMMER on Biowulf
Profile hidden Markov models for biological sequence analysis
Profile hidden Markov models (profile HMMs) can be used to do sensitive database searching using statistical descriptions of a sequence family's consensus. HMMER uses profile HMMs, and can be useful in situations like:
- if you are working with an evolutionarily diverse protein family, a BLAST search with any individual sequence may not find the rest of the sequences in the family.
- the top hits in a BLAST search are hypothetical sequences from genome projects.
- your protein consists of several domains which are of different types.
HMMER (pronounced 'hammer', as in a more precise mining tool than BLAST) was developed by Sean Eddy at Washington University in St. Louis. The HMMER website is hmmer.janelia.org.
HMMER User Guide (PDF)
HMMER is very CPU-intensive and is parallelized using threads, so each instance of hmmpfam or hmmsearch can use all the CPUs available on a node. HMMER on Biowulf is intended for those who need to run HMMER searches on large numbers of query sequences.
Searching query sequences against a profile HMM database
One use of HMMER is to look for known domains in a query sequence by searching
a single sequence against a library of HMMs. One such library is the PFAM database.
PFAM is available and kept up to date on our systems in the directory
/fdb/fastadb/pfam. It is also possible to create your own database (see the user
guide for details).
Create a swarm command file with one line for each of the query sequences. Sample swarm command file:
---------------- file swarm.cmd ----------------------------------------------------
hmmpfam /fdb/fastadb/pfam/Pfam_fs /data/user/seqs/myseq1 > /data/user/out/seq1.out
hmmpfam /fdb/fastadb/pfam/Pfam_fs /data/user/seqs/myseq2 > /data/user/out/seq2.out
hmmpfam /fdb/fastadb/pfam/Pfam_fs /data/user/seqs/myseq3 > /data/user/out/seq3.out
hmmpfam /fdb/fastadb/pfam/Pfam_fs /data/user/seqs/myseq4 > /data/user/out/seq4.out
hmmpfam /fdb/fastadb/pfam/Pfam_fs /data/user/seqs/myseq5 > /data/user/out/seq5.out
[....]
------------------------------------------------------------------------------------
Submit the swarm command file with:
swarm -f swarm.cmd -n 1
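For a large number of query sequences, writing the swarm command file by hand is impractical. The following is a minimal sketch of one way to generate it, assuming each query sequence sits in its own file in a single directory. The function name make_swarm_cmds and its three arguments are hypothetical and are not part of HMMER or swarm; the paths in the comments are the placeholders from the sample file above. Unlike the sample, the output files here are named after the input file (myseq1.out rather than seq1.out).

```shell
#!/bin/sh
# Sketch: print one hmmpfam command line per query sequence file.
# make_swarm_cmds is an illustrative helper, not a HMMER or swarm tool.

make_swarm_cmds() {
    db=$1        # HMM library, e.g. /fdb/fastadb/pfam/Pfam_fs
    seqdir=$2    # directory with one query sequence per file
    outdir=$3    # directory that will receive the hmmpfam output
    for seq in "$seqdir"/*; do
        [ -f "$seq" ] || continue      # skip subdirectories and unmatched globs
        name=$(basename "$seq")
        echo "hmmpfam $db $seq > $outdir/$name.out"
    done
}

# Example use (placeholder paths):
#   make_swarm_cmds /fdb/fastadb/pfam/Pfam_fs /data/user/seqs /data/user/out > swarm.cmd
```

The resulting swarm.cmd can then be submitted with the swarm command shown above. Paths containing spaces are not handled by this sketch.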
Searching a sequence database for homologues of a protein family
Another common use of HMMER is to search a sequence database for homologues of
a protein family of interest. If you start with a file containing several
sequences belonging to the family, you can use this to find remote homologues
from a protein database. The following sample batch script will run hmmbuild,
hmmcalibrate, and hmmsearch in sequence.
----------- file hmm_homolog -----------------------------------------
#!/bin/csh
#PBS -N Hmmer
#PBS -m be
#PBS -k oe
cd /data/user/mydir
hmmbuild -g globins.hmm globins.msf
hmmcalibrate globins.hmm
hmmsearch globins.hmm /fdb/fastadb/ecoli.aa.fas
------------------------------------------------------------------------
This script starts with a multiple sequence alignment of a protein domain or protein family in the file globins.msf. This file can be created by aligning sequences with ClustalW. The hmmbuild command builds a profile HMM from the alignment, the hmmcalibrate command increases the sensitivity of the search, and the hmmsearch command uses the globin model to search for globin domains in the E. coli database. See the HMMER documentation for more information.
Submit this file with:
qsub -l nodes=1 hmm_homolog
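If you have alignments for several families, the same three steps can be templated. The following is a rough sketch that prints a batch script equivalent to hmm_homolog for a given working directory, alignment basename, and sequence database. The function name make_hmm_job and its arguments are hypothetical; the PBS directives and HMMER commands are copied from the sample script above, and the alignment is assumed to be named <family>.msf.

```shell
#!/bin/sh
# Sketch: print a batch script that runs hmmbuild, hmmcalibrate and hmmsearch
# for one alignment. make_hmm_job is an illustrative helper, not a HMMER tool.

make_hmm_job() {
    workdir=$1    # directory holding the alignment, e.g. /data/user/mydir
    family=$2     # alignment basename, e.g. globins (expects globins.msf)
    seqdb=$3      # sequence database, e.g. /fdb/fastadb/ecoli.aa.fas
    cat <<EOF
#!/bin/csh
#PBS -N Hmmer
#PBS -m be
#PBS -k oe
cd ${workdir}
hmmbuild -g ${family}.hmm ${family}.msf
hmmcalibrate ${family}.hmm
hmmsearch ${family}.hmm ${seqdb}
EOF
}

# Example use (placeholder paths):
#   make_hmm_job /data/user/mydir globins /fdb/fastadb/ecoli.aa.fas > hmm_homolog
```

Each generated file can be submitted with qsub in the same way as hmm_homolog.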
More information
The entire HMMER suite of programs is available in /usr/local/hmmer. Note that
only hmmcalibrate, hmmsearch and hmmpfam are parallelized.
A large collection of protein sequence databases is in /fdb/fastadb/.
Fasta-format databases and update status.