WU-BLAST on Helix
WU-BLAST performs sensitive, selective and rapid similarity searches of protein
and nucleotide sequence databases. WU-BLAST 2.0 builds upon WU-BLAST 1.4,
which in turn was based on the public domain NCBI-BLAST version 1.4.
It was developed by Warren Gish at Washington University. See the WU-BLAST
website (http://blast.wustl.edu).
WU-BLAST runs with large numbers of sequences (>100) may be better suited to the Biowulf cluster. Contact the Helix Systems staff (staff@helix.nih.gov) if you have questions about running WU-BLAST.
The WU-BLAST family of programs includes:
- blastp compares an amino acid query sequence against a protein sequence database
- blastn compares a nucleotide query sequence against a nucleotide sequence database
- blastx compares a nucleotide query sequence translated in all reading frames against a protein sequence database
- tblastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
- tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
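The query and database types in the list above determine which programs apply. A minimal sketch of that lookup follows; the table contents come from the list, while the names `PROGRAMS` and `programs_for_query` are hypothetical and not part of WU-BLAST.

```python
# (program, query type, database type), taken from the program list above.
# The helper names here are illustrative only; WU-BLAST provides no such API.
PROGRAMS = [
    ("blastp", "protein", "protein"),
    ("blastn", "nucleotide", "nucleotide"),
    ("blastx", "nucleotide", "protein"),
    ("tblastn", "protein", "nucleotide"),
    ("tblastx", "nucleotide", "nucleotide"),
]


def programs_for_query(query_type):
    """Return the programs that accept the given query type."""
    return [name for name, query, _db in PROGRAMS if query == query_type]


def database_type_for(program):
    """Return the database type ('protein' or 'nucleotide') a program searches."""
    for name, _query, db in PROGRAMS:
        if name == program:
            return db
    raise ValueError(f"unknown program: {program}")


if __name__ == "__main__":
    print(programs_for_query("protein"))
    print(database_type_for("tblastn"))
```

For a protein query this yields blastp and tblastn, the same two choices the interactive wublast script offers for a protein query sequence.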
Update status of WU-BLAST databases on Helix
Version
The WU-BLAST version is printed at the top of every WU-BLAST output.
Sample session (user input follows each prompt):
helix% wublast

WU-BLAST searches for sequences similar to a query sequence. The query and
the database searched can be either peptide or nucleic acid in any combination.

Search with what query sequence? cram_craab.fas

Your query sequence is a protein sequence.
Available programs are:
    blastp  - protein query sequence against protein database
    tblastn - protein query sequence against a nucleotide database
              translated in all 6 reading frames
Which program do you want to run (blastp)? tblastn

The following nucleotide databases are available:
(or enter your own database with full pathname)
     1) nt - all nonredundant Genbank+EMBL+DDBJ+PDB (no EST, STS, GSS or HTG)
     2) est_human - nonredundant Genbank+EMBL+DDBJ EST human sequences
     3) est_mouse - nonredundant Genbank+EMBL+DDBJ EST mouse sequences
     4) pdb.nt - from the 3-dimensional structures
     5) ecoli.nt - ecoli genomic sequences
     6) mito.nt - mitochondrial sequences
     7) yeast.nt - yeast (Saccharomyces cerevisiae) genomic sequences
     8) drosoph.nt - drosophila sequences
     9) hs_genome - human genome assembly (Build 35, May 2004)
    10) hs_genome.rna - human genome RNA (Build 35, May 2004)
    11) mouse_genome - mouse genome assembly (Build 33, June 2004)
    12) mouse_genome.rna - mouse genome RNA (Build 33, June 2004)
    13) ref.human.rna - Refseq Human RNA
    14) ref.mouse.rna - RefSeq Mouse RNA
which database (1)? 1
Use NCBI-Blast parameters? [n]:
Any additional WUBlast parameters (e.g. -E=1.0 -V=10 -B=10):
What should I call the output file (cram_craab.tblastn) ?
-----------------------------------------------------------------------------
Running command as follows:
/usr/local/wublast/x86_64/tblastn /fdb/wublastdb/nt cram_craab.fas -o cram_craab.tblastn

WARNING: Use of the hspsepSmax parameter should be considered with long
database sequences, to improve the biological relevance of the HSP groups
that are assembled and to improve the statistical discrimination of these
groups from random background.
WARNING: hspmax=1000 was exceeded by 407 of the database sequences, causing
the associated cutoff score, S2, to be transiently set as high as 37.
helix%
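The session ends by printing the command it runs, which has the form program, database, query file, then options. The sketch below assembles the same command line from its parts, which may be useful when scripting a search instead of answering the prompts. The function name `build_wublast_command` is hypothetical, and placing additional parameters after the `-o` option is an assumption; the paths and the `-o` option are copied from the session.

```python
import shlex


def build_wublast_command(program, database, query_file, output_file, extra_params=()):
    """Assemble a command line shaped like the one printed by the session.

    Order: program, database, query file, then options. Appending
    extra_params after "-o <file>" is an assumption, not documented behavior.
    """
    return [program, database, query_file, "-o", output_file, *extra_params]


if __name__ == "__main__":
    cmd = build_wublast_command(
        "/usr/local/wublast/x86_64/tblastn",  # program path from the session
        "/fdb/wublastdb/nt",                  # database path from the session
        "cram_craab.fas",
        "cram_craab.tblastn",
        extra_params=["-E=1.0", "-V=10", "-B=10"],  # example values from the prompt
    )
    # Print a shell-safe string; pass the list itself to subprocess.run to execute.
    print(shlex.join(cmd))
```

With no extra parameters the result matches the "Running command as follows" line in the session.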
Sample output
TBLASTN 2.0MP-WashU [04-May-2006] [linux26-x64-I32LPF64 2006-05-10T17:22:28]

Copyright (C) 1996-2006 Washington University, Saint Louis, Missouri USA.
All Rights Reserved.

Reference: Gish, W. (1996-2006) http://blast.wustl.edu

Query=  CRAM_CRAAB, 46 aa.
        (46 letters)

Database:  All Non-redundant GenBank+EMBL+DDBJ+PDB (but no EST, STS, GSS, HTG)
           built on Fri Mar 14 20:58:19 2008
           6,546,745 sequences; 23,125,808,238 total letters.
Searching....10....20....30....40....50....60....70....80....90....100% done

                                                                        Smallest
                                                                          Sum
                                                           Reading  High  Probability
Sequences producing High-scoring Segment Pairs:              Frame Score  P(N)     N

emb|X81709.1|TGTHI14 T.gesneriana Thi1-4 mRNA for thionin...  +2   149  8.4e-08   1
dbj|AB072338.1| Avena sativa mRNA for leaf thionin Asthi1...  +3   140  6.3e-07   1
dbj|AB072339.1| Avena sativa mRNA for leaf thionin Asthi2...  +2   134  2.8e-06   1
dbj|AB072340.1| Avena sativa mRNA for leaf thionin Asthi3...  +3   129  1.1e-05   1
[...]

>emb|X81709.1|TGTHI14 T.gesneriana Thi1-4 mRNA for thionin class 1
        Length = 535

 Score = 149 (57.5 bits), Expect = 8.4e-08, P = 8.4e-08
 Identities = 25/43 (58%), Positives = 29/43 (67%), Frame = +2

Query:     2 TCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDY 44
             +CCPS  AR+ +NVCR PGTP  +CA   GC II G  CP DY
Sbjct:    41 SCCPSTAARNCYNVCRFPGTPRPVCAATCGCKIITGTKCPPDY 169

>dbj|AB072338.1| Avena sativa mRNA for leaf thionin Asthi1, complete cds
        Length = 677

 Score = 140 (54.3 bits), Expect = 6.3e-07, P = 6.3e-07
 Identities = 24/43 (55%), Positives = 30/43 (69%), Frame = +3

Query:     2 TCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDY 44
             +CC  I+AR+ +NVCR+PGTP  +CAT   C II G  CP DY
Sbjct:   111 SCCKDIMARNCYNVCRIPGTPRPVCATTCRCKIISGNKCPKDY 239
[...]
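The one-line hit summaries in the sample output (sequence identifier, description, reading frame, score, P(N), N) can be pulled out with a regular expression. The following is a sketch that assumes the layout shown in this tblastn sample, including the Frame column; it has not been checked against other WU-BLAST programs or versions, and the names `Hit` and `parse_hits` are hypothetical.

```python
import re
from typing import List, NamedTuple


class Hit(NamedTuple):
    seq_id: str
    description: str
    frame: str
    score: int
    p_value: float
    n: int


# Assumed layout: id, description, frame (+/-1..3), score, P(N), N.
HIT_LINE = re.compile(
    r"^(\S+)\s+(.*?)\s+([+-][1-3])\s+(\d+)\s+([\d.]+(?:e[+-]?\d+)?)\s+(\d+)\s*$"
)


def parse_hits(text: str) -> List[Hit]:
    """Return one Hit per summary line; lines that do not match are skipped."""
    hits = []
    for line in text.splitlines():
        m = HIT_LINE.match(line.strip())
        if m:
            seq_id, desc, frame, score, p, n = m.groups()
            hits.append(Hit(seq_id, desc, frame, int(score), float(p), int(n)))
    return hits


# Summary lines copied from the sample output above.
SAMPLE = """\
Sequences producing High-scoring Segment Pairs: Frame Score P(N) N
emb|X81709.1|TGTHI14 T.gesneriana Thi1-4 mRNA for thionin... +2 149 8.4e-08 1
dbj|AB072338.1| Avena sativa mRNA for leaf thionin Asthi1... +3 140 6.3e-07 1
dbj|AB072339.1| Avena sativa mRNA for leaf thionin Asthi2... +2 134 2.8e-06 1
dbj|AB072340.1| Avena sativa mRNA for leaf thionin Asthi3... +3 129 1.1e-05 1
"""

if __name__ == "__main__":
    for hit in parse_hits(SAMPLE):
        print(hit.seq_id, hit.frame, hit.score, hit.p_value)
```

The header line is skipped because it has no frame value; searches that are not translated would likely need a pattern without the frame group.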
Documentation
- WU-BLAST website. Warren Gish's web page about using WU-BLAST.
- Running a program by its full path with no parameters lists all available options for that program, e.g.
helix% /usr/local/wublast/x86_64/tblastn

TBLASTN 2.0MP-WashU [04-May-2006] [linux26-x64-I32LPF64 2006-05-10T17:22:28]

Copyright (C) 1996-2006 Washington University, Saint Louis, Missouri USA.
All Rights Reserved.

Reference: Gish, W. (1996-2006) http://blast.wustl.edu

Usage:
    TBLASTN database queryfile [options]

Valid TBLASTN options: E, S, E2, S2, W, T, X, M, C, Y, Z, L, K, H, V and B
    -matrix   use the specified scoring matrix (default BLOSUM62); be sure to
              consider changing the default gap penalties when using a
              non-default scoring matrix
    -Q <s>    penalty score for a gap of length 1
    -R <s>    penalty score for extending a gap by each letter after the first
    -kap      use Karlin-Altschul statistics on individual alignment scores
    [...]
- WU-BLAST on Biowulf for processing large numbers of sequences.