Blast on Helix
Blast, developed at the NCBI (NIH),
is a rapid sequence database search algorithm that compares a query
nucleotide or protein sequence against a database. Most users who run
occasional searches will use Blast on Helix. The standalone
Blast on Helix searches against local Helix databases, which
are updated weekly.
Those who need to Blast large numbers of sequences (100s, 1000s, or more) can use Blast on Biowulf, which is designed for high-throughput Blast runs. If you have questions about where to run your Blast searches, please contact the Helix Systems staff at staff@helix.nih.gov.
The Blast family of programs includes:
- blastp compares an amino acid query sequence against a protein sequence database
- blastn compares a nucleotide query sequence against a nucleotide sequence database
- blastx compares a nucleotide query sequence translated in all reading frames against a protein sequence database
- tblastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
- tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
- megablast is specifically designed to efficiently find long alignments between very similar sequences, which makes it the best tool for finding identical matches to your query sequence.
- blastpgp runs Position-Specific Iterated BLAST (PSI-BLAST), the most sensitive BLAST program, which makes it useful for finding very distantly related proteins or new members of a protein family.
- rpsblast runs Reverse Position-Specific BLAST (RPS-BLAST), which identifies conserved domains in proteins more sensitively than standard BLAST searching.
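blastx, tblastn and tblastx all work with nucleotide sequences translated in every reading frame: three frames on the forward strand and three on the reverse complement. The sketch below illustrates that translation step only. It assumes the standard genetic code (NCBI translation table 1) and labels frames +1 to +3 and -1 to -3; `six_frame_translation`, `translate` and `reverse_complement` are hypothetical helpers, not how the Blast programs implement translation internally.

```python
from itertools import product
from typing import Dict

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order,
# with the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks stops, 'X' marks ambiguous codons (e.g. with N)."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> Dict[str, str]:
    """Translate a nucleotide sequence in all three frames of both strands."""
    seq = seq.upper()
    frames: Dict[str, str] = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

For example, `six_frame_translation("ATGGCC")` returns `{"+1": "MA", "+2": "W", "+3": "G", "-1": "GH", "-2": "A", "-3": "P"}`. Trailing bases that do not form a complete codon are dropped in each frame.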
Version
The Blast version is printed at the top of every Blast output.
Running Blast on Helix
Blast on Helix runs against local databases that are updated and maintained by the Helix Systems staff. To run any of the Blast programs, type blast at the Helix prompt; the script prompts you for all required parameters.
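Behind the prompts, the script assembles and runs a blastall command, which it echoes in the sample session below. As a sketch, the hypothetical helper `build_blastall_command` assembles the same argument list using only the flags visible in that echoed command (-p program, -d database, -i query file, -o output file, -e E-value cutoff, -b number of sequences to show, -a number of processors). The binary path is adapted from the sample session and is an assumption; it may not exist on your system, and on Helix the blast script does this for you.

```python
import shlex
from typing import List

# Assumed location of blastall, adapted from the command echoed in the sample session.
BLASTALL = "/usr/local/blast/ncbi/bin/blastall"


def build_blastall_command(
    program: str,
    database: str,
    query: str,
    output: str,
    evalue: float = 10.0,
    num_alignments: int = 250,
    processors: int = 4,
    blastall: str = BLASTALL,
) -> List[str]:
    """Return an argument list mirroring the command the Helix script prints."""
    return [
        blastall,
        "-p", program,               # Blast program: blastn, blastx, tblastx, ...
        "-d", database,              # database path
        "-i", query,                 # query sequence file
        "-o", output,                # report file
        "-e", str(evalue),           # "Ignore hits expected to occur by chance more than" cutoff
        "-b", str(num_alignments),   # "Limit the number of sequences in my output" answer
        "-a", str(processors),       # number of processors
    ]


if __name__ == "__main__":
    cmd = build_blastall_command(
        "blastn", "/fdb/blastdb/mito.nt", "/data/susanc/blast/bench/5n",
        "5n.blastn", num_alignments=10,
    )
    print(shlex.join(cmd))  # shell-quoted form, for inspection only
```

Returning a list rather than a single string lets you pass it straight to `subprocess.run(cmd, check=True)` without shell quoting issues, on a system where blastall and the database are actually installed.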
Sample session (user input follows each prompt; pressing Enter accepts the default shown in parentheses):
[helix]$ blast

BLAST searches for sequences similar to a query sequence.
The query and the database searched can be either peptide or nucleic acid in any combination.

Usage: blast [QuerySeq] [-default]

Search with what query sequence? myseq.fas

Your query sequence is a nucleotide sequence.

Available programs are:
    blastn  - nucleotide query against nucleotide db
    blastx  - translated nucleotide query against protein db
    tblastx - translated nucleotide query against translated db

Which program do you want to run (blastn)? blastn

Available nucleotide sequence databases:
 1)Drosophila          Drosophila sequences
 2)E.Coli              E.Coli sequences
 3)EST                 All non-redundant GenBank+EMBL+DDBJ EST sequences
 4)EST - human         Non-redundant GenBank+EMBL+DDBJ Human EST sequences
 5)EST - mouse         Non-redundant GenBank+EMBL+DDBJ Mouse EST sequences
 6)EST - others        Non-redundant GenBank+EMBL+DDBJ Other (not human or mouse) EST sequences
 7)HTGs                High throughput genome sequences
 8)Human Genome        Build 35, hg17 (May 2004) from the International Human Genome Consortium
 9)Human Genome RNA    Build 35, hg17, May 2004
10)Mito                Mitochondrial sequences
11)Mouse Genome        Build 34, mm6, May 2005 from the Mouse Genome Consortium
12)Mouse Genome RNA    Build 34, mm6, May 2005 from the Mouse Genome Consortium
13)NCBI nt             All Non-redundant GenBank+EMBL+DDBJ+PDB (but no EST, STS, GSS, HTG)
14)Other genomic       Non-human, non-mouse genomic sequences from NCBI
15)Protein Data Bank   An archive of experimentally determined 3D structures of biological macromolecules
16)Refseq Human RNA    A comprehensive, integrated, non-redundant set of sequences
17)Refseq Mouse RNA    A comprehensive, integrated, non-redundant set of sequences
18)UniVec Core         A non-redundant db of sequences commonly attached to cDNA or genomic DNA during cloning
19)Yeast               Yeast Sequences

which database (13)? 10

Ignore hits expected to occur by chance more than (* 10.0 *) times ?

Limit the number of sequences in my output to (* 250 *) ? 10

Other options (-check to see all the options)(eg. -A=30 -n=T)(no)?

What should I call the output file (blastn) ? 5n.blastn
-----------------------------------------------------------------------------
Running command as follows:
/usr/local/blast/ncbi/bin//blastall -p blastn -d /fdb/blastdb/mito.nt \
    -i /data/susanc/blast/bench/5n -o 5n.blastn -e 10.0 -b 10 -a 4

[helix]$ more 5n.blastn
BLASTN 2.2.17 [Aug-26-2007]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= gi|615762|gb|T33664.1|T33664 EST58651 Human Brain Homo sapiens cDNA 3' end similar to None
         (259 letters)

Database: Mitochondrial nucleotide sequences
           129 sequences; 3,164,247 total letters

Searching..................................................done

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

emb|Y08502.1|MIATGENB A.thaliana mitochondrial genome, part B        26   9.6
gb|U03843.1|CRU03843 Chlamydomonas reinhardtii complete mitochon...  26   9.6
ref|NC_000884.1| Cavia porcellus complete mitochondrial genome       26   9.6
ref|NC_000877.1| Aythya americana mitochondrion, complete genome     26   9.6
ref|NC_000861.1| Salvelinus alpinus mitochondrion, complete genome   26   9.6
[...]
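The report begins with the version header (here "BLASTN 2.2.17 [Aug-26-2007]") and then lists one summary line per hit, ending with the bit score and E value. If you want to post-process reports, a minimal sketch of reading both pieces from the plain-text output follows. It assumes the layout shown in the sample session; `blast_version`, `parse_hit_summary` and `parse_evalue` are hypothetical helpers, and other Blast versions or output formats may lay the report out differently.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Header line such as "BLASTN 2.2.17 [Aug-26-2007]": program, version, release date.
VERSION_RE = re.compile(r"^([A-Z-]*BLAST[A-Z]*)\s+(\d[\w.]*)\s+\[([^\]]+)\]", re.MULTILINE)
HIT_HEADER = "Sequences producing significant alignments"


class Hit(NamedTuple):
    description: str
    bit_score: float
    evalue: float


def blast_version(report: str) -> Optional[Tuple[str, ...]]:
    """Return (program, version, date) from the report header, or None if absent."""
    match = VERSION_RE.search(report)
    return match.groups() if match else None


def parse_evalue(text: str) -> float:
    # Assumption: very small E values may be printed without a leading digit
    # (e.g. "e-100"), so prepend "1" before converting.
    return float("1" + text if text.startswith("e") else text)


def parse_hit_summary(report: str) -> List[Hit]:
    """Read the one-line hit summaries that follow the HIT_HEADER line."""
    hits: List[Hit] = []
    in_table = False
    for line in report.splitlines():
        if not in_table:
            in_table = line.startswith(HIT_HEADER)
            continue
        if not line.strip():
            if hits:
                break  # a blank line after the hits ends the table
            continue  # blank line between the header and the first hit
        try:
            # The last two whitespace-separated fields are the bit score and E value.
            description, score, evalue = line.rsplit(None, 2)
            hits.append(Hit(description.strip(), float(score), parse_evalue(evalue)))
        except ValueError:
            break  # not a hit line, e.g. "[...]"
    return hits
```

To use it, read a report such as 5n.blastn into a string and filter on a stricter cutoff than the one used for the search, for example `[h for h in parse_hit_summary(text) if h.evalue <= 1e-5]`.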
Documentation
- Short course and program selection guide at the NCBI website.
- The statistics of sequence similarity scores. A tutorial by Stephen Altschul at the NCBI website.
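The Score (bits) and E Value columns in the sample report are linked by the statistics covered in that tutorial. A raw alignment score S becomes a bit score S' = (λS - ln K) / ln 2, and the expected number of chance hits is roughly E = m × n × 2^(-S'), where m is the query length and n the total database length. The sketch below applies these two formulas directly; the function names are hypothetical, λ and K depend on the scoring system and must be supplied, and real Blast runs use effective (edge-corrected) lengths, which this sketch ignores.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw score S to a bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def approx_evalue(bit: float, query_len: int, db_len: int) -> float:
    """Rough expect value E = m * n * 2**(-S'), using uncorrected lengths m and n."""
    return query_len * db_len * 2.0 ** (-bit)
```

With the sample session's numbers (bit score 26, a 259-letter query, a 3,164,247-letter database), `approx_evalue(26, 259, 3_164_247)` comes out at about 12.2, the same order as the reported 9.6. The remaining difference is expected, because Blast uses effective lengths and the summary line prints a rounded bit score.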