Blast

Blast on Helix

Blast, developed at NCBI, NIH, is a rapid search algorithm that compares a nucleotide or protein query sequence against a sequence database. Users who run only occasional searches should use Blast on Helix. The standalone Blast on Helix searches local Helix databases, which are updated weekly.

Users who need to Blast large numbers of sequences (hundreds, thousands, or more) can use Blast on Biowulf, which is designed for high-throughput Blast runs. If you have questions about where to run your Blast searches, please contact the Helix Systems staff at staff@helix.nih.gov.

The Blast family of programs includes:

blastp    -  compares an amino acid query sequence against a protein sequence database.
blastn    -  compares a nucleotide query sequence against a nucleotide sequence database.
blastx    -  compares a nucleotide query sequence, translated in all reading frames, against a protein sequence database.
tblastn   -  compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
tblastx   -  compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
megablast -  efficiently finds long alignments between very similar sequences, which makes it the best tool for finding an identical match to your query sequence.
blastpgp  -  runs Position-Specific Iterated BLAST (PSI-BLAST), the most sensitive BLAST program, useful for finding very distantly related proteins or new members of a protein family.
rpsblast  -  runs Reverse Position-Specific BLAST (RPS-BLAST), a more sensitive way of identifying conserved domains in proteins than standard BLAST searching.
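
The choice among the five basic programs depends only on the type of the query and the type of the database. The sketch below is an illustration of that rule, not part of the Helix blast script: the table name PROGRAMS and the function candidate_programs are hypothetical, and megablast, blastpgp, and rpsblast are left out because they are specialised variants rather than type combinations.

```python
# Hypothetical lookup table built from the program list above:
# (query type, database type) -> programs that accept that combination.
PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "protein"): ["blastp"],
    ("protein", "nucleotide"): ["tblastn"],
}


def candidate_programs(query_type, db_type=None):
    """Return the basic Blast programs that fit the given sequence types.

    If db_type is omitted, every program that accepts the query type is
    returned, sorted by name.
    """
    if db_type is not None:
        return list(PROGRAMS.get((query_type, db_type), []))
    found = []
    for (q_type, _db_type), names in PROGRAMS.items():
        if q_type == query_type:
            found.extend(names)
    return sorted(found)


# A nucleotide query can be searched with three of the basic programs.
print(candidate_programs("nucleotide"))
# A protein query against a nucleotide database needs tblastn.
print(candidate_programs("protein", "nucleotide"))
```

For a nucleotide query this returns blastn, blastx, and tblastx.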

Version

The Blast version is printed at the top of every Blast output.

Running Blast on Helix

Blast on Helix runs against the local databases, which are updated and maintained by the Helix Systems staff. To run any of the Blast programs, type blast at the Helix prompt. The script will prompt you for all required parameters.
[Update status of databases]

Sample session (user input follows each prompt; an empty answer accepts the default shown in parentheses):

[helix]$ blast

 BLAST searches for sequences similar to a query sequence. The query and the
 database searched can be either peptide or nucleic acid in any combination.

 Usage: blast [QuerySeq] [-default]

Search with what query sequence? myseq.fas

Your query sequence is a nucleotide sequence. Available programs are:
    blastn  -  nucleotide query against nucleotide db
    blastx  -  translated nucleotide query against protein db
    tblastx -  translated nucleotide query against translated db

Which program do you want to run (blastn)? blastn

Available nucleotide sequence databases:

  1)Drosophila        Drosophila sequences
  2)E.Coli            E.Coli sequences
  3)EST               All non-redundant GenBank+EMBL+DDBJ EST sequences
  4)EST - human       Non-redundant GenBank+EMBL+DDBJ Human EST sequences
  5)EST - mouse       Non-redundant GenBank+EMBL+DDBJ Mouse EST sequences
  6)EST - others      Non-redundant GenBank+EMBL+DDBJ Other (not human or mouse) EST sequences
  7)HTGs              High throughput genome sequences
  8)Human Genome      Build 35, hg17 (May 2004) from the International Human Genome Consortium
  9)Human Genome RNA  Build 35, hg17, May 2004
 10)Mito              Mitochondrial sequences
 11)Mouse Genome      Build 34, mm6, May 2005 from the Mouse Genome Consortium
 12)Mouse Genome RNA  Build 34, mm6, May 2005 from the Mouse Genome Consortium
 13)NCBI nt           All Non-redundant GenBank+EMBL+DDBJ+PDB (but no EST, STS, GSS, HTG)
 14)Other genomic     Non-human, non-mouse genomic sequences from NCBI
 15)Protein Data Bank An archive of experimentally determined 3D structures of biological macromolecules
 16)Refseq Human RNA  A comprehensive, integrated, non-redundant set of sequences
 17)Refseq Mouse RNA  A comprehensive, integrated, non-redundant set of sequences
 18)UniVec Core       A non-redundant db of sequences commonly attached to cDNA or genomic DNA during cloning
 19)Yeast             Yeast Sequences

which database (13)? 10

Ignore hits expected to occur by chance more than (* 10.0 *) times ? 

Limit the number of sequences in my output to (* 250 *) ? 10

Other options (-check to see all the options)(eg. -A=30 -n=T)(no)? 

What should I call the output file (blastn) ? 5n.blastn
-----------------------------------------------------------------------------
Running command as follows:
/usr/local/blast/ncbi/bin//blastall -p blastn -d /fdb/blastdb/mito.nt \ 
    -i /data/susanc/blast/bench/5n -o 5n.blastn -e 10.0 -b 10 -a 4

[helix]$ more 5n.blastn
BLASTN 2.2.17 [Aug-26-2007]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= gi|615762|gb|T33664.1|T33664 EST58651 Human Brain Homo sapiens
cDNA 3' end similar to None
         (259 letters)

Database: Mitochondrial nucleotide sequences 
           129 sequences; 3,164,247 total letters

Searching..................................................done



                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

emb|Y08502.1|MIATGENB A.thaliana mitochondrial genome, part B          26   9.6  
gb|U03843.1|CRU03843 Chlamydomonas reinhardtii complete mitochon...    26   9.6  
ref|NC_000884.1| Cavia porcellus complete mitochondrial genome         26   9.6  
ref|NC_000877.1| Aythya americana mitochondrion, complete genome       26   9.6  
ref|NC_000861.1| Salvelinus alpinus mitochondrion, complete genome     26   9.6  
[...]
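
The "Running command as follows" lines in the session show how the answers given at the prompts map onto blastall options: -p (program), -d (database), -i (query file), -o (output file), -e (expectation cutoff), -b (number of database sequences to show alignments for), and -a (number of processors). The sketch below rebuilds that command line in Python for readers who want to script similar runs. The function name build_blastall_command is hypothetical, and this is not necessarily how the Helix blast script itself assembles the command. The default values are taken from the session above.

```python
import shlex


def build_blastall_command(program, database, query, output,
                           expect=10.0, max_alignments=250, processors=4,
                           blastall="blastall"):
    """Assemble a blastall argument list from the answers given at the prompts.

    The defaults mirror the sample session: E-value cutoff 10.0,
    250 sequences in the output, and 4 processors.
    """
    return [
        blastall,
        "-p", program,              # which Blast program to run
        "-d", database,             # database to search
        "-i", query,                # query sequence file
        "-o", output,               # output file
        "-e", str(expect),          # expectation value cutoff
        "-b", str(max_alignments),  # database sequences to show alignments for
        "-a", str(processors),      # number of processors
    ]


# Reproduce the command from the sample session (paths copied from it).
cmd = build_blastall_command(
    program="blastn",
    database="/fdb/blastdb/mito.nt",
    query="/data/susanc/blast/bench/5n",
    output="5n.blastn",
    max_alignments=10,
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning a list rather than a single string lets the result be passed directly to subprocess.run without shell quoting problems.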

Documentation

Short course and program selection guide at the NCBI website.
The statistics of sequence similarity scores. A tutorial by Stephen Altschul at the NCBI website.
