
EBI Dbfetch



Dbfetch Help


What is dbfetch?

Dbfetch is short for "database fetch". Dbfetch provides an easy way to retrieve entries from various databases at the EBI in a consistent manner. It can be used from any browser as well as from within a web-aware scripting tool that uses wget, lynx or similar.

How to use dbfetch?
  • From the browser

    Follow these instructions...

    1. Selecting a database:
      -If you are using the first form to paste your search items: choose a database name from this form.
      -If you are using the second form to upload your search items: the database name is included at the beginning of each line of the upload file, followed by a colon.

    2. Entering your search terms:
      These MUST be in the appropriate database format. Up to 200 search items can be queried in one run.
      -If you are using the first form: separate search items with a comma or a space.
      -If you are using the second form: separate search items with a new line.

    3. Choosing an output format:
      Here you can choose the simpler fasta format or the default format of the chosen database.

    4. Style:
      You can get your results as text or html.

    5. Retrieve!
      You are now ready to fetch your results, by pressing the Retrieve button.

    Search Items

    You may enter up to 200 search items for your chosen database.

    Multiple search terms should be separated by EITHER a space OR a comma.

    e.g. EMBL

    "AE014292,AE017197,AE017354" or "AE014292 AE017197 AE017354"

    e.g. UniProtKB

    "1433X_MAIZE,1433T_RAT,ACR2_YEAST" or "1433X_MAIZE 1433T_RAT ACR2_YEAST"
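
    For anyone preparing a query in a script before pasting it into the form, the rules above can be sketched roughly as follows. The function name split_search_items and the constant MAX_ITEMS are made up for this illustration. The sketch is also slightly more lenient than the form text, because it accepts commas and spaces mixed in one string, which is an assumption and not documented behaviour.

```python
import re

MAX_ITEMS = 200  # limit stated on this help page


def split_search_items(text):
    """Split a pasted query on commas and/or whitespace.

    Hypothetical helper: it only mirrors the separator and size rules
    described on this page, it does not check the ID format itself.
    """
    items = [piece for piece in re.split(r"[,\s]+", text.strip()) if piece]
    if len(items) > MAX_ITEMS:
        raise ValueError(
            f"{len(items)} search items given; at most {MAX_ITEMS} are allowed"
        )
    return items


print(split_search_items("AE014292,AE017197,AE017354"))
print(split_search_items("1433X_MAIZE 1433T_RAT ACR2_YEAST"))
```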

    Upload File

    Here you may upload a file in the specified format.

    You may retrieve up to 200 entries.

    Entries in the uploaded file all need to belong to the same database.

    Each entry you wish to retrieve MUST be on a new line and in the format: "database name":"id"

    e.g. EMBL
         embl:AE014292
         embl:AE017197
         embl:AE017354

    e.g. UniProtKB
         uniprotkb:A1AG1_HUMAN
         uniprotkb:A1AT_PIG
         uniprotkb:ACR2_YEAST
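
    An upload file of this shape could be generated and checked with a short script such as the one below. This is only a sketch of the rules listed above (one "database name":"id" pair per line, a single database per file, at most 200 entries). The names format_upload, parse_upload and MAX_ENTRIES are illustrative and are not part of dbfetch.

```python
MAX_ENTRIES = 200  # limit stated on this help page


def format_upload(db, ids):
    """Return upload-file text with one 'db:id' line per entry."""
    ids = list(ids)
    if len(ids) > MAX_ENTRIES:
        raise ValueError(f"at most {MAX_ENTRIES} entries can be retrieved")
    return "".join(f"{db}:{entry_id}\n" for entry_id in ids)


def parse_upload(text):
    """Check upload-file text and return (database name, list of ids)."""
    db_names = set()
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        db, colon, entry_id = line.partition(":")
        if not colon or not db or not entry_id:
            raise ValueError(f"expected 'database:id', got {line!r}")
        db_names.add(db)
        ids.append(entry_id)
    if len(db_names) > 1:
        raise ValueError("all entries must belong to the same database")
    if len(ids) > MAX_ENTRIES:
        raise ValueError(f"at most {MAX_ENTRIES} entries can be retrieved")
    return (db_names.pop() if db_names else None), ids


print(format_upload("embl", ["AE014292", "AE017197", "AE017354"]), end="")
```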

    Supported Databases and Database ID Format

    Currently supported databases include EMBL, UniProtKB, PDB, Medline, RefSeq, HGVbase, EMBLSVA, UniRef100, UniRef90, UniRef50 and UniParc.

    Output Format

    This is the sequence/database format of the results. Sequence formats are simply the way in which the amino acid or DNA sequence is recorded in a computer file. Different programs expect different formats, so if you are to submit a job successfully, it is important to understand what the various formats look like. To learn more about sequence formats, please see the 2Can Support Portal at http://www.ebi.ac.uk/2can/tutorials/formats.html

    The default is usually the native format of the specified database, which can also be selected in the drop-down list.

    e.g. The default for the UniProt Database is SWISS format.

    See an example of this format at http://www.ebi.ac.uk/2can/tutorials/formats.html#swiss

    Fasta format contains a one-line header followed by lines of sequence data. Each sequence in a fasta formatted file is preceded by a line starting with a ">" symbol. The first word on this line is the name of the sequence. The rest of the line is a description of the sequence.

    e.g.

    >uniprot|P13346|FOSB_MOUSE Protein fosB. 
    MFQAFPGDYDSGSRCSSSPSAESQY

    Learn more about this format at http://www.ebi.ac.uk/2can/tutorials/formats.html#fasta
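
    The header rule described above (name is the first word after ">", the rest of the line is the description) can be illustrated with a small reader. This is a minimal sketch, not a full fasta parser. The function name parse_fasta is invented for the example, and joining the sequence lines of one record into a single string is an assumption about what a caller would want.

```python
def parse_fasta(text):
    """Return a list of (name, description, sequence) tuples."""
    records = []
    name = description = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, description, "".join(chunks)))
            header = line[1:].strip()
            name, _, description = header.partition(" ")
            description = description.strip()
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence data for the current record
    if name is not None:
        records.append((name, description, "".join(chunks)))
    return records


example = ">uniprot|P13346|FOSB_MOUSE Protein fosB.\nMFQAFPGDYDSGSRCSSSPSAESQY\n"
for name, description, sequence in parse_fasta(example):
    print(name, "|", description, "|", len(sequence), "residues")
```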

    If no format is specified, for example in an http request that omits the format parameter, the default format is used.

    EMBL XML formats: there are two XML formats available from the EMBL database.
    EMBLXML: XML format for the EMBL nucleotide sequence database, developed internally.
    INSDXML: XML format for the EMBL nucleotide sequence database, developed in collaboration with NCBI (GenBank) and DDBJ.
    The DTD for INSDXML and the DTD/XML Schema for EMBLXML can be found here.


    Style

    The results can be delivered either as raw text or as html. Specify here which style you prefer.

    The default style is html.

  • From within a script: examples of the URL for all styles and formats

    For people interested in programmatic access to the Dbfetch functionality, we recommend using our new Web Services version of Dbfetch: WSDbfetch.

    Alternatively, you can use dbfetch for direct access:

    Making scripted http requests to dbfetch is very simple. The parameters that can be used are db, id, format and style. Of these, only db and id are required. If format and/or style are omitted, the defaults for the chosen database are used (the default style is always html).

    The URL to dbfetch is always of this format:
    http://www.ebi.ac.uk/cgi-bin/dbfetch?db=DB_NAME&id=IDS&format=FORMAT_NAME&style=STYLE_NAME

    • DB_NAME - Must be chosen from the table below
    • IDS - Single id/acc or comma/white-space separated list (id1 or id1 id2 id3 or id1,id2,id3)
    • FORMAT_NAME - Name of the output format, varies between databases
    • STYLE_NAME - Name of the output style, available styles are raw and html
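
    As a rough illustration of the URL pattern above, a script could assemble the query string like this. The base URL and the parameter names db, id, format and style come from this page. The function name dbfetch_url and the constant BASE_URL are invented for the example, and leaving the comma unescaped is a choice made so that the result matches the example URLs on this page.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.ebi.ac.uk/cgi-bin/dbfetch"  # as given on this page


def dbfetch_url(db, ids, fmt=None, style=None):
    """Build a dbfetch URL; db and ids are required, fmt and style optional."""
    if isinstance(ids, str):
        ids = [ids]
    params = {"db": db, "id": ",".join(ids)}
    if fmt:
        params["format"] = fmt
    if style:
        params["style"] = style
    # safe="," keeps the comma-separated id list readable in the URL.
    return BASE_URL + "?" + urlencode(params, safe=",")


print(dbfetch_url("EMBL", ["J00231", "HSFOS", "ROD894", "LOP242600"], fmt="fasta"))
```

    When fmt and style are left out, the parameters are simply omitted from the URL, so the server-side defaults described above apply.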

    The available databases are:

    NAME                  KEY (use this)
    EMBL                  embl
    EMBLCON               emblcon
    EMBLCDS               emblcds
    EMBLSVA               emblsva
    UniProtKB             uniprotkb
    UniSave               unisave
    UniRef100             uniref100
    UniRef90              uniref90
    UniRef50              uniref50
    UniParc               uniparc
    IPI                   ipi
    RefSeq                refseq
    InterPro              interpro
    PDB                   pdb
    HGVbase               hgvbase
    GenomeReviews         genomereviews
    EPO Proteins          epo_prt
    JPO Proteins          jpo_prt
    KIPO Proteins         kipo_prt
    USPTO Proteins        uspto_prt
    Medline               medline
    Ensembl Gene          ensemblgene
    Ensembl Transcript    ensembltranscript


    To find the available formats and styles for the different databases, select the database in the search form. The format and style drop-down lists will then contain the available options.

    Examples:

    http://www.ebi.ac.uk/cgi-bin/dbfetch?db=EMBL&id=J00231,HSFOS,ROD894,LOP242600

    Instead of the default html style, entries can also be retrieved in plain text (raw):
    http://www.ebi.ac.uk/cgi-bin/dbfetch?db=EMBL&id=J00231,HSFOS,ROD894,LOP242600&style=raw

    It is also possible to retrieve Fasta formatted sequences:
    http://www.ebi.ac.uk/cgi-bin/dbfetch?db=EMBL&id=J00231,HSFOS,ROD894,LOP242600&format=fasta

    For backward compatibility, the program can also be called simply by giving one or more EMBL accession numbers or entry names:
    http://www.ebi.ac.uk/cgi-bin/dbfetch?J00231
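
    In place of wget or lynx, a script could request any of the example URLs above with Python's standard library. The sketch below is deliberately small. The function name fetch_text is invented, the opener argument exists only so the function can be tried without network access, and decoding the response as UTF-8 is an assumption about the server's output. Whether the endpoint still answers at this address is not something this page can guarantee.

```python
from urllib.request import urlopen


def fetch_text(url, opener=urlopen, timeout=30):
    """Request a dbfetch URL and return the response body as text."""
    with opener(url, timeout=timeout) as response:
        # Assumption: the body is text; undecodable bytes are replaced.
        return response.read().decode("utf-8", errors="replace")


# Example call (needs network access, so it is left commented out):
# print(fetch_text("http://www.ebi.ac.uk/cgi-bin/dbfetch?db=EMBL&id=J00231&style=raw"))
```

    Requesting style=raw is usually the better choice for scripts, since the default html style wraps the entry in markup meant for browsers.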





















