"Entrez Structure and 3D-Domains Index" Help




Entrez Structure and 3D-Domains Index Overview

The database underlying the Entrez Structure and 3D-Domains indexes is MMDB. MMDB (the Molecular Modeling Database) is Entrez's macromolecular 3D structure database. It contains experimentally determined biopolymer structures obtained from the Protein Data Bank (PDB). Entrez provides a number of search tools for identifying structures of interest, and links those structures to extensive information on biological function and molecular evolution.

Entrez's Structure Index page is the "home page" for querying 3D structures. For each query, it lists the PDB names and general descriptions of the structures in the result set. It also provides links to further information. On the right-hand side of the Structure Index page, each returned structure entry has an "MMDB" button linking to the MMDB Structure Summary page, and a "Links" menu linking, where applicable, to the Entrez 3D Domains Index, the Entrez Protein or Nucleotide Index, the Entrez PubMed Citations Index, the Entrez Taxonomy Index, and so on.

Entrez's 3D-Domains Index page is the "home page" for querying 3D domains. For each query, it lists the domain names in the result set, together with general descriptions of the structures they belong to. It also provides links to further information. On the right-hand side of the 3D-Domains Index page, each returned 3D domain entry has an "MMDB" button linking to the MMDB Structure Summary page, a "VAST" button linking to the VAST 3D-Domain Neighbors Summary page, and a "Links" menu linking, where applicable, to the Entrez Structure Index, the Entrez Protein or Nucleotide Index, the Entrez PubMed Citations Index, the Entrez Taxonomy Index, and so on.


How To Make a Query

Go to the Entrez page: http://www.ncbi.nlm.nih.gov/entrez

For Structure or 3D-Domains queries, choose the "Search Structure" or "Search 3D Domains" menu item at the top left of the page.

There are four types of queries: (1) string query, (2) integer query, (3) date query, and (4) range query. For each type, enter the token to be queried in the text box, followed by a field alias in square brackets. All queries are case-insensitive.

By default, a string query without a field alias is run against [ALL], and an integer query without a field alias is run against [UID].
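As a minimal sketch of these defaults, the following Python snippet shows which field an unqualified token would be searched against and how a field alias is appended. The function names `default_field` and `with_field` are hypothetical and are not part of Entrez.

```python
def default_field(token: str) -> str:
    """Return the field searched when no alias is given.

    Follows the rule stated above: integer tokens default to [UID],
    all other tokens default to [ALL].
    """
    return "UID" if token.strip().isdigit() else "ALL"


def with_field(token: str, alias: str) -> str:
    """Append a field alias in square brackets to a query token."""
    return f"{token.strip()} [{alias}]"


# Show the explicit form of two unqualified queries.
print(with_field("19741", default_field("19741")))
print(with_field("tyrosine kinase", default_field("tyrosine kinase")))
```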

Date queries must use one of the following formats: YYYY/MM/DD, YYYY/MM, or YYYY. Single-digit months and days do not need to be padded with a leading 0, so forms such as YYYY/MM/D, YYYY/M/D, and YYYY/M are also accepted.
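The date formats above can be checked before a query is submitted. The sketch below is an illustration only: the function name `is_entrez_date` is hypothetical, and the pattern covers just the formats listed here (a four-digit year, optionally followed by a one- or two-digit month and day), not any other format Entrez may accept.

```python
import re

# Year, then an optional month, then an optional day, separated by "/".
_DATE_RE = re.compile(r"(\d{4})(?:/(\d{1,2})(?:/(\d{1,2}))?)?")


def is_entrez_date(text: str) -> bool:
    """Return True if text looks like YYYY, YYYY/M[M], or YYYY/M[M]/D[D]."""
    match = _DATE_RE.fullmatch(text.strip())
    if match is None:
        return False
    _, month, day = match.groups()
    if month is not None and not 1 <= int(month) <= 12:
        return False
    if day is not None and not 1 <= int(day) <= 31:
        return False
    return True


for candidate in ("1994/01/23", "2001/7/24", "2003/2", "2001", "01/23/1994"):
    print(candidate, is_entrez_date(candidate))
```

The check is deliberately loose about calendar validity (it would accept 2001/2/31); it only guards against malformed tokens.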

A range query consists of two tokens (a "from" value and a "to" value) separated by a colon (:), followed by a field alias in square brackets. All date fields and all 'count' fields (residue counts, helix counts, etc.) can be range queried. Two additional fields can also be range queried: Resolution [RESO] in the Structure Index and MolWeight [MWT] in the 3D-Domains Index.

Range queries on Resolutions [RESO] (in angstroms) must have the following format:
fromResolution:toResolution [RESO].

Range queries on molecular weights [MWT] (in daltons) must have the following format:
fromMoleculeWeight:toMoleculeWeight [MWT].

Range queries on dates have a similar format:
FromDate:ToDate [field-alias] (FromDate and ToDate take the form YYYY/MM/DD, YYYY/MM, YYYY/M, YYYY, etc.)

Range queries on 'counts' have the following format:
FromCount:ToCount [field-alias] (FromCount and ToCount are integers)
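All of the range formats above share one shape, which can be sketched as a single helper. The name `range_query` is hypothetical; the helper only assembles the query text and does not check that the chosen field actually supports range queries.

```python
def range_query(start, end, alias: str) -> str:
    """Build a 'from:to [ALIAS]' range query from two tokens.

    The colon is the range separator, so neither token may contain one.
    """
    start_text, end_text = str(start).strip(), str(end).strip()
    if not start_text or not end_text:
        raise ValueError("both ends of the range are required")
    if ":" in start_text or ":" in end_text:
        raise ValueError("range tokens must not contain ':'")
    return f"{start_text}:{end_text} [{alias}]"


print(range_query("1.97", "2.14", "RESO"))              # resolution, in angstroms
print(range_query("1994/01/23", "1994/03/23", "PDDAT"))  # dates
print(range_query(5, 7, "PCC"))                           # counts
```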

Special notes on querying PdbChainCode [CHN]: A PDB chain code can be any of a wide variety of characters, including white space (which cannot be entered as a query token in Entrez). PDB chain codes are also case-sensitive, whereas the Entrez search engine is case-insensitive. To make such characters searchable, a PdbChainCode query accepts either the character itself or its decimal ASCII code. For example, to query PDB chain code 'A', you can enter either 'A [CHN]' (which the Entrez search engine interprets as either 'a' or 'A') or '65 [CHN]' (which unambiguously means upper-case 'A'). To query a white-space PDB chain code, the only option is '32 [CHN]' (see the examples below). To avoid upper/lower-case ambiguity, it is recommended that you enter PdbChainCode queries as decimal ASCII codes.
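The decimal ASCII code of a chain character can be obtained with Python's built-in `ord`. The helper name `chain_code_query` below is hypothetical; it simply applies the recommendation above to a single character.

```python
def chain_code_query(chain: str) -> str:
    """Build an unambiguous [CHN] query from a one-character chain code."""
    if len(chain) != 1:
        raise ValueError("a PDB chain code is a single character")
    # ord() gives the decimal code, e.g. 65 for 'A' and 32 for a space.
    return f"{ord(chain)} [CHN]"


print(chain_code_query("A"))
print(chain_code_query(" "))
```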

The following sections on Structure Query Capabilities and 3D-Domains Query Capabilities list the field aliases for every field. A field may have several aliases; in that case, use whichever you find easiest to remember.


Query Examples

For queries on Entrez Structure:

tyrosine kinase
nmr structure [TITL]
1b3o [ACCN]
19741
3.2.1.17 [EC]
3.2.1.- [EC]
3.2.*.* [EC]
5:7 [PCC]
A [CHN]
65 [CHN]
32 [CHN]
1994/01/23:1994/03/23 [PDDAT]
1.97:2.14 [RESO]
SO4 [LCOD]
benzamidine [LNAM]
beta-mercaptoethanol [LDES]
isomerase [PCLS]
lysozyme [PCOM]
African Clawed Frog [PSRC]

For queries on Entrez 3D-Domains:

1b3oa1 [NAME]
2001/7/24:2003/02/16 [PRD]
2001:2003/2 [PRD]
fission yeast [ORGN]
pap [PDES]
A [CHN]
65 [CHN]
32 [CHN]
23:32 [RC]
3:7 [HC]
11:29 [MPRC]
11369.52:13521.06 [MWT]
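The example queries above are written for the Entrez web form. As a hedged sketch, the same query strings could also be sent to NCBI's E-utilities `esearch` endpoint, which this help page does not cover. The snippet only builds the request URL and makes no network call. The helper name `esearch_url` is hypothetical, and it is an untested assumption that the service accepts every field alias listed in this document.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Base URL of the E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "structure", retmax: int = 20) -> str:
    """Return an esearch URL for an Entrez query string (no request is sent)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


url = esearch_url("1.97:2.14 [RESO]")
print(url)
# Decoding the query string recovers the original term unchanged.
print(parse_qs(urlparse(url).query)["term"][0])
```

Using `urlencode` keeps the colon, spaces, and square brackets of a query safely escaped in the URL.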


Structure Query Result Entries

The structure result entries include the following:
  • PdbAcc (aliases: [ACCN, PACC, PDBACC]): The four-character identifier assigned by PDB to specify the PDB structure. Clicking on it will go to the corresponding MMDB Structure Summary page.
  • PdbDescr (aliases: [PDSC, PDES]): A brief description of the PDB structure.
  • Uid (aliases: [UID, ID, MMDBID]): The integer assigned by MMDB to uniquely specify the PDB structure.


Structure Query Capabilities

The following fields can be queried in the Entrez Structure index (field aliases are given in square brackets; where several aliases are available, pick whichever is easiest to remember):
  • All [ALL]: All of the following fields are searched. If a string query is presented without a field alias, by default, [ALL] is searched.
  • Uid [UID, ID, MMDBID]: The integer assigned by MMDB to uniquely specify the PDB structure. If an integer query is presented without a field alias, by default, [UID] is searched. For structures, the UIDs are MMDB IDs.
  • PdbAcc [ACCN, PACC, PDBACC]: The four-character identifier assigned by PDB to specify the PDB structure.
  • EC [EC]: The EC number of the PDB structure. This field supports wildcard queries:
    3.2.1.- [EC]
    3.2.*.* [EC]
    3.2.* [EC]
    and so on. Note that the queries 3.2.*.* [EC] and 3.2.* [EC] return an identical set of PDB structures and are therefore equivalent.
  • Resolution [RESO, RESL, RES]: The resolution (in angstroms) of a protein structure. This field can be range queried with the above specified format.
  • ExpMethod [EXPM, EXP]: The experimental method used (X-Ray, NMR, etc.) to characterize the protein structure.
  • Title [TITL, TITLE]: The title of the publication that reported the PDB structure findings.
  • Author [AUTH, AU]: The author of the publication that reported the PDB structure findings.
  • Journal [JOUR, JOURNAL]: The journal of the publication that reported the PDB structure findings.
  • MMDBEntryDate [DDAT]: The date on which a Protein Data Bank structure record was first imported into MMDB. This field can be range queried with the above specified format.
  • MMDBModifyDate* [MDAT]: The date on which the latest version of a Protein Data Bank structure record was imported into MMDB. The MMDBModifyDate roughly corresponds to PDB's Release Date, as MMDB mirrors PDB content in a timely fashion. This field can be range queried with the above specified format. (See also the footnote* about this field, below.)
  • PDBDepositDate [PDDAT]: The earliest date that Protein Data Bank associates with an accession, generally representing the date on which the record was submitted to the PDB. This field can be range queried with the above specified format.
  • PdbClass [PCLA, PCLS]: The classification of the PDB structure.
  • PdbSource [PSRC, PSOU]: The sample source of the PDB structure.
  • PdbDescr [PDSC, PDES]: The brief description of the PDB structure.
  • PdbComment [PCOM, PCMT]: The more detailed description of the PDB structure.
  • Organism [ORGN]: The organism and lineage of the PDB structure.
  • PdbChainCode [CHN, CHNC, CCODE]: The 1-letter PDB chain code.
  • LigCode [LCOD, LIGC, LCODE]: The 3-letter code of a ligand in the PDB structure.
  • LigName [LNAM, LIGN, LNAME]: The PDB definition of a ligand in the PDB structure.
  • LigDescr [LDES, LIGD, LDSC, LDESC]: The author's brief description of a ligand in the PDB structure.
  • LigCount [LCOU, LCNT, LCOUNT]: The number of different types of ligands (not the number of ligands) in the PDB structure. This field can be range queried with the above specified format.
  • ModProteinResCount [MPRC, MPRCNT, MPRCOUNT]: The number of modified protein residues in the PDB structure. This field can be range queried with the above specified format.
  • ModDNAResCount [MDRC, MDRCNT, MDRCOUNT]: The number of modified DNA residues in the PDB structure. This field can be range queried with the above specified format.
  • ModRNAResCount [MRRC, MRRCNT, MRRCOUNT]: The number of modified RNA residues in the PDB structure. This field can be range queried with the above specified format.
  • ProteinChainCount [PCC, PCCNT, PCCOUNT]: The number of protein chains in the PDB structure. This field can be range queried with the above specified format.
  • DNAChainCount [DCC, DCCNT, DCCOUNT]: The number of DNA chains in the PDB structure. This field can be range queried with the above specified format.
  • RNAChainCount [RCC, RCCNT, RCCOUNT]: The number of RNA chains in the PDB structure. This field can be range queried with the above specified format.

*Note about the MMDBModifyDate field: When PDB undergoes a database remediation, in which most or all PDB records are updated in some way, MMDB imports the complete set of updated records. This was the case when the PDB database underwent a 2007 remediation. Because the complete revised PDB data set was loaded into MMDB at that time, the earliest available value in the MMDBModifyDate field is 2007.

Retrieve protein structures bound to specific chemicals or small biopolymers

It is possible to retrieve protein structures that are bound to a specific chemical or a small biopolymer by first searching the PubChem Compound database for the desired molecule, then following the link from the PubChem Compound record(s) of interest to Protein Structures, as shown for example 1 (aspirin) and example 2 (a peptide) below. PubChem Compound records can be retrieved by text term search (e.g., aspirin), unique identifier (e.g., the PubChem Compound identifier, or CID, for aspirin is 2244), and more. The PubChem help document provides details about searching that database.

Alternatively, the PubChem Structure Search page can be used to search for PubChem compounds by identity/similarity, substructure/superstructure, or molecular formula. This can be helpful, for example, when searching with peptides that contain modified residues. The Filters/Data Source/From MMDB option on the PubChem Structure Search page can be used to limit retrieval to PubChem compounds that originated from the Molecular Modeling Database (MMDB). That way, all PubChem compounds retrieved will also appear in corresponding 3D protein structure records, where they can be viewed in 3D bound to a protein structure, using the Cn3D viewing program. Example 3, below, shows how to retrieve protein structures bound to aspirin or similar compounds, such as salicylic acid.

Example 1: Retrieve protein structures bound to aspirin (CID 2244)

  1. open the PubChem Compound search page
  2. in the query box, enter the unique identifier 2244 or the term aspirin (the latter will retrieve CID 2244 as well as other compounds that contain the term aspirin in their records) and press GO
  3. on the search results page, select the Other Links/Protein Structures link for the compound of interest, in this case for CID 2244.
    (Alternatively, on the search results page you can first click on CID 2244 to open the complete PubChem record for that compound and then select Links: Protein Structure from the right hand margin of the PubChem record.)
  4. a list of protein structure records from MMDB that contain your compound(s) of interest will appear
  5. click on the accession number of any record of interest to view its summary information.
  6. On the structure summary page, press the Structure View in Cn3D button to open an interactive view of the 3D protein structure and its bound chemical. (The Cn3D program must first be installed on your computer in order for that button to work. The program is free and installation only takes a minute or two.)

Example 2: Retrieve protein structures bound to a small biopolymer, for example a peptide (CID 5496545)

  1. open the PubChem Compound search page
  2. in the query box, enter the unique identifier 5496545 and press GO
  3. on the search results page, select the Other Links/Protein Structures link for that compound
    (Alternatively, on the search results page you can first select Related Structures/Similar Compounds to expand your retrieval to other peptides that are similar to 5496545. Then select Other Links: Protein Structures from the grey area at the top of the broader PubChem search results page. That will retrieve protein structures that are bound to any of the compounds listed on the page (default) or those you have selected with checkboxes.)
  4. a list of protein structure records from MMDB that contain your compound of interest will appear
  5. click on the accession number of any record of interest to view its summary information.
  6. On the structure summary page, press the Structure View in Cn3D button to open an interactive view of the 3D protein structure and its bound chemical. (The Cn3D program must first be installed on your computer in order for that button to work. The program is free and installation only takes a minute or two.)

Example 3: Use the PubChem Structure Search to retrieve protein structures bound to aspirin (CID 2244) or similar compounds

  1. open the PubChem Structure Search page
  2. select Search by: Identity/Similarity
  3. select the option for searching by CID, SMILES, InChI
  4. enter the CID 2244 for aspirin
  5. under Options, select the desired level of similarity for the structures you want to retrieve. Identical structures are shown as the default selection, but if you want to retrieve structures such as salicylic acid, for example, change the option to Similar Compounds, score>=80%. (The PubChem help document provides details about identical structures and similar compounds.)
  6. under Filters: Data Source, select From: MMDB
  7. scroll to the top of the search page and press the Search button
  8. the Search results page will show the query compound and similar compounds, including, for example, CID 338: salicylic acid.
  9. select Other Links: Protein Structures from the grey area at the top of the page. That will retrieve protein structures that are bound to any of the compounds listed on the page (default) or those you have selected with checkboxes.
  10. once you are viewing the list of protein structure records in the Entrez Structure (MMDB) database, click on the accession number of any record of interest to view its summary information.
  11. On the structure summary page, press the Structure View in Cn3D button to open an interactive view of the 3D protein structure and its bound chemical. (The Cn3D program must first be installed on your computer in order for that button to work. The program is free and installation only takes a minute or two.)
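
The link-following step in the examples above (from a PubChem Compound record to its protein structures) might also be approximated programmatically through NCBI's E-utilities `elink` endpoint, which this help page does not describe. The sketch below only constructs the request URL and sends nothing. The helper name `compound_to_structure_url` is hypothetical, and it is an assumption that a compound-to-structure link is served for a given CID.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities link endpoint.
ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def compound_to_structure_url(cid: int) -> str:
    """Return an elink URL from a PubChem Compound ID to the structure database."""
    params = {"dbfrom": "pccompound", "db": "structure", "id": cid}
    return f"{ELINK}?{urlencode(params)}"


# CID 2244 is aspirin, as in Example 1.
print(compound_to_structure_url(2244))
```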



What are "3D-Domains"?

3D-Domains within individual polypeptide chains in MMDB are identified automatically, using an algorithm that searches for one or more breakpoints, falling between major secondary structure elements, such that the ratio of intra- to inter-domain contacts falls above a set threshold. The 3D-Domains identified in this way provide means to increase the sensitivity of structure neighbor calculations, and to present 3D superpositions based on compact domains as well as on complete polypeptide chains. They are not intended to represent domains identified by comparative sequence and structure analysis, as modules that recur in related proteins, though there is often good agreement between domain boundaries identified by these methods.
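
To make the contact-ratio criterion concrete, here is a toy illustration in Python. It is not the MMDB algorithm: the contact list, the candidate breakpoints, the threshold value, and the function names `contact_ratio` and `best_breakpoint` are all assumptions chosen for the example, and only a single breakpoint splitting a chain into two segments is considered.

```python
from typing import Iterable, List, Optional, Tuple

Contact = Tuple[int, int]  # a pair of residue indices that are in contact


def contact_ratio(contacts: Iterable[Contact], breakpoint: int) -> float:
    """Ratio of intra- to inter-domain contacts for a split at breakpoint.

    Residues with index < breakpoint form one domain, the rest the other.
    """
    intra = inter = 0
    for i, j in contacts:
        if (i < breakpoint) == (j < breakpoint):
            intra += 1  # both residues fall on the same side of the split
        else:
            inter += 1  # the contact crosses the split
    return float("inf") if inter == 0 else intra / inter


def best_breakpoint(contacts: List[Contact], candidates: Iterable[int],
                    threshold: float) -> Optional[int]:
    """Return the candidate with the highest ratio above threshold, if any."""
    best, best_ratio = None, threshold
    for candidate in candidates:
        ratio = contact_ratio(contacts, candidate)
        if ratio > best_ratio:
            best, best_ratio = candidate, ratio
    return best


# Two tightly packed triplets (0-2 and 3-5) joined by a single contact.
toy_contacts = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(best_breakpoint(toy_contacts, candidates=[2, 3], threshold=3.0))
```

In this toy data a split before residue 3 leaves six contacts inside the two segments and one between them, so it is preferred over a split before residue 2.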

The structure similarities among individual chains and their compact 3D-Domains in MMDB are calculated by the VAST algorithm, which superposes structures based on alignments of their secondary structure elements.


3D-Domains Query Result Entries

The 3D-domains result entries include the following:
  • Name (alias: [NAME]): The name of the 3D domain. It is formed by concatenating the four-character PDB code of the structure, the one-letter chain code assigned by PDB, and an integer domain number on that chain. Clicking on it will go to the corresponding MMDB Structure Summary page.
  • PdbDescr (aliases: [PDSC, PDES]): The brief description of the PDB structure that has the specified 3D domain.
  • Uid (aliases: [UID, ID, SDI]): The integer assigned by MMDB to uniquely specify the 3D domain. Clicking on it will go to the corresponding VAST 3D-Domain Neighbors Summary page.
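
The naming scheme described for the Name field can be taken apart mechanically. The following sketch assumes exactly the layout stated above (four-character PDB code, one chain character, integer domain number); the function name `parse_domain_name` is hypothetical.

```python
import re
from typing import Tuple

# Four alphanumeric characters, any single chain character, then digits.
_DOMAIN_RE = re.compile(r"([0-9A-Za-z]{4})(.)(\d+)")


def parse_domain_name(name: str) -> Tuple[str, str, int]:
    """Split a 3D-domain name into (PDB code, chain code, domain number)."""
    match = _DOMAIN_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a 3D-domain name: {name!r}")
    pdb_code, chain, number = match.groups()
    return pdb_code, chain, int(number)


print(parse_domain_name("1b3oa1"))
```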


3D-Domains Query Capabilities

The following fields can be queried in the Entrez 3D-Domains index (field aliases are given in square brackets; where several aliases are available, pick whichever is easiest to remember):
  • All [ALL]: All of the following fields are searched. If a string query is presented without a field alias, by default, [ALL] is searched.
  • Uid [UID, ID, SDI]: The integer assigned by MMDB to uniquely specify the 3D domain. If an integer query is presented without a field alias, by default, [UID] is searched. For 3D domains, the UIDs are SDIs (structure domain identifiers).
  • MmdbId [MID, MMDB, MMDBID]: The integer assigned by MMDB to uniquely specify the PDB structure.
  • DomainName [NAME]: The name of the 3D domain. It is formed by concatenating the four-character PDB code of the structure, the one-letter chain code assigned by PDB, and an integer domain number on that chain.
  • PdbAcc [ACCN, PACC, PDBACC]: The four-character identifier assigned by PDB to specify the PDB structure.
  • Title [TITL, TITLE]: The title of the publication that reported the PDB structure findings.
  • Author [AUTH, AU]: The author of the publication that reported the PDB structure findings.
  • MMDBEntryDate [DDAT]: The date on which a Protein Data Bank structure record was first imported into MMDB. This field can be range queried with the above specified format.
  • MMDBModifyDate* [MDAT]: The date on which the latest version of a Protein Data Bank structure record was imported into MMDB. The MMDBModifyDate roughly corresponds to PDB's Release Date, as MMDB mirrors PDB content in a timely fashion. This field can be range queried with the above specified format. (See also the footnote* about this field, below.)
  • PDBDepositDate [PDDAT]: The earliest date that Protein Data Bank associates with an accession from which a 3D domain record was generated. This generally represents the date on which the original structure record was submitted to the PDB. This field can be range queried with the above specified format.
  • PdbChainCode [CHN, CHNC, CCODE]: The 1-letter PDB chain code.
  • PdbClass [PCLA, PCLS]: The classification of the PDB structure.
  • PdbSource [PSRC, PSOU]: The sample source of the PDB structure.
  • PdbDescr [PDSC, PDES]: The brief description of the PDB structure.
  • PdbComment [PCOM, PCMT]: The more detailed description of the PDB structure.
  • Organism [ORGN]: The brief description of organism and lineage of the PDB structure.
  • DomainNo [DN, DNUM]: The domain number (across a given chain on the PDB structure) of the 3D domain.
  • CumulDomainNo [CDN, CDNM, CNUM]: The cumulative domain number (across all chains on the PDB structure) of the 3D domain.
  • ModProteinResCount [MPRC, MPRCNT, MPRCOUNT]: The number of modified protein residues in the PDB structure. This field can be range queried with the above specified format.
  • HelixCount [HC, HCNT, HCOUNT]: The number of alpha-helices on the 3D domain. This field can be range queried with the above specified format.
  • StrandCount [SC, SCNT, SCOUNT]: The number of beta-strands on the 3D domain. This field can be range queried with the above specified format.
  • ResCount [RC, RCNT, RCOUNT]: The number of residues in the 3D domain. This field can be range queried with the above specified format.
  • MolWeight [MWT]: The molecular weight (in daltons) of the 3D domain. This field can be range queried with the above specified format.

*Note about the MMDBModifyDate field: When PDB undergoes a database remediation, in which most or all PDB records are updated in some way, MMDB imports the complete set of updated records. This was the case when the PDB database underwent a 2007 remediation. Because the complete revised PDB data set was loaded into MMDB at that time, the earliest available value in the MMDBModifyDate field is 2007.


Updated

16 June 2009

