U.S. National Library of Medicine
MEDLINE®/PubMed® Baseline Repository (MBR)




NOTICE - May 9, 2007: MBR Query Tool Limitation

Due to the recent internal move of our MBR Query Tool server, we are not able to provide outside access to the "Full Citations" option for receiving results. You can still run queries and receive a PMID list as a result. We are working to correct this problem and apologize for any inconvenience.


Please Note

The records included in the MEDLINE/PubMed Baseline databases represent a static view of the data at the time each baseline database was created.

To access the MeSH files under the MBR Files link, you must enter into an online Memorandum of Understanding for use of the MeSH Vocabulary data.

To access the MBR Query Tool, you must be a recognized licensee of NLM Data. The license agreement requires those who use the MEDLINE/PubMed database to fill out an Intended Use Worksheet once a year and to file a brief Usage Report summarizing their use of the database (research use licensees do not have to submit the Usage Report form). NLM leases MEDLINE/PubMed to U.S. individuals or organizations; to its formally recognized International MEDLARS Centers; and to non-U.S. individuals or organizations for internal research projects with no commercial citation search service. We update the authorization information daily at 1800 Eastern Standard Time.

We monitor and provide support for this site from 0800 to 1700 (Eastern Standard Time) Monday to Friday, excluding holidays. If you experience any problems while using the MBR web site, or would like additional information about leasing NLM data (other than the UMLS Metathesaurus), contact Jane Rosov, MEDLARS Management Section, National Library of Medicine, 8600 Rockville Pike, Bethesda, Maryland 20894, or e-mail NLMdatadistrib@nlm.nih.gov.

Researchers have requested access to MEDLINE citations in the state they were in at a given moment in time, without the MeSH vocabulary updates and other revisions that occur during the year. The MEDLINE/PubMed Baseline Repository was set up to provide this capability. We have stored the end-of-year baseline of the MEDLINE/PubMed database for each year starting in 2002, along with a selection of the associated MeSH Vocabulary data files.

Baseline   Created                           Number of Citations
2002       Approximately November 21, 2001   11,299,108
2003       Between November 1-4, 2002        11,847,524
2004       Between November 14-18, 2003      12,421,396
2005       November 20, 2004                 14,792,864
2006       November 18 & 19, 2005            15,433,668
2007       November 17 & 18, 2006            16,120,074
2008       November 16 & 17, 2007            16,880,015
List of Available Baselines
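
The citation counts listed for each baseline can be compared from one year to the next. The following is a minimal sketch, assuming the counts are copied by hand from the list of available baselines; the names BASELINE_COUNTS and yearly_growth are illustrative and are not part of any MBR tool.

```python
# Citation counts copied from the list of available baselines (2002-2008).
BASELINE_COUNTS = {
    2002: 11_299_108,
    2003: 11_847_524,
    2004: 12_421_396,
    2005: 14_792_864,
    2006: 15_433_668,
    2007: 16_120_074,
    2008: 16_880_015,
}


def yearly_growth(counts):
    """Map each baseline year to (citations added, percent growth) versus the previous baseline."""
    years = sorted(counts)
    growth = {}
    for prev, curr in zip(years, years[1:]):
        added = counts[curr] - counts[prev]
        growth[curr] = (added, 100.0 * added / counts[prev])
    return growth


if __name__ == "__main__":
    for year, (added, pct) in yearly_growth(BASELINE_COUNTS).items():
        print(f"{year}: +{added:,} citations ({pct:.2f}%)")
```

The largest single increase falls on the 2005 baseline, which is the first baseline to include OLDMEDLINE citations.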

The baselines are normally generated towards the middle of November each year and contain all completed citations in MEDLINE as of that date. The baselines represent MEDLINE after the year-end processing has been completed. This means that the records have been revised with the upcoming year's new MeSH vocabulary terms. We currently have available the 2002 - 2008 MEDLINE/PubMed Baselines. The naming of the baselines reflects this year-end processing. For example, the 2002 MEDLINE/PubMed Baseline contains all completed citations from the mid-1960s until the date the baseline was created in late November 2001, with the year-end processing assigning the appropriate 2002 MeSH vocabulary terms; it is therefore the baseline for the 2002 year.

The baselines contain citations that are not MEDLINE as well. All of the baselines we have stored (2002 on) contain "Out-of-scope" citations which were renamed to "PubMed-not-MEDLINE" starting with the 2004 MEDLINE/PubMed Baseline. The PubMed-not-MEDLINE status refers to citations that reside in PubMed from journals included in MEDLINE and have undergone quality review but are not assigned MeSH headings because the cited item is not in scope for MEDLINE either by topic or by date of publication. Citations in the Out-of-scope or PubMed-not-MEDLINE status make up a very small percentage (0.51% or 75,271 records in the 2005 baseline) of the total number of citations contained in the baselines.
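
The percentage quoted for the 2005 baseline can be reproduced from the two record counts. This is a small arithmetic sketch, assuming the 2005 baseline total of 14,792,864 citations from the list of available baselines; the names used are illustrative only.

```python
TOTAL_2005 = 14_792_864        # total citations in the 2005 baseline
NOT_MEDLINE_2005 = 75_271      # Out-of-scope / PubMed-not-MEDLINE records in that baseline


def share_percent(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total


if __name__ == "__main__":
    # Rounds to the 0.51% figure quoted for the 2005 baseline.
    print(f"{share_percent(NOT_MEDLINE_2005, TOTAL_2005):.2f}%")
```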

Starting with the 2005 MEDLINE/PubMed Baseline, OLDMEDLINE citations are also included in the baselines. The OLDMEDLINE citations make up approximately 11% of the total number of baseline citations. The OLDMEDLINE citations are from international biomedical journals covering the fields of medicine, preclinical sciences, and allied health sciences. The citations were originally printed in hardcopy indexes published prior to 1966. For additional information, please refer to the following URL: http://www.nlm.nih.gov/databases/databases_oldmedline.html.

In the 2005 baseline, the subject indexing from the OLDMEDLINE citations was stored solely in the "Other Term" (or "OT") tagged fields and not in the MeSH Terms (or "MH") tagged fields. This means that searching the 2005 baseline from our MBR Query Tool via the MH field does not retrieve any OLDMEDLINE citations. The only way to include OLDMEDLINE records in a search of the 2005 baseline is to run a timeframe query without specifying any field-specific search criteria. Beginning with the 2006 baseline, Other Terms are starting to be mapped to current MeSH Terms, so searching via the MH field may retrieve some OLDMEDLINE records, but not necessarily the complete set.

Starting with the 2007 MEDLINE/PubMed Baseline, the OLDMEDLINE citation status notation is being phased out. As the OLDMEDLINE terms are converted to MeSH Headings, the status of each citation will change to MEDLINE. To determine whether a citation is OLDMEDLINE, you need to rely on the <CitationSubset> element in the XML files and the "SB" field in the MEDLINE ASCII files. For example,

PMID: 14771459
In the XML file: <CitationSubset>OM</CitationSubset>
In the MEDLINE ASCII file: SB - OM
OLDMEDLINE Determination Example
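
The check shown in the example above could be automated along the following lines. This is only a sketch under stated assumptions: the sample records are reduced, hypothetical fragments rather than full baseline records, the function names are invented for illustration, and the exact spacing of the SB line in the ASCII files is assumed to vary, so the pattern tolerates any whitespace around the hyphen.

```python
import re
import xml.etree.ElementTree as ET

# Matches an SB field line such as "SB - OM" or "SB  - OM" (spacing is an assumption).
_SB_LINE = re.compile(r"^SB\s*-\s*(\S+)\s*$")


def is_oldmedline_xml(citation_xml: str) -> bool:
    """True if any <CitationSubset> element in the citation holds the value OM."""
    root = ET.fromstring(citation_xml)
    return any((el.text or "").strip() == "OM" for el in root.iter("CitationSubset"))


def is_oldmedline_ascii(record: str) -> bool:
    """True if the MEDLINE ASCII record has an SB field whose value is OM."""
    for line in record.splitlines():
        match = _SB_LINE.match(line)
        if match and match.group(1) == "OM":
            return True
    return False


# Hypothetical, heavily reduced records for PMID 14771459 (illustration only).
SAMPLE_XML = """<MedlineCitation>
  <PMID>14771459</PMID>
  <CitationSubset>OM</CitationSubset>
</MedlineCitation>"""
SAMPLE_ASCII = "PMID- 14771459\nSB  - OM\n"

if __name__ == "__main__":
    print(is_oldmedline_xml(SAMPLE_XML), is_oldmedline_ascii(SAMPLE_ASCII))
```

A citation may carry more than one subset value, so both functions look at every CitationSubset element or SB line rather than only the first.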


We provide the following resources for each of the baselines for research purposes. Please note that background information on some of these resources is available from our MBR Reference Material page.

Resource: MBR Query Tool Database
Baseline databases 2002 forward available for searching. Includes tables with MH, SH, MH/SH combination, Chemicals, and PMID data; also can limit or filter by Date Created, Date Completed, Date Last Revised, Publication Year, and Status.
Restrictions: License Required
Where to Find: MBR_Query_Tool

Resource: XML Formatted Citations
XML version of baseline citations. This is the format used to export the Medline/PubMed Baseline citations.
Restrictions: License Required
Where to Find: MBR_Query_Tool

Resource: MEDLINE ASCII Display Formatted Citations
Each XML citation translated to MEDLINE ASCII display format used in PubMed.
Restrictions: License Required
Where to Find: MBR_Query_Tool

Resource: DTD Files
We save a copy of the relevant DTD (Document Type Definition) files each year for working with the Baseline XML files.
Restrictions: No Restrictions
Where to Find: MBR_Files

Resource: Frequency Count Files
Basic frequency counts for the entire MEDLINE/PubMed Baseline sorted into alphabetical and numerical order for the following MEDLINE fields. For all fields but the NM field, we also provide a sort and count of their occurrences as starred (Index Medicus) items.
     a. MH (MeSH Headings)
     b. SH (MeSH Subheadings)
     c. MH/SH combinations
     d. NM (Chemicals)
Restrictions: No Restrictions
Where to Find: MBR_Files

Resource: Raw Data Files
Files containing the raw data similar to what was used to create our MBR Query Tool Database for this Baseline year. There is a README file describing the various files available and their layouts.
Restrictions: No Restrictions
Where to Find: MBR_Files

Resource: Histogram/Summary Files
File showing the number of MH terms assigned to each of the various MeSH Tree top-level and top-level + 1 categories during the latest year to see how assignment of terms might vary from year to year.
File showing the number of MH terms assigned to each of the UMLS Semantic Type Groupings categories during the latest year to see how assignment of terms might vary from year to year from a different perspective.
Restrictions: No Restrictions
Where to Find: MBR_Files

Resource: Related MeSH Files
We save a copy of selected MeSH Vocabulary data files for each year and a copy of their associated DTD (Document Type Definition) files for working with the Baseline XML files.
Restrictions: Memorandum of Understanding required
Where to Find: MBR_Files

Resource: UMLS Semantic Groups File
We have saved a copy of the Semantic Groups file. The Semantic Groups are a coarse-grained set of semantic type groupings designed to reduce the complexity in the UMLS Metathesaurus. The 15 semantic groups provide a partition of the UMLS Metathesaurus for 99.5% of the concepts.
Restrictions: No Restrictions
Where to Find: MBR_Files

Last Modified: February 21, 2008