
Statistical Reports on MEDLINE®/PubMed® Baseline Data

Annual statistical reports based upon the data elements in the baseline [1] versions of MEDLINE®/PubMed® are available. For each year covered the reports include: total citations containing each element; total occurrences of each element; minimum/average/maximum occurrences of each element in a record; minimum/average/maximum length of a single element occurrence; average record size; and other statistical data describing the content and size of the elements [2].

Year | Baseline Including OLDMEDLINE Records | Baseline Excluding OLDMEDLINE Records | Total Records for Each Publication Year | Notes
2012 | HTML; Excel (244 KB, includes auxiliary reports [3]) | Excel (56 KB) | Excel (38 KB) |
2011 | HTML; Excel (248 KB, includes auxiliary reports [3]) | Excel (44 KB) | Excel (26 KB) |
2010 | HTML; Excel (232 KB, includes auxiliary reports [3]) | Excel (48 KB) | Excel (33 KB) |
2009 | HTML; Excel (198 KB, includes auxiliary reports [3]) | Excel (54 KB) | Excel (24 KB) |
2008 | HTML; Excel (186 KB, includes auxiliary reports [3]) | Excel (39 KB) | Excel (35 KB) |
2007 | HTML; Excel (176 KB, includes auxiliary reports [3]) | Excel (34 KB) | Excel (16 KB) |
2006 | HTML; Excel (143 KB, includes auxiliary reports) | Excel (29 KB) | Excel (15 KB) |
2005 | HTML; Excel (134 KB, includes auxiliary reports) | Excel (28 KB) | Excel (15 KB) | First year complete baseline includes OLDMEDLINE records [4]
2004 | (not available) | Excel (107 KB, includes auxiliary reports); (64 KB, includes auxiliary reports) | Excel (14 KB); PDF (3 KB) | First year auxiliary reports are available.
2003 | (not available) | Excel (27 KB); (9 KB) | (not available) |
2002 | (not available) | Excel (27 KB); (9 KB) | (not available) |



See the alphabetical list of the XML elements in the baseline database, and element documentation, including definitions of the MedlineCitation Status attribute values.


1 The baseline databases are produced at the conclusion of the MEDLINE production year, after NLM has performed its annual data maintenance, primarily to apply the new MeSH Vocabulary to the MEDLINE records. Thus the statistics for 2010, for example, refer to the static baseline database at the beginning of 2010, and do not cover records that are new, revised, or deleted during the 2010 production year.

Beginning in 2005, the baseline databases include records in the three completed MedlineCitation statuses: MEDLINE, PubMed-not-MEDLINE, and OLDMEDLINE. Baseline databases do not include the incomplete in-process status or in-data-review status records that NLM also distributes to MEDLINE licensees; nor do they include the publisher status records that reside in PubMed which are not distributed to licensees.
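
As an illustration of the status rule described above, the following is a minimal sketch that keeps only citations in the three completed statuses. The sample XML, the wrapper element name, the "In-Process" spelling of the incomplete status value, and the function name baseline_pmids are assumptions made for the example; they are not taken from the reports or from NLM's own tooling.

```python
import xml.etree.ElementTree as ET

# The three completed MedlineCitation statuses named in the note above.
COMPLETED_STATUSES = {"MEDLINE", "PubMed-not-MEDLINE", "OLDMEDLINE"}

# Hypothetical sample data; the root element name and the "In-Process"
# attribute value are assumptions made for this illustration.
SAMPLE_XML = """
<MedlineCitationSet>
  <MedlineCitation Status="MEDLINE"><PMID>1</PMID></MedlineCitation>
  <MedlineCitation Status="In-Process"><PMID>2</PMID></MedlineCitation>
  <MedlineCitation Status="OLDMEDLINE"><PMID>3</PMID></MedlineCitation>
</MedlineCitationSet>
"""


def baseline_pmids(xml_text):
    """Return the PMIDs of citations whose Status is a completed status."""
    root = ET.fromstring(xml_text.strip())
    return [
        citation.findtext("PMID")
        for citation in root.iter("MedlineCitation")
        if citation.get("Status") in COMPLETED_STATUSES
    ]


print(baseline_pmids(SAMPLE_XML))
```

In this sketch the in-process record is skipped, mirroring how incomplete records are left out of the baseline.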

2 The averages in all reports were computed as: total number of occurrences of an element / total number of citations that include the element.
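
The averaging rule in this note can be sketched in a few lines. This is only an illustration under stated assumptions: each record is modelled as a plain list of element names, and the function name average_occurrences and the sample records are hypothetical, not part of the reports.

```python
def average_occurrences(records, element):
    """Average occurrences of `element` over the citations that contain it.

    `records` is a list of citations, each modelled here (as an assumption)
    as a list of element names. Citations lacking the element are excluded
    from the denominator, following the formula in the note above.
    """
    counts = [record.count(element) for record in records]
    total_occurrences = sum(counts)
    total_citations = sum(1 for count in counts if count > 0)
    if total_citations == 0:
        return 0.0  # element never appears; avoid division by zero
    return total_occurrences / total_citations


# Hypothetical sample: three citations, two of which contain Author.
records = [
    ["Author", "Author", "MeshHeading"],
    ["Author"],
    ["MeshHeading", "MeshHeading", "MeshHeading"],
]

print(average_occurrences(records, "Author"))       # 3 occurrences / 2 citations
print(average_occurrences(records, "MeshHeading"))  # 4 occurrences / 2 citations
```

Because the denominator counts only citations that include the element, the average is never lower than 1 for any element that appears at all.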

3 Auxiliary reports contain statistics for ranges of publication years, based on the PubDate element.

4 Prior to 2005, records in the OLDMEDLINE subset were not included in the baseline database; thus only one version of baseline reports exists for those years. Beginning in 2005, the reports excluding OLDMEDLINE records are generated to make possible comparisons with the previous baseline statistical reports.

Effective with the 2007 baseline, most OLDMEDLINE records formerly in MedlineCitation status = OLDMEDLINE were changed to MedlineCitation status = MEDLINE because they were assigned current MeSH Headings.


These statistical reports were compiled by the Lister Hill National Center for Biomedical Communications of the Library, are freely available for use, and may be reproduced with attribution to the U.S. National Library of Medicine, Department of Health and Human Services using this citation: Statistical reports on MEDLINE®/PubMed® baseline data [Internet]. Bethesda (MD): National Library of Medicine (US), Bibliographic Services Division. Available from: