United States National Library of Medicine National Institutes of Health

MEDLINE®/PubMed® Data Maintenance Overview

Background
NLM performs two general types of maintenance on MEDLINE records during each year:

  1. Individual maintenance takes place on individual records, for example, to identify citations as being retracted, add commentary, or correct data entry errors.
  2. Global maintenance is performed to make the same type of change to large numbers of records, for example:
    1. Maintenance on elements such as <MedlineTA>, <Title>, and <ISSN> when changes are made to their corresponding source serial records
    2. Updates to the <NameOfSubstance> element as the corresponding record in the MeSH® vocabulary is created or edited
    3. End-of-year changes in the MeSH® vocabulary from one heading to another.

Large numbers of maintained records may appear in update files from time to time, because NLM can carry out maintenance on large groups of records at any point during the year. This makes it particularly important to process the revised records that are distributed in daily update files.

The most recent date on which a record was revised, whether for global or individual maintenance, often resides in the <DateRevised> element. It is possible, however, for large numbers of records to be maintained without gaining an initial <DateRevised> element or an updated <DateRevised> value. Do not depend on the initial presence of <DateRevised>, or on a change to an existing <DateRevised> value, to indicate that a record has been maintained.
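Because <DateRevised> is not a reliable signal, one way to notice maintenance is to compare record content directly. The following is a minimal sketch, not an NLM-prescribed method: it assumes each record is available as a <MedlineCitation> XML string, and the helper names record_digest and was_maintained are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def record_digest(citation_xml):
    """Hash a record's canonical XML so that any content change alters the digest."""
    # Canonicalization (Python 3.8+) normalizes attribute quoting and, with
    # strip_text=True, whitespace around text, so layout differences are ignored.
    canonical = ET.canonicalize(xml_data=citation_xml, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def was_maintained(stored_xml, incoming_xml):
    """Return True if the incoming record differs in content from the stored one."""
    return record_digest(stored_xml) != record_digest(incoming_xml)
```

Storing the digest alongside each loaded record would let a loader detect a changed <MedlineTA> or <NameOfSubstance> value even when <DateRevised> is absent or unchanged.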

Read the _stats.txt and _notes.txt files on the server or check the MEDLINE/PubMed Update Chart to see breakdowns of various categories of records in each file and other information.

The new baseline database produced each year contains all records in MEDLINE, PubMed-not-MEDLINE, and OLDMEDLINE <MedlineCitation> statuses, whether or not they were maintained during or at the end of the previous production year.

Maintained and Deleted Records
New records are distributed in update files. Records that are maintained, and the <MedlineCitation> PMIDs of records that are deleted during the production year, are also included in the daily update files. NLM urges licensees to process these records so that their version of MEDLINE is as current as possible. The current License Agreement requires licensees to add new records at least quarterly and to apply maintained and deleted records at least annually.

Points to consider regarding update files:

  1. Update files should be applied after the baseline files and processed in ascending numeric order by filename.
  2. Records in MEDLINE, PubMed-not-MEDLINE, and OLDMEDLINE statuses are considered to be completed records and thus contain the <DateCompleted> element. Completed records that are subsequently revised also contain the <DateRevised> element.
  3. In-Data-Review and In-Process status records are not in a completed status, and thus do not contain the <DateCompleted> or <DateRevised> elements.
  4. Licensees should compare <MedlineCitation> PMIDs in update files with those in records previously loaded. If there is no match, the record is new. If there is a match, the record is either a completed record that has been revised, or the record has changed its <MedlineCitation> Status; e.g., been elevated from In-Data-Review status to In-Process status or from In-Process status to MEDLINE or PubMed-not-MEDLINE status.
  5. Replace records with <DateRevised> only if that date is later than that on your existing record; this will be a concern only if files are processed out of ascending numeric order. The <DateRevised> element is not used to indicate a change in <MedlineCitation> Status. It is possible for large numbers of records to be maintained and not have an initial or updated <DateRevised> element. Do not depend on the initial presence of <DateRevised>, or on a change to an existing <DateRevised> value, to indicate that a record has been maintained.
  6. A record may contain more than one <PMID>. The top-level <PMID> immediately following <MedlineCitation> is the unique number identifying the record. Do not confuse it with the <PMID> element that resides in the <CommentsCorrections> group of elements, which references, for example, a citation that is associated with (e.g., corrects or retracts) the record in hand.
  7. A DeleteCitationSet is created only if there are PMIDs to delete.
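The points above can be sketched as a small loader. This is an illustration under stated assumptions, not NLM's reference procedure: the in-memory dict store, the function names, and the filename pattern handled by file_number are hypothetical; <DateRevised> is assumed to hold <Year>, <Month>, and <Day> children; and the deletion element is assumed to list <PMID> children under the name used in this text. The sketch skips an incoming record only when both copies carry <DateRevised> and the incoming date is strictly earlier, which is a lenient reading of point 5 that still lets status changes and undated maintenance through.

```python
import re
import xml.etree.ElementTree as ET


def file_number(filename):
    """Return the last run of digits in a filename, for ascending numeric ordering."""
    return int(re.findall(r"\d+", filename)[-1])


def date_revised(citation):
    """Return <DateRevised> as a (year, month, day) tuple, or None if absent."""
    node = citation.find("DateRevised")
    if node is None:
        return None
    # Assumed layout: <DateRevised><Year/><Month/><Day/></DateRevised>
    return tuple(int(node.findtext(part)) for part in ("Year", "Month", "Day"))


def apply_update(xml_text, store):
    """Apply one update file's XML to `store`, a dict mapping PMID -> record element."""
    root = ET.fromstring(xml_text)
    for citation in root.iter("MedlineCitation"):
        # findtext() searches direct children only, so this is the record's own
        # PMID and never one nested inside <CommentsCorrections>.
        pmid = citation.findtext("PMID")
        old = store.get(pmid)
        if old is not None:
            new_rev, old_rev = date_revised(citation), date_revised(old)
            # Skip only when both dates exist and the incoming one is older,
            # which can happen if files are processed out of order.
            if new_rev and old_rev and new_rev < old_rev:
                continue
        store[pmid] = citation  # new record, revision, or status change
    # Element name as given in the text above; child <PMID> layout is assumed.
    for deletion in root.iter("DeleteCitationSet"):
        for pmid in deletion.findall("PMID"):
            store.pop(pmid.text, None)
```

A caller would load the baseline files first, then apply the update files in the order given by sorted(filenames, key=file_number).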

Last updated: 21 November 2007
First published: 28 January 2004
Permanence level: Permanence Not Guaranteed