United States National Library of Medicine National Institutes of Health

MEDLINE®/PubMed® XML Element Descriptions and their Attributes

This document currently reflects the DTDs used for the 2008 MEDLINE/PubMed production year. Note that the <PubmedData> group of elements that resides in PubMed is not defined in these DTDs; those elements are therefore not distributed to NLM licensees.

The use of "Medline" in a DTD or element name does not mean the record represents a citation from a Medline-selected journal. When the NLM DTDs and XML elements were first created, MEDLINE records were the only data exported. Now NLM exports citations other than MEDLINE records using these tools. To minimize unnecessary disruption to users of the data and tools, NLM has retained the original DTD and element names (e.g., NLMMedline DTD, MedlineTA, MedlineJournalInfo).

THE ELEMENTS AND THEIR ATTRIBUTES IN MEDLINECITATIONSET

[Last Modified: September 8, 2008]
The elements and their attributes are presented in this section in the order they appear in the distributed record, per the DTD. Also available:
* an alphabetical list of all element names
* information on the elements used at NLM to create the journal source
* a list of required elements

Policies affecting data creation have evolved over the years. In addition, some MEDLINE/PubMed records are added or revised well after the cited article was first published. In these cases, on occasion an element that had not yet been created when the article was published may appear on the records. For example, some records published before the mid-1970s but added to MEDLINE/PubMed after 1975 contain <Abstract> (that element was not created until 1975). It is also possible that an element may be treated differently from the way it would have been treated had the record been created or maintained near the time the article was published. For example, the number of <Author> occurrences can diverge from the policies stated in <AuthorList>.

The "Additional information/background" portion at the end of some element descriptions contains historical information or details about the treatment of the element for certain categories of records. Of particular note, details about variations found in records in the OLDMEDLINE subset (records with value OM in <CitationSubset> element) are placed in the "Additional information/background" portion of applicable element descriptions.

USE OF LISTS AND ATTRIBUTE "CompleteYN"
Three of the elements (<AuthorList>, <GrantList>, and <DataBankList>) use "lists" with the corresponding attribute "CompleteYN". 'Y', meaning Yes, indicates that NLM has entered all list items that appear in the published journal article. 'N', meaning No, indicates that NLM has not entered all list items that appear in the published journal article. The latter case (incomplete list) occurs on records created during periods of time when NLM policy was to enter fewer than all items that qualified. NLM recommends the following when encountering 'N' for the element lists:

<AuthorList> when attribute = N, then supply the literal "et al." after last author name
<GrantList> when attribute = N, then supply the literal "etc." after the last grant number
<DataBankList> when attribute = N, then supply the literal "etc." after the last occurrence

Other MEDLINE elements possibly containing multiple values use 'lists'; however, they do not use attributes to indicate completeness. For those elements, all lists are complete.
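As an illustration only, the recommendation above could be applied when rendering a list for display. The following Python sketch is not an NLM tool; the function name render_list, the comma separator, and the sample values are assumptions made for this example.

```python
# Literal that NLM recommends supplying when CompleteYN="N",
# keyed by the name of the list element.
INCOMPLETE_LITERAL = {
    "AuthorList": "et al.",
    "GrantList": "etc.",
    "DataBankList": "etc.",
}


def render_list(list_name, items, complete_yn):
    """Join list items for display and mark incomplete lists.

    The ", " separator is an assumption of this sketch, not an NLM rule.
    """
    parts = list(items)
    if complete_yn == "N" and list_name in INCOMPLETE_LITERAL:
        parts.append(INCOMPLETE_LITERAL[list_name])
    return ", ".join(parts)


# Hypothetical sample values, for demonstration only.
print(render_list("AuthorList", ["Smith J", "Jones A"], "N"))
print(render_list("GrantList", ["AB12345"], "Y"))
```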

1. <MedlineCitation>
<MedlineCitation> is the top level element in MedlineCitationSet and contains one entire record. This element has two attributes: Owner and Status, as described below.

<MedlineCitation Owner="Value"> The party responsible for creating and validating the citation is recorded as the MedlineCitation Owner attribute. Each citation has only one MedlineCitation Owner and there are eight possible values for this attribute: NLM | NASA | PIP | KIE | HSR | HMD | SIS | NOTNLM. The valid Owner values for the various NLM departments and outside collaborating data partners are:

  • NLM - National Library of Medicine, Index Section (the vast majority of citations carry this value)
  • NASA - National Aeronautics and Space Administration
  • PIP - Population Information Program, Johns Hopkins School of Public Health (not a current value; only on older citations)
  • KIE - Kennedy Institute of Ethics, Georgetown University
  • HSR - National Information Center on Health Services Research and Health Care Technology, National Library of Medicine
  • HMD - History of Medicine Division, National Library of Medicine
  • SIS - Specialized Information Services Division, National Library of Medicine (not yet used; reserved for possible future use)
  • NOTNLM - for licensees' use; NLM will never use this value on citations it exports; licensees may want to use this value if they want to adapt the MEDLINE DTD for other applications.

Some of the above Owner attributes - NASA, PIP, and KIE - may also be used with <GeneralNote> and <KeywordList> elements if the citation has been enriched with additional data by a collaborating partner.
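A licensee loading records might want to confirm that the Owner attribute carries one of the eight values listed above. The Python sketch below shows one possible check; the function name citation_owner and the abbreviated sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

# The eight Owner values listed above.
OWNER_VALUES = {"NLM", "NASA", "PIP", "KIE", "HSR", "HMD", "SIS", "NOTNLM"}


def citation_owner(citation):
    """Return the Owner attribute; raise ValueError if it is not a listed value."""
    owner = citation.get("Owner")
    if owner not in OWNER_VALUES:
        raise ValueError(f"unexpected Owner value: {owner!r}")
    return owner


# Hypothetical, heavily abbreviated record used only for demonstration.
sample = ET.fromstring(
    '<MedlineCitation Owner="NLM" Status="MEDLINE"><PMID>10</PMID></MedlineCitation>'
)
print(citation_owner(sample))
```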

<MedlineCitation Status="Value">
The Status attribute indicates the stage of a citation. There are seven possible values for the MedlineCitation Status attribute: Completed | In-Process | PubMed-not-MEDLINE | In-Data-Review | Publisher | MEDLINE | OLDMEDLINE, as described below in the order in which processing of records distributed to licensees occurs.

  1. MedlineCitation Status attribute: In-Data-Review
    Records submitted to NLM electronically by publishers are added to PubMed at NLM and distributed to licensees in In-Data-Review status. Records in this status have undergone review at the journal issue level; i.e., the journal title, date of publication and volume/issue elements (referred to as the source data) are checked. Before records are distributed in In-Data-Review status, the source data have either:

    1. been matched to the print copy in NLM's collection and are correct; or
    2. been matched to the online version of the journal (when NLM assigns MeSH headings from the online version) and are correct; or
    3. been compared to previously checked in issues and appear to match the pattern or have been changed to match the established pattern. In these cases, the physical item has not yet been received for NLM's collection and the data have not been positively verified and may still change during NLM's processing cycle.
    While all three reviews are at the issue level, most citations fit into the last condition above. It is possible that the source information may be changed at a later point in the NLM quality assurance cycle once the hard copy issue is available for exact comparison.

    In-Data-Review records lack the <DateCompleted> element. They are not yet MEDLINE records because they have not undergone complete quality review and MeSH indexing; thus they should not be identified as MEDLINE records in licensees' systems/products.

    The issue level review for In-Data-Review status is the first step in quality control. The records are then typically either reissued as In-Process status records or moved to PubMed-not-MEDLINE final record status. The <DateRevised> element is not applied when a record moves out of In-Data-Review status.

    Records that are in In-Data-Review status at the beginning of a new production year are not distributed as part of the annual baseline files. They are distributed shortly after the baseline files with the remaining In-Process records as the new production year begins.

    See additional information for retrospective records.

  2. MedlineCitation Status attribute: In-Process
    Records in this status have undergone a citation level review; i.e., the author names, article title, and pagination are checked. All In-Data-Review records that entered the workflow via publisher electronic submission are redistributed in In-Process status whether or not they were revised as a result of this second, citation level review, and they are not identified in any way as revised or unrevised. Licensees will simply see that records with the same PMID are received a second time, now in In-Process status. The In-Process version of the record replaces the In-Data-Review version. This workflow means that licensees will receive many records twice: once after the review of the issue level information of electronically submitted records (i.e., the In-Data-Review status records) and again after the review of the individual citation data (In-Process status records). Records created via NLM's other current data entry mechanism, scanning/optical character recognition (OCR), are distributed for the first time in In-Process status after their creation.

    In-Process records lack the <DateCompleted> element; however, they do contain the <CitationSubset> element. They are not yet MEDLINE records because they have not undergone complete quality review and MeSH indexing; thus they should not be identified as MEDLINE records in licensees' systems/products.

    Most In-Process records are eventually indexed with MeSH Headings and are elevated to completed MEDLINE status. However, some are determined to be out of scope (e.g., articles on plate tectonics or astrophysics from certain MEDLINE journals, primarily general science and chemistry journals, for which the life sciences articles are indexed for MEDLINE) and are not elevated to MEDLINE status; instead they become PubMed-not-MEDLINE final status records. In rare cases the records are deleted and do not become PubMed-not-MEDLINE records. The <DateRevised> element is not applied when a record moves out of In-Process status.

    Records that are in In-Process status at the beginning of a new production year are not distributed as part of the annual baseline files. They are distributed shortly after the baseline files with the remaining In-Data-Review records as the new production year begins.

    See additional information for retrospective records.

  3. MedlineCitation Status attribute: MEDLINE
    In-Process records undergo rigorous quality assurance routines before they are elevated to MEDLINE status or to PubMed-not-MEDLINE status.

    Records in MEDLINE status are the only 'true' MEDLINE records in the XML distribution. They contain <DateCompleted> and <CitationSubset> and, in most cases, contain <MeshHeadingList>. MEDLINE records that are Retractions of Publications (see Publication Type element) are exceptions and do not contain <MeshHeadingList>. MEDLINE records may be new or existing records that have been revised (see maintenance).

    MEDLINE status records are distributed as part of the annual baseline files along with OLDMEDLINE and PubMed-not-MEDLINE records.

  4. MedlineCitation Status attribute: OLDMEDLINE
    A small percentage of the records in the OLDMEDLINE subset (designated by <CitationSubset> value OM) are in MedlineCitation Status = OLDMEDLINE. The criterion for records to be in OLDMEDLINE status is that all the original MeSH Headings which reside in the <KeywordList> have not yet been mapped to current MeSH. It is possible, however, that one or more old Keyword terms have been mapped. For the larger number of OLDMEDLINE subset records whose <MedlineCitation> Status is MEDLINE, all old Keyword terms have been mapped to current MeSH. NLM exports both new and revised OLDMEDLINE records on an irregular and infrequent basis.

    Beginning with the 2005 baseline distribution, OLDMEDLINE status records are distributed as part of the annual baseline files along with MEDLINE and PubMed-not-MEDLINE records.

  5. MedlineCitation Status attribute: PubMed-not-MEDLINE
    Records in this status fall into two groups, both of which have undergone quality review: records from journals included in MEDLINE that are not assigned MeSH headings because the cited item is not in scope for MEDLINE, either by topic or by date of publication; and records from non-MEDLINE journals. The specific categories of non-MEDLINE records in this status are:
    1. citations to articles that precede the date a journal was selected for MEDLINE indexing and are submitted for inclusion in PubMed after July 2003;
    2. out of scope citations to articles in journals covered by MEDLINE;
    3. analytical summaries of articles published elsewhere (see the article, "Linking MEDLINE Citations to Evidence-Based Medicine Assessments and Summaries", in the May-Jun 2002 NLM Technical Bulletin, page e2); and
    4. starting in summer 2005, prospective citations to articles from non-MEDLINE journals that submit full text to PubMed Central and are thus cited in PubMed.
    NLM first began distributing records in PubMed-not-MEDLINE status at the end of July 2003 when it ceased using the old MedlineCitation Status value of Out-of-scope. Records previously distributed in the old Out-of-scope status were converted to the more generic PubMed-not-MEDLINE status and redistributed with the 2004 baseline database.

    Records in PubMed-not-MEDLINE status have most often first been distributed in In-Data-Review status prior to their quality review.

    PubMed-not-MEDLINE records contain the <DateCompleted> element and lack the <CitationSubset> and <MeshHeadingList> elements.

    PubMed-not-MEDLINE status records are distributed as part of the annual reload files along with MEDLINE and OLDMEDLINE records.

    See additional information for retrospective records.

  6. MedlineCitation Status attribute: Publisher
    Records in Publisher status are not distributed to licensees. At this time over 98% of PubMed's content is distributed to MEDLINE licensees. There are approximately 430,000 additional records in Publisher MedlineCitation Status in PubMed and NLM expects more of them will be exported in the future. The non-exported records contain the notation [PubMed - as supplied by publisher] or [PubMed - author manuscript in PMC] in the PubMed display.

    The records in Publisher MedlineCitation Status with the notation [PubMed - as supplied by publisher] are:

    1. the retrospective records for the relatively few non-MEDLINE journals in PubMed (Note: starting July 2005 prospective non-MEDLINE journal records in PubMed are distributed in PubMed-not-MEDLINE status);
    2. the retrospective records for MEDLINE journals prior to date of selection for MEDLINE and that were submitted electronically by the publishers before late July 2003;
    3. the prospective records for currently indexed journals when the publisher has submitted an issue's citation data electronically and NLM still awaits its print copy or access to the electronic copy to use for issue level review (i.e., the journal title, date of publication and volume/issue elements) AND the publisher-supplied record contains a validation error of some kind that prevents it from being exported from NLM's Data Creation and Maintenance System (DCMS) along with the records not containing errors from the same issue. If there were no errors, the record would move to MedlineCitation Status In-Data-Review right away and be exported. In these cases, however, NLM staff must take corrective action before the record can be elevated to In-Data-Review status for export; and
    4. citations electronically submitted for articles that appear on the Web in advance of the journal issue's release (i.e., ahead of print citations). Following publication of the completed issue, the item will be queued for issue level review and released in In-Data-Review status.

    The records in Publisher MedlineCitation Status with the notation [PubMed - author manuscript in PMC] are citations to author manuscripts per NIH's Public Access policy. Many of the scientists who receive research funding from NIH publish the results of this research in journals that are not available in NLM's PubMed Central (PMC). In order to improve access to these research articles, these authors are asked to give PMC the final, peer reviewed manuscripts of such articles once they have been accepted for publication. Citations to these author manuscripts in PMC reside in PubMed in Publisher status.

  7. MedlineCitation Status attribute: Completed
    This attribute is no longer used. Beginning with the 2005 baseline distribution, records previously distributed in Completed status are distributed in either MEDLINE or OLDMEDLINE status.
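The element expectations stated for each status above can be collected into a small lookup table and used as a sanity check on incoming records. The Python sketch below is one way to do this; it assumes that <DateCompleted>, <CitationSubset>, and <MeshHeadingList> are direct children of <MedlineCitation>, and the names EXPECTED_ELEMENTS and check_status_elements are hypothetical. <MeshHeadingList> is left out for MEDLINE status because the text says it is present only "in most cases".

```python
import xml.etree.ElementTree as ET

# True = element expected to be present, False = expected to be absent.
# Only expectations stated in the status descriptions are listed.
EXPECTED_ELEMENTS = {
    "In-Data-Review": {"DateCompleted": False},
    "In-Process": {"DateCompleted": False, "CitationSubset": True},
    "MEDLINE": {"DateCompleted": True, "CitationSubset": True},
    "PubMed-not-MEDLINE": {
        "DateCompleted": True,
        "CitationSubset": False,
        "MeshHeadingList": False,
    },
}


def check_status_elements(citation):
    """Return a list of messages where a record departs from the table."""
    status = citation.get("Status")
    problems = []
    for name, expected in EXPECTED_ELEMENTS.get(status, {}).items():
        # Assumption: the elements are direct children of <MedlineCitation>.
        present = citation.find(name) is not None
        if expected and not present:
            problems.append(f"{status}: <{name}> is missing")
        elif present and not expected:
            problems.append(f"{status}: <{name}> is unexpected")
    return problems


# Hypothetical, abbreviated records used only for demonstration.
sample_ok = ET.fromstring(
    '<MedlineCitation Owner="NLM" Status="In-Data-Review"><PMID>10</PMID></MedlineCitation>'
)
sample_bad = ET.fromstring(
    '<MedlineCitation Owner="NLM" Status="In-Process"><PMID>10</PMID>'
    "<DateCompleted><Year>2002</Year><Month>02</Month><Day>07</Day></DateCompleted>"
    "</MedlineCitation>"
)
print(check_status_elements(sample_ok))
print(check_status_elements(sample_bad))
```

Because older records can depart from current policy, a check like this is probably better used for reporting than for rejecting records.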

Retrospective Records
Note that retrospective records for currently indexed MEDLINE journals sent electronically by publishers after late July 2003 and covering publication dates prior to the journal's selection for MEDLINE are exported in In-Data-Review status, then again in In-Process status, and finally in PubMed-not-MEDLINE status. However, if NLM does not have a paper copy of the back issue in its collection, the records are exported in In-Data-Review MedlineCitation Status only, and they are not further processed for subsequent redistribution in In-Process or PubMed-not-MEDLINE status. Back issue data for currently indexed MEDLINE titles for which NLM does have hardcopy back issues in its collection are subsequently exported in PubMed-not-MEDLINE MedlineCitation Status.

Records in MEDLINE, PubMed-not-MEDLINE, and OLDMEDLINE status categories reside in the annual reload files distributed via ftp. Records in all but the Publisher status category may be present in update files. The update files may also contain <DeleteCitation> in which PMIDs of deleted records are found.
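Since a later version of a record replaces an earlier one with the same PMID, and update files may carry deletions, a licensee's loader might keep records in a store keyed by PMID. The Python sketch below illustrates the idea on tiny in-memory samples. It assumes that <DeleteCitation> is a child of <MedlineCitationSet> holding <PMID> children and that deletions are applied after the citations in the same file; the function names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def apply_update(store, xml_text):
    """Apply one update file (given as a string) to a dict keyed by PMID."""
    root = ET.fromstring(xml_text)
    for citation in root.iter("MedlineCitation"):
        pmid = citation.findtext("PMID")  # direct child <PMID> only
        store[pmid] = citation            # a later version replaces the earlier one
    for deleted in root.iter("DeleteCitation"):
        for pmid in deleted.iter("PMID"):
            store.pop(pmid.text, None)    # ignore PMIDs that were never loaded
    return store


def is_medline_record(citation):
    """Only records in MEDLINE status are 'true' MEDLINE records."""
    return citation.get("Status") == "MEDLINE"


# Hypothetical, heavily abbreviated update files.
first = """<MedlineCitationSet>
<MedlineCitation Owner="NLM" Status="In-Data-Review"><PMID>10</PMID></MedlineCitation>
<MedlineCitation Owner="NLM" Status="In-Process"><PMID>11</PMID></MedlineCitation>
</MedlineCitationSet>"""
second = """<MedlineCitationSet>
<MedlineCitation Owner="NLM" Status="In-Process"><PMID>10</PMID></MedlineCitation>
<DeleteCitation><PMID>11</PMID></DeleteCitation>
</MedlineCitationSet>"""

store = {}
apply_update(store, first)
apply_update(store, second)
print(sorted(store), store["10"].get("Status"))
```

Real distribution files are large, so a production loader would more likely stream them (for example with ET.iterparse) rather than parse whole strings.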



2. <PMID>
<PMID>, the PubMed (NLM's database that incorporates MEDLINE) unique identifier, is a 1 to 8-digit accession number with no leading zeros. It is present on all records exported to licensees and is the accession number for managing and disseminating records. PMIDs are not reused after records are deleted.

Examples are:

<PMID>10097079</PMID>
<PMID>6012557</PMID>
<PMID>10</PMID>

Additional information/background:
Prior to the 2004 version of MEDLINE, all records contained a <MedlineID> in addition to the <PMID>. Beginning with the 2004 baseline database first distributed in December 2003, NLM no longer exports the <MedlineID>. The <PMID> has become the single element to uniquely identify the MEDLINE record.
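The format described above (1 to 8 digits, no leading zeros) can be checked with a regular expression. This is a minimal Python sketch; the names PMID_PATTERN and is_valid_pmid are hypothetical.

```python
import re

# 1 to 8 digits; the first digit is non-zero, so there are no leading zeros.
PMID_PATTERN = re.compile(r"[1-9][0-9]{0,7}")


def is_valid_pmid(text):
    """Return True if text has the PMID format described in this section."""
    return PMID_PATTERN.fullmatch(text) is not None


for candidate in ["10097079", "10", "010", "123456789"]:
    print(candidate, is_valid_pmid(candidate))
```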


3. <DateCreated>
<DateCreated> is the date processing of the record begins.

An example is:

<DateCreated>
<Year>2002</Year>
<Month>05</Month>
<Day>16</Day>
</DateCreated>

This is contrasted with <DateCompleted>, which is the date processing ends. <DateCreated> is not the same as NLM's PubMed Entrez Date (EDAT), which is the date the record entered PubMed. The PubMed Entrez Date does not reside on records distributed to licensees as it is generated when the record gets to PubMed.
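The <Year>, <Month>, and <Day> children shown in the example can be combined into a single date value. The Python sketch below does this for <DateCreated>; the same function should also work for <DateCompleted> and <DateRevised>, whose examples in this document use the same three children. The function name parse_record_date is hypothetical.

```python
import datetime
import xml.etree.ElementTree as ET


def parse_record_date(element):
    """Build a date from the <Year>, <Month>, and <Day> children of element."""
    return datetime.date(
        int(element.findtext("Year")),
        int(element.findtext("Month")),
        int(element.findtext("Day")),
    )


sample = ET.fromstring(
    "<DateCreated><Year>2002</Year><Month>05</Month><Day>16</Day></DateCreated>"
)
created = parse_record_date(sample)
print(created.isoformat())
```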

Additional information/background:
For citations up to about the year 2000, the Date of Entry element in NLM's legacy ELHILL® system was used to set both <DateCreated> and <DateCompleted>.

For records in the OLDMEDLINE subset (<CitationSubset> = OM): for citations converted from the 1964 and 1965 Index Medicus (IM), <DateCreated> represents the year and month the citations were printed in the monthly Index Medicus, and the day is always "01". All other records have a year based on the year of the printed index, the month is always "12" for December, and the day is always "01".


4. <DateCompleted>
<DateCompleted> is the date processing of the record ends; i.e., MeSH® Headings have been added, quality assurance validations are completed, and the completed record subsequently is distributed to PubMed and licensees. This is contrasted with <DateCreated>, which is the date processing begins. In-Process records lack <DateCompleted>.

An example is:

<DateCompleted>
<Year>2002</Year>
<Month>02</Month>
<Day>07</Day>
</DateCompleted>

Additional information/background:
For records in the OLDMEDLINE subset (<CitationSubset> = OM): <DateCompleted> is the approximate date the record entered PubMed instead of the date processing ends because OLDMEDLINE records are created and processed differently than MEDLINE records.


5. <DateRevised>
<DateRevised> may reside on records in MedlineCitation Status = MEDLINE, MedlineCitation Status = OLDMEDLINE, and MedlineCitation Status = PubMed-not-MEDLINE. It identifies the date a change is made to a record in one of those statuses, either as a result of individual or global maintenance. There is no indication of what the change is on the record and only the latest revision date is distributed. The <DateRevised> element is not assigned as a result of a change in MedlineCitation Status; e.g., <DateRevised> is not automatically generated as a result of a record elevating from MedlineCitation Status=In-Process to MedlineCitation Status=MEDLINE.

It is possible for large numbers of records to be maintained and not have an initial or updated <DateRevised> element. Do not depend on initial presence of <DateRevised> or change to an existing <DateRevised> value to indicate that a record has been maintained.

An example is:

<DateRevised>
<Year>2002</Year>
<Month>03</Month>
<Day>20</Day>
</DateRevised>

Additional information/background:
When the 10 million+ MEDLINE records through the 2000 production year were converted to XML from NLM's legacy ELHILL system, all records were assigned a <DateRevised> of 20001218 (December 18, 2000). Subsequently, many of these records have been or will be maintained, and thus have or in the future will have a later <DateRevised> value.



6. <Article>
<Article> is an 'envelope' element that contains various elements describing the article cited; e.g., article title and author name(s). It has a single attribute, PubModel, which is used to identify the medium/media in which the cited article is published. There are four possible values for PubModel: Print | Print-Electronic | Electronic | Electronic-Print.

<Article PubModel="Print"> - the journal is published in print format only
<Article PubModel="Print-Electronic"> - the journal is published in both print and electronic format
<Article PubModel="Electronic"> - the journal is published in electronic format only
<Article PubModel="Electronic-Print"> - the journal is published first in electronic format followed by print (this value is currently used for just one journal, Nucleic Acids Research).

NLM derives these values from the data submitted by the publishers. Various combinations of the <Article> PubModel attribute setting and the data in <ArticleDate> permit control of which dates display in the source area of the PubMed citation display. Separate information is available on how to interpret these data to indicate print and/or electronic publication dates when creating the source.


7. <Journal>
This is an 'envelope' element that contains various elements describing the journal cited; i.e., ISSN, Volume, Issue, and PubDate. It does not contain data itself.


8. <ISSN>
<ISSN> (International Standard Serial Number) is always an eight-character value that uniquely identifies the cited journal. It is nine characters long in the hyphenated form: XXXX-XXXX. It has a single attribute, IssnType, which indicates which of the ISSNs assigned to the journal is recorded in the citation. Some journals are published online in addition to or instead of in print and a unique ISSN is assigned for each version. For journals published in both media (referred to as hybrid journals), NLM chooses one version to use for MeSH indexing and the ISSN and IssnType for that version appear in the MEDLINE citation. The three valid values are Electronic, Print, and Undetermined, although Undetermined is not used for MEDLINE/PubMed data.

Examples are:

<ISSN IssnType="Print">0950-382X</ISSN>
<ISSN IssnType="Electronic">1432-2218</ISSN>

Some records do not contain an <ISSN> value. See also <NlmUniqueID>, <JournalIssue> and <ISSNLinking>.
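A program reading <ISSN> may want to confirm the hyphenated XXXX-XXXX form before using the value. The Python sketch below checks the form and, in addition, the ISSN check character. The check-character rule (weights 8 down to 2 over the first seven digits, modulo 11, with X standing for 10) comes from the ISSN standard rather than from this document, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hyphenated form: four digits, a hyphen, three digits, then a digit or X.
ISSN_PATTERN = re.compile(r"[0-9]{4}-[0-9]{3}[0-9X]")


def issn_check_character(first_seven):
    """Compute the check character from the first seven digits (ISSN standard)."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def is_valid_issn(issn):
    """Return True if issn has the hyphenated form and a correct check character."""
    if ISSN_PATTERN.fullmatch(issn) is None:
        return False
    digits = issn.replace("-", "")
    return issn_check_character(digits[:7]) == digits[7]


element = ET.fromstring('<ISSN IssnType="Print">0950-382X</ISSN>')
print(element.get("IssnType"), element.text, is_valid_issn(element.text))
```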

Information about journals cited in MEDLINE, including the complete title of the journal, is found in:

  1. NLM's online catalog available at LocatorPlus (http://locatorplus.gov) and NLM Catalog (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=nlmcatalog).
  2. SERFILE, another file that may be leased from NLM (see http://www.nlm.nih.gov/databases/leased.html)
  3. PubMed journals files located at http://www.nlm.nih.gov/bsd/serfile_addedinfo.html (contains limited journal information; updated daily)
  4. Entrez Journals database at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=journals for basic journal information similar to data found in PubMed journals list (available in the Entrez Utilities at http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html; updated daily)
  5. The List of Serials Indexed for Online Users available at http://www.nlm.nih.gov/tsd/serials/lsiou.html and the List of Journals Indexed in MEDLINE available at http://www.nlm.nih.gov/tsd/serials/lji.html.


9. <JournalIssue>
This element contains information about the specific issue in which the article cited resides. It has a single attribute, CitedMedium, which indicates whether a citation is processed/indexed at NLM from the online or the print version of the journal. The two valid attribute values are Internet and Print.

Examples are:

<JournalIssue CitedMedium="Internet">
<JournalIssue CitedMedium="Print">


10. <Volume>
The volume number of the journal in which the article was published is recorded here.

Examples are:

<Volume>7</Volume>
<Volume>5 Spec No</Volume>
<Volume>49 Suppl 20</Volume>
<Volume>Doc No 93</Volume>

The last example is for a journal published electronically. This format occurs rarely, as NLM prefers to put electronic document numbers in the <Pagination> element.

Additional information/background:
For records in the OLDMEDLINE subset (<CitationSubset> = OM): Some records contain <Issue> but lack <Volume>; some records contain <Volume> but lack <Issue>; and some records contain Volume and Issue data in the Volume element.


11. <Issue>
<Issue> identifies the issue, part or supplement of the journal in which the article was published.

Examples are:

<Issue>Pt 1</Issue>
<Issue>Pt B</Issue>
<Issue>3 Spec No</Issue>
<Issue>7 Pt 1</Issue>
<Issue>First Half</Issue>
<Issue>3 Suppl</Issue>
<Issue>3 Suppl 1</Issue>

Additional information/background:
For records in the OLDMEDLINE subset (<CitationSubset> = OM): Some records contain <Issue> but lack <Volume>; some records contain <Volume> but lack <Issue>; and some records contain Volume and Issue data in the Volume element.


12. <PubDate>
<PubDate> contains the full date on which the issue of the journal was published. The standardized format consists of elements for a 4-digit year, a 3-character abbreviated month, and a 1 or 2-digit day. Not every record contains all of these elements; the data are taken as they are published in the journal issue, with minor alterations by NLM such as abbreviating months.

Examples are:

<PubDate>
<Year>2001</Year>
<Month>Apr</Month>
<Day>15</Day>
</PubDate>

<PubDate>
<Year>2001</Year>
<Month>Apr</Month>
</PubDate>

<PubDate>
<Year>2001</Year>
</PubDate>

The date of publication for the great majority of records will reside in the separate date-related elements within <PubDate> as shown above and in these cases the record will not contain <MedlineDate>. The date of publication of the article will be found in <MedlineDate> when parsing for the separate fields is not possible; i.e., cases where dates do not fit the Year, Month, or Day pattern.

Examples are:

<PubDate>
<MedlineDate>1998 Dec-1999 Jan</MedlineDate>
</PubDate>

<PubDate>
<MedlineDate>2000 Spring</MedlineDate>
</PubDate>

<PubDate>
<MedlineDate>2000 Spring-Summer</MedlineDate>
</PubDate>

<PubDate>
<MedlineDate>2000 Nov-Dec</MedlineDate>
</PubDate>

<PubDate>
<MedlineDate>2000 Dec 23-30</MedlineDate>
</PubDate>
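The two forms of <PubDate> shown above can be handled by one routine that first looks for <MedlineDate> and otherwise reads the separate date elements. The Python sketch below is one possible approach. It assumes English three-letter month abbreviations and, for <MedlineDate>, that the first four characters are the year (as in the examples above); the function name parse_pub_date is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: months use these English three-letter abbreviations.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def parse_pub_date(pub_date):
    """Return (year, month, day, medline_date); parts that are absent are None."""
    medline_date = pub_date.findtext("MedlineDate")
    if medline_date is not None:
        # Assumption: the string begins with a 4-digit year, as in the examples.
        head = medline_date[:4]
        year = int(head) if head.isdigit() else None
        return (year, None, None, medline_date)
    year_text = pub_date.findtext("Year")
    month_text = pub_date.findtext("Month")
    day_text = pub_date.findtext("Day")
    year = int(year_text) if year_text else None
    month = MONTHS.index(month_text) + 1 if month_text in MONTHS else None
    day = int(day_text) if day_text else None
    return (year, month, day, None)


full = ET.fromstring(
    "<PubDate><Year>2001</Year><Month>Apr</Month><Day>15</Day></PubDate>"
)
year_only = ET.fromstring("<PubDate><Year>2001</Year></PubDate>")
free_text = ET.fromstring(
    "<PubDate><MedlineDate>1998 Dec-1999 Jan</MedlineDate></PubDate>"
)
for sample in (full, year_only, free_text):
    print(parse_pub_date(sample))
```

Keeping the original <MedlineDate> string in the result lets a display layer show the date exactly as distributed, while the extracted year can still be used for sorting.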



13. <Title>
The full journal title (taken from NLM's cataloging data following NLM rules for how to compile a serial name) is exported in this element. Some characters that are not part of NLM's MEDLINE/PubMed Character Set reside in a relatively small number of full journal titles. The NLM journal title abbreviation is exported in the <MedlineTA> element.

Examples are:

<Title>Molecular microbiology</Title>
<Title>American journal of physiology. Cell physiology</Title>


14. <ISOAbbreviation>
This element is used to export NLM's version of the journal title ISO Abbreviation. ISO Abbreviations that reside in MEDLINE/PubMed records are constructed at NLM to assist NCBI in linking from GenBank to PubMed and do not necessarily conform to the ISO standard. Less than one-third of the journals currently covered in MEDLINE carry NLM's NCBI version of the ISO abbreviation in their catalog record and approximately one-fifth of all journals ever covered in MEDLINE carry it.

Examples are:

<ISOAbbreviation>Mol. Microbiol.</ISOAbbreviation>
<ISOAbbreviation>Am. J. Physiol., Cell Physiol.</ISOAbbreviation>


15. <ArticleTitle>
<ArticleTitle> contains the entire title of the journal article. <ArticleTitle> is always in English; those titles originally published in a foreign language and translated for <ArticleTitle> are enclosed in square brackets. All titles end with a period unless another punctuation mark such as a question mark or bracket is present. Explanatory information about the title itself is enclosed in parentheses, e.g.: (author's trans). Corporate/collective authors may appear at the end of <ArticleTitle> for citations up to about the year 2000. See also <AuthorList> for more information about corporate/collective authors.

Records distributed with [In Process Citation] in <ArticleTitle> are non-English language citations in In-Process <MedlineCitation> status that do not yet have the article title translated into English.

Examples are:

<ArticleTitle>The Kleine-Levin syndrome as a neuropsychiatric disorder: a case report.</ArticleTitle>
<ArticleTitle>Why is xenon not more widely used for anaesthesia?</ArticleTitle>
<ArticleTitle>[Biological rhythms and human disease]</ArticleTitle>
<ArticleTitle>[In Process Citation]</ArticleTitle>
<ArticleTitle>[Anterior panhypopituitarism after sella turcica fracture (author's trans)]</ArticleTitle>
<ArticleTitle>Prevalence of Helicobacter pylori resistance to antibiotics in Northeast Italy: a multicentre study. GISU. Interdisciplinary Group for the Study of Ulcer.</ArticleTitle>
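The bracket conventions described above lend themselves to a simple classification of titles. The Python sketch below is illustrative only; the function name describe_title and the three labels it returns are hypothetical.

```python
# Placeholder used for non-English citations whose title is not yet translated.
IN_PROCESS_PLACEHOLDER = "[In Process Citation]"


def describe_title(title):
    """Classify an <ArticleTitle> value using the bracket conventions above."""
    if title == IN_PROCESS_PLACEHOLDER:
        return "untranslated"   # translation into English still pending
    if title.startswith("[") and title.endswith("]"):
        return "translated"     # originally published in another language
    return "original"           # title as published in English


for title in [
    "Why is xenon not more widely used for anaesthesia?",
    "[Biological rhythms and human disease]",
    "[In Process Citation]",
]:
    print(describe_title(title), "|", title)
```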

Additional information/background:
For records in the OLDMEDLINE subset (<CitationSubset> = OM): For citations from the 1964 and 1965 Cumulated Index Medicus (CIM), <ArticleTitle> is in all upper case letters. Some citations contain the value "Not Available" for <ArticleTitle>. OLDMEDLINE records do not contain Corporate/Collective authors in <ArticleTitle>.


16. <Pagination>
<Pagination> indicates the inclusive pages for the article cited. The pagination may consist entirely of non-digit data. In page ranges, redundant leading digits of the end page are omitted (e.g., 304-10 for pages 304 through 310). Document numbers for electronic articles are found here. <ELocationID> was defined for use in 2008 and may reside on records either in lieu of Pagination or, for items with both print and electronic locations, in addition to the Pagination element.

The complete pagination is found in the <MedlinePgn> element. The <StartPage> and <EndPage> elements are not distributed.

Examples are:

<MedlinePgn>12-9</MedlinePgn>
<MedlinePgn>304-10</MedlinePgn>
<MedlinePgn>335-6</MedlinePgn>
<MedlinePgn>1199-201</MedlinePgn>
<MedlinePgn>24-32, 64</MedlinePgn>
<MedlinePgn>34, 72, 84 passim</MedlinePgn>
<MedlinePgn>31-7 cntd</MedlinePgn>
<MedlinePgn>176-8 concl</MedlinePgn>
<MedlinePgn>iii-viii</MedlinePgn>
<MedlinePgn>XC-CIII</MedlinePgn>
<MedlinePgn>P32-4</MedlinePgn>
<MedlinePgn>32P-35P</MedlinePgn>
<MedlinePgn>suppl 111-2</MedlinePgn>
<MedlinePgn>564</MedlinePgn>
<MedlinePgn>[6021 words; 81 paragraphs]</MedlinePgn>
<MedlinePgn>E101-6</MedlinePgn>
<MedlinePgn>44; discussion 44-8</MedlinePgn>
<MedlinePgn>925; author reply 925-6</MedlinePgn>
<MedlinePgn>e66</MedlinePgn>
<MedlinePgn>1 p preceding table of contents</MedlinePgn>
<MedlinePgn>129e1-4</MedlinePgn> (abstract only on page 129 of print journal with full text article on pages e1-e4 in the online version)
<MedlinePgn>10.1-8</MedlinePgn> (e-article number followed by a period followed by traditional pagination)
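
Because redundant digits are omitted from the end page, a consumer that needs a full page range has to restore them. The following is a minimal Python sketch of one way to do that; the function name expand_page_range is hypothetical, and it deliberately handles only the plain numeric "start-end" form, returning None for everything else (roman numerals, prefixed or suffixed pages, "discussion" ranges, word counts, and so on).

```python
import re


def expand_page_range(medline_pgn):
    """Return (start, end) with the end page restored to full length.

    Only the simple all-digit "start-end" form is handled; any other
    MedlinePgn value yields None rather than a guess.
    """
    m = re.fullmatch(r"(\d+)-(\d+)", medline_pgn.strip())
    if m is None:
        return None
    start, end = m.group(1), m.group(2)
    if len(end) < len(start):
        # Borrow the leading digits that the abbreviated end page left out.
        end = start[: len(start) - len(end)] + end
    return start, end


print(expand_page_range("1199-201"))
print(expand_page_range("iii-viii"))
```

Returning None for unrecognized forms leaves the original string available for display unchanged.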

Additional information/background:
Beginning in December 2002, new rules are followed for pagination for letters to the editor that include text consisting of an author reply. If the reply is written by one or more of the authors of the original article, the words "author reply" are used in the pagination field rather than the word "discussion". "Discussion" continues to be used within pagination for other types of articles, such as an article presented at a meeting that is followed by the text of a separate discussion or verbal exchange by a panel or others attending the meeting. This new rule for pagination that includes "author reply" applies only to citations with <PublicationType> Letter.

17. <ELocationID>

The purpose of the ELocationID element, defined in 2008 for prospective use, is to provide an electronic location for items which lack standard page numbers. The element will house Digital Object Identifiers (DOIs) or Publisher Item Identifiers (PIIs) that are provided by publishers for new citations submitted to NLM for inclusion in MEDLINE/PubMed. ELocationID may reside either in lieu of pagination or, for items with both print and electronic locations, in addition to the Pagination element. Existing records which exported from NLM's Data Creation and Maintenance System prior to creation of this element will not be globally maintained to add DOIs.

The element has two attributes, EIdType and ValidYN. EIdType indicates the type of ELocation data, DOI or PII. It is anticipated that a DOI will be supplied far more frequently by publishers than a PII. The default ValidYN value is “Y”. If corrected ELocation data is supplied by publishers to NLM, the revised DOI will be tagged ValidYN=Y and the original DOI will be retained with the ValidYN value “N”.

Examples are:

<ELocationID EIdType="doi" ValidYN="Y">10.1021/cr068126n</ELocationID>
<ELocationID EIdType="doi" ValidYN="N">10.1001/jama.298.18.216</ELocationID>

<ELocationID EIdType="pii" ValidYN="Y">18829</ELocationID>
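
As an illustration of how the two attributes might be used together, the Python sketch below selects the valid electronic location from a record, preferring a DOI over a PII. The <Article> wrapper, the grouping of the three example values into one record, the function name valid_elocation, and the DOI-first preference are all assumptions made for this example, not NLM recommendations.

```python
import xml.etree.ElementTree as ET

# Contrived record: the three example values above, wrapped for illustration.
SAMPLE = """<Article>
<ELocationID EIdType="doi" ValidYN="N">10.1001/jama.298.18.216</ELocationID>
<ELocationID EIdType="doi" ValidYN="Y">10.1021/cr068126n</ELocationID>
<ELocationID EIdType="pii" ValidYN="Y">18829</ELocationID>
</Article>"""


def valid_elocation(article, preferred=("doi", "pii")):
    """Return (EIdType, value) for the first valid ID in preference order."""
    found = {}
    for el in article.iter("ELocationID"):
        # The default ValidYN value is "Y", so a missing attribute is valid.
        if el.get("ValidYN", "Y") == "Y":
            found.setdefault(el.get("EIdType"), (el.text or "").strip())
    for id_type in preferred:
        if id_type in found:
            return id_type, found[id_type]
    return None


print(valid_elocation(ET.fromstring(SAMPLE)))
```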

18. <Abstract> and <AbstractText>
English-language abstracts are taken directly from the published article. If the article does not have a published abstract, the National Library of Medicine does not create one, thus the record lacks the <Abstract> and <AbstractText> elements. However, in the absence of a formally labeled abstract in the published article, text from a substantive "summary", "summary and conclusions" or "conclusions and summary" may be used.

Publishers have given the National Library of Medicine permission to use abstracts for which they claim copyright; NLM does not hold copyright on the abstracts in MEDLINE. Licensees should obtain an opinion from their legal counsel for any use they plan for the abstracts in the database.

Generally, there are no abstracts for records created before 1975. However, starting in April 2007 NLM began to add abstracts from articles in PubMed Central (PMC) to the equivalent MEDLINE/PubMed citation record if that record does not already contain an abstract. The abstracts are derived from the PMC scanning project which is digitizing the back issues of participating PMC journals. As a result, additional records published prior to 1975 will contain abstracts.

All abstracts are in English. Some records may contain <OtherAbstract> in addition to or instead of <Abstract>. Because data entry policies at NLM have changed over the years, abstracts in records may be truncated, in which case one of the following phrases may appear at the end of the text enclosed in parentheses:
ABSTRACT TRUNCATED AT 250 WORDS
ABSTRACT TRUNCATED AT 400 WORDS
ABSTRACT TRUNCATED (This message occurred infrequently once the maximum length was raised to 4,096 characters in 1996.)

The maximum length of abstracts for records created after 2000 is 10,000 characters.

An example is:

<AbstractText> Many disorders may result in delay of language. . . . . . . . . . . . . . The reason for suggesting this diagnostic category is to stress that these children do initially behave in a similar way to those who are peripherally deaf. (ABSTRACT TRUNCATED AT 250 WORDS)</AbstractText>
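
A licensee that wants to flag truncated abstracts, or to drop the parenthetical phrase before display, could test for the three phrases listed above. The Python sketch below is one possible approach; the function name split_truncation and the returned tuple layout are hypothetical.

```python
import re

# Matches the three truncation phrases when they close the abstract text.
TRUNCATION = re.compile(
    r"\s*\((ABSTRACT TRUNCATED(?: AT (\d+) WORDS)?)\)\s*$"
)


def split_truncation(abstract_text):
    """Return (text_without_marker, was_truncated, word_limit_or_None)."""
    m = TRUNCATION.search(abstract_text)
    if m is None:
        return abstract_text.strip(), False, None
    limit = int(m.group(2)) if m.group(2) else None
    return abstract_text[: m.start()].strip(), True, limit


print(split_truncation("Many disorders may result in delay of language. "
                       "(ABSTRACT TRUNCATED AT 250 WORDS)"))
```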

Structured abstracts, describing key aspects of the purposes, methods, and results in a consistent way, are published in some journals. The labels for the key aspects of structured abstracts are capitalized to stand out: e.g., "BACKGROUND," "OBJECTIVES," "METHOD," etc. The text is not broken into paragraphs. Structured abstracts were not truncated in the past, even if they surpassed the previous 250 or 400 word limit.

An example is:

<AbstractText> BACKGROUND: Superantigens produced by Staphylococcus aureus and Streptococcus pyogenes are among the most lethal of toxins. Toxins in this family trigger an excessive cellular immune response leading to toxic shock. OBJECTIVES: To design an antagonist that is effective in vivo against a broad spectrum of superantigen toxins. METHODS: Short peptide antagonists were selected for their ability to inhibit superantigen-induced expression of human genes for cytokines that mediate shock. The ability of these peptides to protect mice against lethal toxin challenge was examined. RESULTS: Antagonist peptide protected mice against lethal challenge with staphylococcal enterotoxin B and toxic shock syndrome toxin-1, superantigens that share only 6% overall amino acid homology. Moreover, . . . </AbstractText>
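
Since the text of a structured abstract is not broken into paragraphs, the capitalized labels are the only cue for separating its parts. The Python sketch below splits on them under the assumption, taken from the example above, that each label is an all-capitals word or phrase followed by a colon. The function name split_structured is hypothetical. A capitalized acronym that directly precedes a label would be absorbed into it, and any text before the first label is ignored, so treat this as a heuristic only.

```python
import re

# An all-capitals label (two or more letters, spaces allowed) ending in ":".
LABEL = re.compile(r"(?:^|(?<=\s))([A-Z][A-Z ]*[A-Z]):\s*")


def split_structured(abstract_text):
    """Return a list of (label, text) pairs found in a structured abstract."""
    text = abstract_text.strip()
    matches = list(LABEL.finditer(text))
    sections = []
    for i, m in enumerate(matches):
        # Each section runs up to the start of the next label (or the end).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((m.group(1), text[m.end():end].strip()))
    return sections


sample = ("BACKGROUND: Toxins in this family trigger toxic shock. "
          "OBJECTIVES: To design an antagonist. "
          "METHODS: Short peptide antagonists were selected.")
for label, body in split_structured(sample):
    print(label, "->", body)
```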

Additional information/background:
History - Original policy on inclusion of abstracts set a limit of 250 words for acceptance. Effective with the January 1984 data (i.e., NLM's ELHILL legacy system 8401 Entry Month) two changes were made in this policy: 1) the word limit was expanded to 400 words for abstracts from articles 10 pages or more in length or from articles in the core journals identified by the National Cancer Institute and 2) abstracts exceeding the 250- or 400-word limit are to be included in truncated form at the end of the sentence closest to the word limit. The percentage of records with abstracts has increased over the years as more publishers gave permission for NLM to include these data. A chart showing the number of MEDLINE records containing abstracts in various segments of MEDLINE is available at: http://www.nlm.nih.gov/bsd/medline_lang_distr.html.

For records in the OLDMEDLINE subset (<CitationSubset> = OM): As a general rule, OLDMEDLINE citations do not contain abstracts; however, it is possible that on a rare occasion an abstract may reside on an OLDMEDLINE citation.

19. <CopyrightInformation>
<CopyrightInformation>, associated with <AbstractText>, was introduced in 1999 and appears on a limited but increasing number of records. This singly-occurring element contains a copyright statement provided by the publisher of the journal and appears only on records supplied electronically to NLM by the publisher. NLM suggests that licensees display this information at the end of the abstract.

An example is:

<Abstract>
<AbstractText>Aphidicolin, a selective inhibitor of DNA polymerase, totally blocks DNA replication in the micronucleus but not in the macronucleus of Paramecium caudatum. The ciliates no longer divide and after 4 days the DNA content of the macronucleus has increased by 64%. Concomitantly the cell volume has increased by 53%.</AbstractText>
<CopyrightInformation>Copyright 1999 Academic Press.</CopyrightInformation>
</Abstract>
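
The display suggestion above can be sketched in a few lines of Python. The function name abstract_for_display and the single-space separator are assumptions; the abstract in SAMPLE is shortened from the example above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Abstract>
<AbstractText>Aphidicolin totally blocks DNA replication in the micronucleus.</AbstractText>
<CopyrightInformation>Copyright 1999 Academic Press.</CopyrightInformation>
</Abstract>"""


def abstract_for_display(abstract):
    """Return the abstract text with any copyright statement appended."""
    text = (abstract.findtext("AbstractText") or "").strip()
    notice = (abstract.findtext("CopyrightInformation") or "").strip()
    # Records lacking <CopyrightInformation> are shown without a notice.
    return f"{text} {notice}" if notice else text


print(abstract_for_display(ET.fromstring(SAMPLE)))
```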

Publishers or authors may still claim copyright on abstracts in records lacking <CopyrightInformation>. Per Section F3 of the NLM License to Lease NLM Databases: "The Licensee and its users shall be solely responsible for compliance with any copyright restrictions; NLM assumes no responsibility or liability associated with the Licensee's (or any of the Licensee's users') use and/or reproduction of copyrighted material. Anyone contemplating reproduction of all or any portion of any of the NLM databases should consult legal counsel."

20. <Affiliation>
<Affiliation> is associated with author names (see <AuthorList>) and investigator names (see <InvestigatorList>).

Regarding author affiliations:
The affiliation of the first author resides in the separate <Affiliation> element preceding <AuthorList>. Starting in 1988, NLM began to include the address of the first author's affiliation on the record. Originally the address was intended to help differentiate between two authors of the same name, not to provide detailed mailing information. It evolved that the institution, city, and state including zip code for U.S. addresses, and country for foreign countries, were included if provided in the journal; sometimes the street address was also included if provided in the journal. In 1995, NLM began to add the designation USA at the end of <Affiliation> where the first author's affiliation is in the 50 United States or the District of Columbia. Effective January 1, 1996, NLM includes the first author's electronic mail (e-mail) address at the end of <Affiliation>, if present in the journal. Starting in 2003 the complete first author address is entered as it appears in the article with no words omitted. Note that the first author is not necessarily the corresponding or senior author identified in the published article; simply the first name in the published author list is entered.

Examples are:

<Affiliation>Department of Anesthesiology, University of Virginia Health Sciences Center Charlottesville 22908, USA. med2p@virginia.edu</Affiliation>
<Affiliation>Departamento de Farmacologia, Facultad de Medicina, Universidad Complutense de Madrid (UCM), 28040 Madrid, Spain.</Affiliation>
<Affiliation>Center for Children With Special Needs, Children's Hospital, and the Department of Pediatrics, University of Washington School of Medicine, 4800 Sand Point Way NE, CM:09, Seattle, WA 98105-0371, USA. jneff@chmc.org</Affiliation>
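
Because the e-mail address, when present, is placed at the end of <Affiliation>, it can be separated from the postal portion with a simple pattern. The Python sketch below shows one way to do so; the function name split_affiliation is hypothetical and the pattern is a loose approximation of an e-mail address, not a validator.

```python
import re

# A whitespace-delimited token containing "@" and a dot, at the very end.
EMAIL_AT_END = re.compile(r"\s+([^\s@]+@[^\s@]+\.[^\s@]+)\s*$")


def split_affiliation(affiliation):
    """Return (address_text, email_or_None) for an <Affiliation> string."""
    m = EMAIL_AT_END.search(affiliation)
    if m is None:
        return affiliation.strip(), None
    return affiliation[: m.start()].strip(), m.group(1)


print(split_affiliation(
    "Department of Anesthesiology, University of Virginia Health Sciences "
    "Center Charlottesville 22908, USA. med2p@virginia.edu"
))
```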

Regarding investigator affiliations (see <InvestigatorList>):
The investigator affiliation identifies the organization that the researcher was affiliated with at the time the article was written and as published in the journal. Unlike <Affiliation> associated with Author names, this affiliation generally does not include detailed address information.

Examples are:

<Affiliation>Marquette U, Milwaukee, WI</Affiliation>
<Affiliation>VA Med Ctr, Richmond, VA</Affiliation>

21. <AuthorList>
Personal and collective (corporate) author names published with the article are found in <AuthorList>. Anonymous articles (including those with pseudonyms) are identified by the absence of <AuthorList>. For records created from 1966 - 1983, every author of every journal article is included in <AuthorList>. For records created from 1984 - 1995, a maximum of 10 author names was entered in the database. Beginning with journal issues published in 1996 and through 1999, a maximum of 25 author names was entered, and beginning with journal issues published in 2000 all author names published in the journal again are entered. During the 1996 - 1999 time period, when there were more than 25 authors, the first 24 were taken plus the last author as the 25th occurrence. Beginning mid-2005, the various policy restrictions on number of author names entered in past years were lifted so that on an individual basis, a record may be edited to include all author names present in the published article regardless of the limitation in effect at the time the record was first created.

If an article has more authors than were entered into the record, then <AuthorList CompleteYN="N"> indicates the list is not complete. This attribute, when set to "N" (No), should be translated into 'et al.' for display purposes.

The attribute ValidYN is used on each Author occurrence to indicate the true spelling of the name (some published author names are subsequently corrected by the publishers and NLM retains both versions in the MEDLINE/PubMed record). ValidYN=Y (present for most author names) indicates the spelling of the name is correct; ValidYN=N (present for a small number of author names) indicates the spelling of the name is not correct, per publisher's erratum published in the journal.

Personal name <Author> data resides in the following elements:

  • <LastName> contains the surname
  • <ForeName> contains the remainder of name except for suffix
  • <Suffix> contains a valid MEDLINE suffix (e.g., 2nd, or 3rd, etc., Jr or Sr). Honorifics (e.g., PhD, MD, etc.) are not carried in the data.
  • <Initials> contains up to two initials

Additional information about initials:

  • Initials are found at the beginning of the name string or following a break. A break is a space or hyphen.
  • Only capital letters in ForeName are candidates for initials except for the letter following a hyphen. The letter following a hyphen is a candidate for an initial unless the string following the hyphen is 'ichi'.
  • If ForeName is only initials, there will be spaces between initials.
  • Initial includes the following particles: de, do, da, du, del, dos, el-, le and el. All except 'el-' are followed by a space and are preceded by a space or are at the beginning of the name string.
  • There is no space between initials, but there is a space between a particle and the initial it modifies. If found, all particles will be converted to lower case in <Initials>.
  • If language is Bulgarian, Russian, Serbo-Croatian (Roman), or Ukrainian, then Initial may be a 2-4 character transliterated mixed-case initial.
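
To make the first two rules concrete, the Python sketch below derives initials from a ForeName string. It is not NLM's algorithm: it implements only the break rule, the capital-letter rule, the hyphen/'ichi' exception, and the two-initial limit. Particles and transliterated multi-character initials are not handled, so a name such as "Mohamed el-Walid" yields "MW" here rather than the "Mel-W" found in the data. The function name simple_initials is hypothetical, and upper-casing the letter after a hyphen is an assumption.

```python
import re


def simple_initials(fore_name, max_initials=2):
    """Derive up to max_initials initials from a ForeName (simplified)."""
    initials = []
    # Walk the tokens, remembering which break (space or hyphen) preceded each.
    for m in re.finditer(r"(^|[ -])([^ -]+)", fore_name):
        brk, token = m.group(1), m.group(2)
        if brk == "-":
            # A letter after a hyphen is a candidate unless the string is 'ichi'.
            if token != "ichi":
                initials.append(token[0].upper())  # upper-casing is assumed
        elif token[0].isupper():
            # Elsewhere, only capital letters are candidates.
            initials.append(token[0])
    return "".join(initials[:max_initials])


for name in ("H J", "Gregory M", "Judith", "Shin-ichi"):
    print(name, "->", simple_initials(name))
```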

Full first and middle names, if published, are entered in <ForeName> beginning with items published in 2002. Prior to 2002 NLM did not enter full first or middle names; instead only initials were entered and pre-2002 records were not maintained to include full names. Full personal names are included, however, on all citations owned by one of NLM's collaborating data producers, the Kennedy Institute of Bioethics (KIE), (MedlineCitationOwner="KIE") regardless of year of publication. KIE also supplied full author names for some NLM-owned citations that predate 2002; those data are found in the <GeneralNote> element.

Examples are:

<Author ValidYN="Y">
<LastName>Melosh</LastName>
<ForeName>H J</ForeName>
<Suffix>3rd</Suffix>
<Initials>HJ</Initials>
</Author>

<Author ValidYN="Y">
<LastName>Abrams</LastName>
<ForeName>Judith</ForeName>
<Initials>J</Initials>
</Author>

<Author ValidYN="Y">
<LastName>Buncke</LastName>
<ForeName>Gregory M</ForeName>
<Initials>GM</Initials>
</Author>

<Author ValidYN="Y">
<LastName>Amara</LastName>
<ForeName>Mohamed el-Walid</ForeName>
<Initials>Mel-W</Initials>
</Author>

<Author ValidYN="Y">
<LastName>Gonzales-loza</LastName>
<ForeName>María del R</ForeName>
<Initials>Mdel R</Initials>
</Author>

<Author ValidYN="Y">
<LastName>Todoroki</LastName>
<ForeName>Shin-ichi</ForeName>
<Initials>S</Initials>
</Author>

<Author ValidYN="Y">
<LastName>Krylov</LastName>
<ForeName>Iakobish K</ForeName>
<Initials>IaK</Initials>
</Author>

Personal names of individuals (e.g., collaborators and investigators) who are listed in the paper as members of a collective/corporate group that is an author of the paper reside in <InvestigatorList>.

Collective or corporate name <Author> data resides in <CollectiveName>, which was introduced to MEDLINE in 2001. Prior to 2001, corporate author information was contained only at the end of <ArticleTitle>, where it remains for those retrospective records (see <ArticleTitle>). As they are encountered, these retrospective records may be individually maintained to move the collective/corporate name from <ArticleTitle> to <CollectiveName>. For records entering MEDLINE beginning in 2001, the collective/corporate name is found in <CollectiveName>. These names enter MEDLINE exactly as they appear in the journal; NLM will not edit the names to standardize them or translate them into English. NLM enters the Roman alphabet words (e.g., German, French) into <CollectiveName>. Transliterated Russian or other Cyrillic names are also entered into <CollectiveName>, but for Japanese, Chinese, Hebrew, and Arabic NLM puts the English translation of the name into <CollectiveName>.

Initially, NLM placed the personal author names associated with an article before any collective names in AuthorList, regardless of the order in which the names appear in the published article. In May 2006, NLM began to enter the author names in the order cited in the published article. Thus, since that time, collective names are interspersed with personal names in AuthorList.

A complete <AuthorList> example including <CollectiveName> is:

<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Gunnars</LastName>
<ForeName>B</ForeName>
<Initials>B</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Nygren</LastName>
<ForeName>P</ForeName>
<Initials>P</Initials>
</Author>
<Author ValidYN="Y">
<CollectiveName>SBU-group. Swedish Council of Technology Assessment in Health Care</CollectiveName>
</Author>
</AuthorList>

When an author's name has been corrected from a published erratum, the corrected name is placed in <AuthorList>, and the incorrect name originally published is retained in the last occurrence with the attribute ValidYN=N. In this circumstance, <ErratumIn> under <CommentsCorrections> has an associated <Note> clarifying the correct and incorrect names (see <CommentsCorrections>).

An example is:

<AuthorList>
<Author ValidYN="Y">
<LastName>Dunkel</LastName>
<ForeName>E C</ForeName>
<Initials>EC</Initials>
</Author>
.
.
.
<Author ValidYN="Y">
<LastName>Whitley</LastName> (this is the corrected name)
<ForeName>R J</ForeName>
<Initials>RJ</Initials>
</Author>
.
.
.
<Author ValidYN="N">
<LastName>Whitely</LastName> (this is the originally published name)
<ForeName>R J</ForeName>
<Initials>RJ</Initials>
</Author>
</AuthorList>
.
.
.
<CommentsCorrections>
<ErratumIn>
<RefSource>J Infect Dis 1998 Aug;178(2):601</RefSource>
<Note>Whitely RJ[corrected to Whitley RJ]</Note>
</ErratumIn>
</CommentsCorrections>
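
The CompleteYN and ValidYN attributes described in this section can be combined when building a display string. The Python sketch below skips author occurrences marked ValidYN="N" and appends 'et al.' when the list is marked incomplete. The function name display_authors, the "LastName Initials Suffix" layout, and the treatment of a missing attribute as "Y" are assumptions for this example; the sample record is contrived from names used in the examples above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<AuthorList CompleteYN="N">
<Author ValidYN="Y"><LastName>Melosh</LastName><ForeName>H J</ForeName><Suffix>3rd</Suffix><Initials>HJ</Initials></Author>
<Author ValidYN="Y"><LastName>Whitley</LastName><ForeName>R J</ForeName><Initials>RJ</Initials></Author>
<Author ValidYN="N"><LastName>Whitely</LastName><ForeName>R J</ForeName><Initials>RJ</Initials></Author>
</AuthorList>"""


def display_authors(author_list):
    """Return a comma-separated author string for display."""
    names = []
    for author in author_list.findall("Author"):
        if author.get("ValidYN", "Y") != "Y":
            continue  # originally published spelling, superseded by an erratum
        collective = author.findtext("CollectiveName")
        if collective:
            names.append(collective.strip())
            continue
        parts = [author.findtext(tag) for tag in ("LastName", "Initials", "Suffix")]
        names.append(" ".join(p.strip() for p in parts if p))
    if author_list.get("CompleteYN", "Y") == "N":
        names.append("et al.")
    return ", ".join(names)


print(display_authors(ET.fromstring(SAMPLE)))
```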

Additional information/background:
The NLM Fact Sheet: Authorship in MEDLINE explains the current policies that are followed in designating the various forms of authorship in MEDLINE.

NLM expended much effort to accurately parse the data converted from the legacy ELHILL format at the end of the 2000 production year. Many citations from the 1966-1974 timeframe were changed to follow data entry conventions established later; e.g., particles such as "van der" were moved from the suffix position to the last name, and the abbreviations 2d and 3d were changed to 2nd and 3rd. It is possible to have only a <LastName>. Some occurrences of author data in this category are in error and will be corrected manually as time permits and subsequently redistributed as revised records.

For records in the OLDMEDLINE subset (<CitationSubset> = OM): Every published author name is included in <AuthorList> for citations from the 1951 - 1959 Current List of Medical Literature (CLML) and for citations from the 1960 - 1965 CIM. For citations from the 1950 CLML, a maximum of three author names was entered and the incomplete list is indicated by <AuthorList CompleteYN="N">. OLDMEDLINE <LastName> and <ForeName> elements are in all upper case letters, except in some cases the particle is in lower case letters. <Suffix> is in upper and lower case letters. OLDMEDLINE records do not contain collective or corporate names. A small percentage of OLDMEDLINE records contain <LastName> only because that is the only Author data present in the printed index used to create the record.

The forward slash is a legal character for the LastName element; it is used as the second character in Ethiopian surnames.

22. <Language>
The language in which an article was published is recorded in <Language>. All entries are three letter abbreviations stored in lower case, such as eng, fre, ger, jpn, etc. When a single record contains more than one language value the XML export program extracts the languages in alphabetic order by the 3-letter language value. Some records provided by collaborating data producers may contain the value und to identify articles whose language is undetermined.

Examples are:
<Language>eng</Language>
<Language>rus</Language>

A table listing all languages found in MEDLINE is at http://www.nlm.nih.gov/bsd/language_table.html. A chart showing the number of English language MEDLINE articles in various segments of MEDLINE is available at: http://www.nlm.nih.gov/bsd/medline_lang_distr.html.

Additional information/background:
For records in the OLDMEDLINE subset (<CitationSubset> = OM): Approximately 14% of OLDMEDLINE citations have the language abbreviation "und" for undetermined.

23. <DataBankList>
This element contains information pertaining to the registration of several types of data: a) molecular sequence data; b) clinical trial numbers (beginning summer 2005); c) gene expression/molecular abundance data (beginning February 2006); and d) PubChem identifiers.
  1. NLM cooperates with international efforts to collect molecular sequence data. There are, at present, 7 databanks that register molecular sequences deposited with them by researchers. In the journal literature, a reference to the databank and the accession number assigned to the sequence may accompany, or substitute for, a lengthy graphic representation of the sequence itself. The <DataBank> and <AccessionNumber> elements in <DataBankList> are populated if this information appears in the journal article. If the article lists a databank but no accession number, only the abbreviation for the databank will be entered. There is no attempt to edit or verify the databank accession numbers that appear in the journal. Since sequences may be deposited with more than one databank, there may be multiple occurrences of the <DataBank> element in <DataBankList> associated with a single article. This information may appear on the article title page, in a footnote or in a statement such as: Sequence data from this article have been deposited with the EMBL, GenBank and DDBJ Data Libraries under Accession No. M16978.

    The 7 data banks registering molecular sequence data at the present time are:

    MEDLINE Abbreviation    Databank
    GDB                     Johns Hopkins University Genome Data Bank
    GENBANK                 GenBank Nucleic Acid Sequence Database
    OMIM                    Mendelian Inheritance in Man (McKusick)
    PDB                     Protein Data Bank
    PIR                     Protein Identification Resource
    RefSeq                  Reference Sequence
    SWISSPROT               Protein Sequence Database

    If an article has more molecular sequence databank numbers than were entered into MEDLINE, then <DataBankList CompleteYN="N"> indicates the list is not complete, in which case NLM suggests supplying the literal "etc." after the last occurrence for display purposes.

  2. Beginning in summer 2005, NLM includes the ClinicalTrials.gov identifier number in <DataBankList> elements when the article is devoted solely and entirely to announcing or reporting the results of the clinical trial. The ICMJE Web site (http://www.icmje.org/) contains an editorial and updates on the topic of registering clinical trials before publication of the results.

    Beginning mid-2006, MEDLINE citations also carry the International Standard Randomised Controlled Trial Number (ISRCTN) when the article is devoted solely and entirely to announcing or reporting the results of the clinical trial or other study that the Identifier Number represents. The ISRCTN Register is a clinical trials deposit site based in the UK that meets the criteria set forth by the ICMJE (International Committee of Medical Journal Editors) for responsible disclosure of information to the public. The letters ISRCTN are a part of the trial number. Retrospective maintenance was taken on existing citations in MedlineCitation status = MEDLINE to add ISRCTN numbers if an existing citation's article title or abstract contained that data.

  3. Beginning in February 2006, accession numbers for data deposited in NLM's Gene Expression Omnibus (GEO) database are included in the <DataBankList> element. GEO is a gene expression/molecular abundance repository supporting data submissions, and a curated, online resource for gene expression data browsing, query and retrieval.

    The <DataBankName> is GEO and the <AccessionNumber> is any one of four prefixes followed by a numeric string:

    GDSxxxx (GEO Data Set)
    GSExxxx (GEO SEries)
    GPLxxxx (GEO PLatform)
    GSMxxxx (GEO SaMple)

  4. Beginning in January 2007, identifiers for records in the PubChem Substance database may be included in the <DataBankList> element (but only if the data are included in the citation XML feeds from the publishers). The <DataBankName> is PubChem-Substance and the <AccessionNumber> is a numeric string, e.g. 10318689. In the distant future, identifiers for records in two additional databases, PubChem Compound and PubChem BioAssay, will likely also be added to MEDLINE/PubMed records.
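
The four GEO accession prefixes listed in item 3 above lend themselves to a simple pattern check. The Python sketch below classifies an accession number by its prefix; the function name classify_geo and the returned labels are hypothetical, and no assumption is made about the length of the numeric string.

```python
import re

GEO_PREFIXES = {
    "GDS": "GEO Data Set",
    "GSE": "GEO Series",
    "GPL": "GEO Platform",
    "GSM": "GEO Sample",
}
# One of the four prefixes followed by a numeric string.
GEO_ACCESSION = re.compile(r"(GDS|GSE|GPL|GSM)(\d+)")


def classify_geo(accession):
    """Return the GEO record type for an accession number, or None."""
    m = GEO_ACCESSION.fullmatch(accession.strip())
    return GEO_PREFIXES[m.group(1)] if m else None


print(classify_geo("GSE3847"))
```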

In summary, valid values for <DataBankName> are:

GDB
GENBANK
OMIM
PDB
PIR
RefSeq
SWISSPROT

ClinicalTrials.gov
ISRCTN

GEO

PubChem-Substance
PubChem-Compound
PubChem-BioAssay

Additional information/background:
Molecular Sequence Data: NLM first began to include molecular sequence data with the 1988 indexing year. Prior to 2000, NLM's policy was to enter up to 30 databank accession numbers for each record. Some global maintenance was done over the years to add databank names/accession numbers whether or not the article itself contained those references. From 2000 forward, NLM enters all databank accession numbers published in the journal. <DataBank> data in <DataBankList> are sorted first by DataBank name and then by <AccessionNumber>.

PubChem Identifiers: The PubChem project provides information on the biological activities of small molecules and its overall goal is to identify new and safer drug therapies. There are three PubChem databases: PubChem Substance, PubChem Compound, and PubChem BioAssay. Each record in each database has a unique identifier, consisting of one or more numerical characters. This identifying number and the applicable database name are used to populate the DataBankName and AccessionNumber elements.

Examples are:

<DataBankList CompleteYN="N">
<DataBank>
<DataBankName>GENBANK</DataBankName>
<AccessionNumberList>
<AccessionNumber>AF078607</AccessionNumber>
<AccessionNumber>AF078608</AccessionNumber>
<AccessionNumber>AF078609</AccessionNumber>
<AccessionNumber>AF078610</AccessionNumber>
<AccessionNumber>AF078611</AccessionNumber>
<AccessionNumber>AF078612</AccessionNumber>
<AccessionNumber>AF078613</AccessionNumber>
<AccessionNumber>AF078614</AccessionNumber>
<AccessionNumber>AF078615</AccessionNumber>
<AccessionNumber>AF078616</AccessionNumber>
<AccessionNumber>AF078617</AccessionNumber>
<AccessionNumber>AF078618</AccessionNumber>
<AccessionNumber>AF078619</AccessionNumber>
<AccessionNumber>AF078620</AccessionNumber>
<AccessionNumber>AF078621</AccessionNumber>
<AccessionNumber>AF078622</AccessionNumber>
<AccessionNumber>AF078623</AccessionNumber>
<AccessionNumber>AF078624</AccessionNumber>
<AccessionNumber>AF078625</AccessionNumber>
<AccessionNumber>AF078626</AccessionNumber>
<AccessionNumber>AF078627</AccessionNumber>
<AccessionNumber>AF078628</AccessionNumber>
<AccessionNumber>AF078629</AccessionNumber>
<AccessionNumber>AF078630</AccessionNumber>
<AccessionNumber>AF078631</AccessionNumber>
<AccessionNumber>AF078632</AccessionNumber>
<AccessionNumber>AF078633</AccessionNumber>
<AccessionNumber>AF078634</AccessionNumber>
<AccessionNumber>AF078635</AccessionNumber>
<AccessionNumber>AF078636</AccessionNumber>
</AccessionNumberList>
</DataBank>
</DataBankList>

<DataBankList CompleteYN="Y">
<DataBank>
<DataBankName>GENBANK</DataBankName>
<AccessionNumberList>
<AccessionNumber>AF321191</AccessionNumber>
<AccessionNumber>AF321192</AccessionNumber>
</AccessionNumberList>
</DataBank>
<DataBank>
<DataBankName>OMIM</DataBankName>
<AccessionNumberList>
<AccessionNumber>118200</AccessionNumber>
<AccessionNumber>145900</AccessionNumber>
<AccessionNumber>162500</AccessionNumber>
<AccessionNumber>605253</AccessionNumber>
</AccessionNumberList>
</DataBank>
</DataBankList>

<DataBankList CompleteYN="Y">
<DataBank>
<DataBankName>ClinicalTrials.gov</DataBankName>
<AccessionNumberList>
<AccessionNumber>NCT00000161</AccessionNumber>
</AccessionNumberList>
</DataBank>
</DataBankList>

<DataBankList CompleteYN="Y">
<DataBank>
<DataBankName>ISRCTN</DataBankName>
<AccessionNumberList>
<AccessionNumber>ISRCTN46889446</AccessionNumber>
</AccessionNumberList>
</DataBank>
</DataBankList>

<DataBankList CompleteYN="Y">
<DataBank>
<DataBankName>GEO</DataBankName>
<AccessionNumberList>
<AccessionNumber>GSE3847</AccessionNumber>
</AccessionNumberList>
</DataBank>
</DataBankList>

<DataBankList>
<DataBank>
<DataBankName>PubChem-Substance</DataBankName>
<AccessionNumberList>
<AccessionNumber>17424970</AccessionNumber>
<AccessionNumber>17424971</AccessionNumber>
</AccessionNumberList>
</DataBank>
</DataBankList>
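
As a sketch of how a <DataBankList> might be rendered for display, the Python example below groups accession numbers under each databank name and appends the literal "etc." when the list is marked incomplete. The function name databank_summary, the punctuation of the output, and the treatment of a missing CompleteYN attribute as "Y" are assumptions; the sample is shortened from the first example above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<DataBankList CompleteYN="N">
<DataBank>
<DataBankName>GENBANK</DataBankName>
<AccessionNumberList>
<AccessionNumber>AF078607</AccessionNumber>
<AccessionNumber>AF078608</AccessionNumber>
</AccessionNumberList>
</DataBank>
</DataBankList>"""


def databank_summary(databank_list):
    """Return a one-line display string for a <DataBankList> element."""
    parts = []
    for bank in databank_list.findall("DataBank"):
        name = (bank.findtext("DataBankName") or "").strip()
        numbers = [(n.text or "").strip() for n in bank.iter("AccessionNumber")]
        # A databank may be listed without any accession number.
        parts.append(f"{name}: {', '.join(numbers)}" if numbers else name)
    text = "; ".join(parts)
    if databank_list.get("CompleteYN", "Y") == "N":
        text += ", etc."
    return text


print(databank_summary(ET.fromstring(SAMPLE)))
```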

24. <GrantList>
This element was introduced in 1981 and contains the following elements:
<GrantID> contains the Research grant or contract number (or both) that designates financial support by any agency of the United States Public Health Service or any institute of the National Institutes of Health. The data are generally recorded exactly as they appear in the published article; there is no attempt to standardize the numbers except as noted in the Additional information/background section, below. Beginning February 2006, UK's Wellcome Trust grant numbers are entered and in August 2007, additional UK grant funding agencies are referenced (see Additional information/background, below).
<Acronym> contains the 2-letter grant acronym. (Not used for Wellcome Trust or other UK funding agencies).
<Agency> contains the institute acronym. Beginning June 2005 when an author acknowledges that the research was supported by financial support from the Wellcome Trust, the value 'Wellcome Trust' is entered in <Agency>. Beginning August 2007 additional UK grant funding agencies, preceded by the country name 'United Kingdom', may be referenced and the country name 'United Kingdom' was retrospectively added before the value 'Wellcome Trust'.

If an article has more grant numbers than were entered into MEDLINE, then <GrantList CompleteYN="N"> indicates the list is not complete, in which case NLM suggests supplying the literal "etc." after the last occurrence for display purposes.

Beginning in March 2006, NLM adds GrantList data obtained from author manuscripts deposited in PubMed Central per NIH's public access policy (see http://www.pubmedcentral.nih.gov/about/authorms.html). As a result, GrantList data may now reside on MEDLINE records published prior to 1981.

A list of the possible values for the grant <Acronym> and <Agency> is available from the PubMed help at http://www.nlm.nih.gov/bsd/grant_acronym.html. Please be advised that while NLM enters the grant number, acronym and agency values are derived by using a machine algorithm against the grant number string. This may result in some inaccurate derivations, but the overall benefit of supplying the separate acronym and agency was considered to be worth the risk of some inaccuracies.

Examples are:
<GrantList CompleteYN="Y">
<Grant>
<GrantID>GM07283</GrantID>
<Acronym>GM</Acronym>
<Agency>NIGMS</Agency>
</Grant>
</GrantList>

<GrantList CompleteYN="N">
<Grant>
<GrantID>DK-44935</GrantID>
<Acronym>DK</Acronym>
<Agency>NIDDK</Agency>
</Grant>
<Grant>
<GrantID>GM-37753</GrantID>
<Acronym>GM</Acronym>
<Agency>NIGMS</Agency>
</Grant>
<Grant>
<GrantID>GM-44100</GrantID>
<Acronym>GM</Acronym>
<Agency>NIGMS</Agency>
</Grant>
</GrantList>

<GrantList CompleteYN="Y">
<Grant>
<Agency>Wellcome Trust</Agency>
</Grant>
</GrantList>

<GrantList CompleteYN="Y">
<Grant>
<GrantID>069355</GrantID>
<Agency>Wellcome Trust</Agency>
</Grant>
</GrantList>

<GrantList CompleteYN="Y">
<Grant>
<GrantID>067427/Z/02/Z</GrantID>
<Agency>Wellcome Trust</Agency>
</Grant>
</GrantList>

<GrantList CompleteYN="Y">
<Grant>
<GrantID>DK060933</GrantID>
<Acronym>DK</Acronym>
<Agency>NIDDK</Agency>
</Grant>
<Grant>
<GrantID>R01 DK060933-01A2</GrantID>
<Acronym>DK</Acronym>
<Agency>NIDDK</Agency>
</Grant>
<Grant>
<GrantID>R01 DK060933-02</GrantID>
<Acronym>DK</Acronym>
<Agency>NIDDK</Agency>
</Grant>
</GrantList>

Additional information/background:
Through 1999 NLM entered up to 3 grant numbers for each record. Beginning in 2000, NLM began to transition to an unlimited number of grant numbers or contract numbers. Some MEDLINE citations from 2000 and 2001 may still be limited to 3 grant numbers or contract numbers, but beginning in 2002 NLM does not limit the number of grant numbers or contract numbers. Some collaborating partners record grant numbers for agencies outside the U.S. Public Health Service in the <GeneralNote> element.

Wellcome Trust and other UK funding agency GrantIDs may be all numeric (the actual grant number) or may also contain trailing identification data containing slashes followed by numeric and/or alpha characters. Initially the letters in the UK funding agency extensions have been converted to upper case. In the future it is likely that alpha characters in GrantID will appear in lower case with no conversion to upper case if present in lower case in the published article.

Due to the processing stream for adding the author manuscript data to the records, there may be intellectual duplicates for GrantIDs that are alike except for formatting and/or trailing identification (see last example above).

In some cases the prefix of NIH grant numbers was incorrectly published with a letter 'O' rather than the numeric '0'; e.g., RO1/AI45338-04 instead of R01/AI45338-04. The MEDLINE record, therefore, contained an incorrect grant number, although it agreed with the text of the article. In July 2006 NLM edited the affected records to change from capital letter O to the number 0 and introduced data entry validations to prevent this situation from occurring again. This practice deviates from standard policy in that the grant number data in the online citation may no longer match what is in the published article.

25. <PublicationTypeList>
This element is used to identify the type of article indexed for MEDLINE; it characterizes the nature of the information or the manner in which it is conveyed (e.g., Review, Letter, Retracted Publication, Clinical Conference). Records may contain more than one <PublicationType>; the values are listed in alphabetical order. <PublicationTypeList> is always complete; there is no attribute to indicate completeness.

An example is:

<PublicationTypeList>
<PublicationType>Journal Article</PublicationType>
<PublicationType>Retracted Publication</PublicationType>
<PublicationType>Review</PublicationType>
<PublicationType>Review, Tutorial</PublicationType>
</PublicationTypeList>

The <PublicationType> values with their descriptions may be downloaded from http://www.nlm.nih.gov/mesh. The publication type headings contain the value D in the Record Type (RY) field and the value 3 in the DescriptorName Form (DF) field. A simple list of the Publication Types is available from the PubMed online Help.

26. <VernacularTitle>
<VernacularTitle> contains the title of each item originally published in a foreign language, in that language. Non-Roman alphabet language titles are transliterated.

Examples are:

<VernacularTitle>Temoignages et lettres.</VernacularTitle>
<VernacularTitle>Wplyw przebiegu rozwoju plodu i noworodka na ujawnienie sie niektórych chorób okresu doroslego.</VernacularTitle>

Additional information/background:
For records in the OLDMEDLINE subset (<CitationSubset> = OM): The vernacular title for citations from the 1964 and 1965 CIM is in all upper case letters. Some OLDMEDLINE citations to articles originally published in a foreign language lack <VernacularTitle>.

27. <ArticleDate>
<ArticleDate> contains the date the publisher made an electronic version of the article available, with the month represented as a 2-digit number rather than as an alphabetic abbreviation, as is the case for the month in PubDate. A record includes <ArticleDate> only if that date is included in the publisher's electronic submission to NLM, and it may be present on records with <Article> PubModel attribute values of Electronic, Print-Electronic or Electronic-Print.

The attribute DateType is always used with <ArticleDate>. It represents the media of the article published on the date in that element; the only valid value is "Electronic."

Various combinations of the <Article> PubModel attribute settings and the data in <ArticleDate> permit control of which dates display in the source. Separate information is available on how to interpret these data to indicate print and/or electronic publication dates when creating the source area of the PubMed citation display.

Additional information/background:
The date that NLM displays in the source area of the MEDLINE/PubMed citation display is derived from <PubDate>, not <ArticleDate> if the date a publisher submits in <ArticleDate> is identical to the date submitted for <PubDate>. The NISO standard (Z39.29-200X Bibliographic References) stipulates display of an electronic date only when it predates a subsequent issue date. Licensees may also wish to follow this convention. Be advised that the same intellectual date may be present in both fields using slightly different formats, i.e., the use of a leading zero in the Month or Day elements.
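
A licensee that wants to detect when <PubDate> and <ArticleDate> carry the same intellectual date has to normalize both before comparing. The following is a minimal Python sketch under stated assumptions: both elements are assumed to hold <Year>, <Month> and <Day> child elements, the <PubDate> month is assumed to be a three-letter English abbreviation, and the helper names date_key and same_date are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed three-letter month abbreviations for <PubDate>.
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def date_key(date_element):
    """Reduce a date element to a (year, month, day) tuple of ints or None."""
    def part(tag):
        raw = (date_element.findtext(tag) or "").strip()
        if not raw:
            return None
        if raw.isdigit():
            return int(raw)  # "05" and "5" both become 5
        return MONTHS.get(raw[:3].title())  # "Jan" -> 1; unknown text -> None
    return (part("Year"), part("Month"), part("Day"))


def same_date(pub_date_xml, article_date_xml):
    """True when both elements describe the same calendar date."""
    return (date_key(ET.fromstring(pub_date_xml))
            == date_key(ET.fromstring(article_date_xml)))


pub = "<PubDate><Year>2004</Year><Month>Jan</Month><Day>5</Day></PubDate>"
art = ('<ArticleDate DateType="Electronic"><Year>2004</Year>'
       '<Month>01</Month><Day>05</Day></ArticleDate>')
print(same_date(pub, art))
```

Comparing integer tuples rather than strings is what makes a leading zero in the month or day irrelevant. Seasonal or otherwise non-standard <PubDate> values are not handled by this sketch.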

28. <Country>
<Country> carries the place of publication of the journal. Valid values are those country names found in the Z category of the Medical Subject Headings (MeSH) Tree Structures that may be downloaded from http://www.nlm.nih.gov/mesh. <Country> values may appear in all upper case or in mixed case. On older records, in cases where the place of publication is unknown, the <Country> value is Unknown.

Examples are:

<Country>United States</Country>
<Country>UNITED STATES</Country>
<Country>FRANCE</Country>
<Country>Unknown</Country>

Country data are not maintained when country names change over time. These data indicate where the journal is published, not where the research was conducted.

Additional information/background:
For records in the OLDMEDLINE subset (<CitationSubset> = OM): Only citations from the 1964-1965 CIM carry a true country name in the <Country> element and the value is in all upper case letters. The place of publication is not available for all other citations, in which case the <Country> value is "Not Available".

29. <MedlineTA>
This element contains the standard abbreviation for the title of the journal in which an article appeared. See the NLM Fact Sheet "Construction of National Library of Medicine Title Abbreviations" at http://www.nlm.nih.gov/pubs/factsheets/constructitle.html which discusses the rules currently used by the National Library of Medicine (NLM) to construct title abbreviations for journals cited in MEDLINE. See <Title> for the full journal name.

Examples are:

<MedlineTA>JAMA</MedlineTA>
<MedlineTA>J Pediatr</MedlineTA>
<MedlineTA>J Comp Physiol B</MedlineTA>
<MedlineTA>Ann Biol Clin (Paris)</MedlineTA>

Information about journals cited in MEDLINE, including the complete title of the journal, is found in:

  1. NLM's online catalog available at LocatorPlus (http://locatorplus.gov) and NLM Catalog (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=nlmcatalog)
  2. SERFILE, another file that may be leased from NLM (see http://www.nlm.nih.gov/databases/leased.html)
  3. PubMed journals files located at http://www.nlm.nih.gov/bsd/serfile_addedinfo.html (contains limited journal information; updated daily)
  4. Entrez Journals database at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=journals for basic journal information similar to data found in PubMed journals list (available in the Entrez Utilities at http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html; updated daily)
  5. The List of Serials Indexed for Online Users available at http://www.nlm.nih.gov/tsd/serials/lsiou.html and the List of Journals Indexed in MEDLINE available at http://www.nlm.nih.gov/tsd/serials/lji.html.

Additional information/background:
All MEDLINE/PubMed records must be linked to a parent serial record in NLM's online catalog, LocatorPlus.
For records in the OLDMEDLINE subset (<CitationSubset> = OM): The journal's title abbreviation on OLDMEDLINE records may be different than the abbreviation found on the original citation in the printed index in which the citation was originally published.

30. <NlmUniqueID>
<NlmUniqueID> may be used to locate the complete serial record for the journal cited in MEDLINE records. The element's value is the accession number for the journal's record assigned in NLM's Integrated Library System, LocatorPlus at http://locatorplus.gov/. An <NlmUniqueID> may appear as seven, eight or nine characters and is the preferred element to use when looking for the serial record for the journal in which the article was published.

Examples are:

The LocatorPlus accession number for the New England Journal of Medicine is 0255562 and the MEDLINE records contain:
<NlmUniqueID>0255562</NlmUniqueID>.

The LocatorPlus accession number for the Japanese Journal of Infectious Diseases is 100893704 and the MEDLINE records contain:
<NlmUniqueID>100893704</NlmUniqueID>.

The LocatorPlus accession number for Sicilia Sanitaria is 20740130R and the MEDLINE records contain: <NlmUniqueID>20740130R</NlmUniqueID>.

31. <ISSNLinking>
The new ISSNLinking element contains the ISSN designated by the ISSN Network to enable co-location or linking among the different media versions of a continuing resource (separate ISSNs are assigned for each media type in which a resource is issued). The ISSNLinking element designates the single unique ISSN for the resource, regardless of its medium. The element was defined for use in 2008; NLM cannot anticipate when the element will first be used. Existing records that were exported from NLM's Data Creation and Maintenance System prior to establishing this element have not been globally maintained to add an ISSNLinking value.

An example is:

<ISSNLinking>1234-5678</ISSNLinking>

32. <ChemicalList>
This element contains one or more <Chemical> elements that, in turn, contain <RegistryNumber> and <NameOfSubstance>. <ChemicalList> is always complete; there is no attribute to indicate completeness.

<RegistryNumber> contains the unique 5 to 9 digit number in hyphenated format assigned by the Chemical Abstracts Service to specific chemical substances; for enzymes, the E.C. number derived from Enzyme Nomenclature is placed in this element. A zero (0) is a valid value when an actual number cannot be located or is not yet available.

<NameOfSubstance> is the name of the substance that the registry number or the E.C. number identifies. The MeSH Vocabulary database that contains all <NameOfSubstance> values with their descriptions may be downloaded from http://www.nlm.nih.gov/mesh. These records are of two types: 1) Supplementary Concept Records in the MeSH file, identified with a Record Type of C, or 2) MeSH Category D descriptors identified with a Record Type of D and a tree number that begins with D.

An example of a chemical list is:

<ChemicalList>
<Chemical>
<RegistryNumber>69-93-2</RegistryNumber>
<NameOfSubstance>Uric Acid</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>6964-20-1</RegistryNumber>
<NameOfSubstance>tiadenol</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>EC 3.1.1.34</RegistryNumber>
<NameOfSubstance>Lipoprotein Lipase</NameOfSubstance>
</Chemical>
<Chemical>
<RegistryNumber>EC 3.5.2.6</RegistryNumber>
<NameOfSubstance>beta-Lactamases</NameOfSubstance>
</Chemical>
</ChemicalList>
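
The three kinds of <RegistryNumber> value described above (a hyphenated CAS number, an E.C. number, or a zero) can be told apart when reading a <ChemicalList>. The following Python sketch is illustrative only: the function names and category labels are hypothetical, the "EC " prefix is inferred from the examples above, and the three-group hyphenation pattern reflects the usual layout of CAS numbers, restricted here to the 5 to 9 digits stated above.

```python
import re
import xml.etree.ElementTree as ET

# Hyphenated CAS layout (assumed): 2-6 digits, 2 digits, 1 digit = 5 to 9 digits.
CAS_PATTERN = re.compile(r"^\d{2,6}-\d{2}-\d$")


def classify_registry_number(value):
    """Return 'unavailable', 'enzyme', 'cas' or 'unknown' (hypothetical labels)."""
    value = value.strip()
    if value == "0":
        return "unavailable"   # no number located or not yet available
    if value.startswith("EC "):
        return "enzyme"        # E.C. number from Enzyme Nomenclature
    if CAS_PATTERN.match(value):
        return "cas"           # Chemical Abstracts Service registry number
    return "unknown"


def read_chemicals(chemical_list_xml):
    """Return (name, category) pairs for every <Chemical> in a <ChemicalList>."""
    root = ET.fromstring(chemical_list_xml)
    return [((chemical.findtext("NameOfSubstance") or "").strip(),
             classify_registry_number(chemical.findtext("RegistryNumber") or ""))
            for chemical in root.findall("Chemical")]


sample = """<ChemicalList>
<Chemical><RegistryNumber>69-93-2</RegistryNumber><NameOfSubstance>Uric Acid</NameOfSubstance></Chemical>
<Chemical><RegistryNumber>EC 3.1.1.34</RegistryNumber><NameOfSubstance>Lipoprotein Lipase</NameOfSubstance></Chemical>
</ChemicalList>"""

print(read_chemicals(sample))
```

Values that match none of the three forms are reported as "unknown" rather than rejected, since the sketch does not claim to cover every value present in the data.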

33. <CitationSubset>
<CitationSubset> identifies the subset for which MEDLINE records from certain journal lists or records on specialized topics were created. Some of these values are found on extremely small numbers of records. Citations may contain more than one occurrence of <CitationSubset>.

The value is true at the time the record was created; if the status of a journal changes, the value on the MEDLINE record does not change.

The values and their definitions for <CitationSubset> are as follows. Note that several are closed subsets no longer being assigned.

  • AIM = citations from Abridged Index Medicus journals, a list of about 120 core clinical, English language journals.
  • B = citations from non-Index Medicus journals in the field of biotechnology (not currently used).
  • C = citations from non-Index Medicus journals in the field of communication disorders (not currently used).
  • D = citations from dental journals.
  • E = citations in the field of bioethics. (includes records from the former BIOETHICS database)
  • F = older citations from one journal prior to its selection for Index Medicus; used to augment the database for NLM's International MEDLARS Centers (not currently used)
  • H = citations from non-Index Medicus journals in the field of health administration. (includes records from the former HealthSTAR database)
  • IM = citations from Index Medicus journals.
  • J = citations in the field of population information. (not currently used; on records from the former POPLINE® database)
  • K = citations from non-Index Medicus journals relating to consumer health.
  • N = citations from nursing journals.
  • OM = pre-1966 citations from the older print indices of the Cumulated Index Medicus (CIM) and the Current List of Medical Literature (CLML) (see more information about OLDMEDLINE). See Additional information/background: notations for specialized treatment of selected elements for OLDMEDLINE subset records, e.g., <DateCreated> and <DateCompleted>.
  • Q = citations in the field of the history of medicine. (includes records from the former HISTLINE® database)
  • QIS = citations from non-Index Medicus journals in the field of the history of medicine. (For NLM use effective in late 2006 because they require special handling at NLM; not a subset of Q; some journals previously designated as Q are now QIS.)
  • QO = subset of Q; indicates older history of medicine journal citations that were created before the former HISTLINE file was converted to a MEDLINE-like format. (For NLM use because they require special handling at NLM.)
  • R = citations from non-Index Medicus journals in the field of population and reproduction (not currently used).
  • S = citations in the field of space life sciences. (includes records from the former SPACELINE™ database)
  • T = citations from non-Index Medicus journals in the field of health technology assessment. (includes records from the former HealthSTAR database)
  • X = citations in the field of AIDS/HIV. (includes records from the former AIDSLINE® database)

Examples are:

<CitationSubset>AIM</CitationSubset>
<CitationSubset>IM</CitationSubset>
<CitationSubset>X</CitationSubset>

34. <CommentsCorrections>
The data in the various <CommentsCorrections> elements listed below are citations to associated journal publications, e.g., comments, errata, or retractions. These data enable links between the record at hand and its associated citations. MEDLINE records may contain one or more of the following:

<CommentOn> cites the reference upon which the article comments; began use with journal issues published in 1989.
<CommentIn> cites the reference containing a comment about the article; began use with journal issues published in 1989.
<ErratumIn> cites the reference containing a published erratum to the article; began use in 1987.
<ErratumFor> cites the original article for which there is a published erratum.
<PartialRetractionIn> cites the reference containing a partial retraction of the article; began use in 2007.
<PartialRetractionOf> cites the article being partially retracted; began use in 2007.
<RepublishedFrom> cites the original article subsequently corrected and republished; began use in 1987.
<RepublishedIn> cites the final, correct version of a corrected and republished article; began use in 1987.
<RetractionOf> cites the article being retracted; began use in August 1984.
<RetractionIn> cites the reference containing a retraction of the article; began use in August 1984.
<UpdateIn> cites the reference containing an update to the article; began limited use in 2001.
<UpdateOf> cites the article being updated; began limited use in 2001.
<SummaryForPatientsIn> cites the reference containing a patient summary article; began use in Nov. 2001 (these records contain Publication Type, Patient Education Handout). See the article 'Patient Education Handouts in MEDLINE®/PubMed®' in the NLM Technical Bulletin for more information.
<OriginalReportIn> cites a scientific article associated with a patient summary.
<ReprintIn> cites the subsequent (and possibly abridged) version of a republished article; began use in 2006.
<ReprintOf> cites the first, originally published article; began use in 2006.

Each of the <CommentsCorrections> elements above will have an associated <RefSource>, usually the associated <PMID>, and possibly a clarifying <Note>.

<RefSource> contains the citation of the associated record.
<PMID> contains the PMID of the associated record in PubMed (if available), thus providing a link between an article and its related comments, published errata, republished information, retractions, updates, or original articles and their patient summaries.
<Note> clarifies the data in <CommentsCorrections> and is used infrequently. It is most often used with <ErratumIn> for corrected author names (see <AuthorList>) and sometimes with <RetractionOf> and <RetractionIn> when only part of an overall citation is retracted (as when only one abstract in a proceedings is retracted). Contents of <Note> include:

"added" is used when an author name is added to the citation as the result of a published erratum
"removed" is used when an author name is removed
"dosage error in abstract" is used when there is a dosage error in a citation abstract (used with <ErratumIn>)
"dosage error in text" is used when the dosage error is in the text portion of the article
"abstract no. xxx only" is used for an erratum of a numbered abstract which is part of an overall citation
"abstract by [author names] on page xxx only" is used when an erratum refers to an unnumbered abstract which is authored and part of an overall citation. The author names and page numbers of the abstract are included.
"abstract [abstract title] on page xxx only" is used when an erratum refers to an unnumbered abstract which is not authored. In this case, the abstract title and page number are included.

Additional information/background:
Some <CommentsCorrections> elements do not have the PMID present. For some, a PMID will never exist where there are only one-way links; for others, record maintenance must take place before NLM can supply the correct PMID, and these corrected records will eventually be distributed as Revised records. Journal title abbreviations in <CommentOn> and <CommentIn> in retrospective data through the 2000 production year end with a period whereas journal title abbreviations in other <CommentsCorrections> elements (such as <ErratumIn>) do not end with a period. This apparent discrepancy is a result of parsing the data as it moved from one field to another during conversion from the legacy system in 2000. Current data entry convention is to use a period after the journal title abbreviation. <RefSource> for <RetractionOf> and <RetractionIn> may contain author names.

See the NLM Fact Sheet "Errata, Retraction, Partial Retraction, Corrected and Republished Articles, Duplicate Publication, Comment, Update, Patient Summary and Republished (Reprinted) Article Policy for MEDLINE®" at http://www.nlm.nih.gov/pubs/factsheets/errata.html for additional information.

Examples are:

<CommentsCorrections>
<ErratumIn>
<RefSource>J Infect Dis 1998 Aug;178(2):601</RefSource>
<Note>Whitely RJ[corrected to Whitley RJ]</Note>
</ErratumIn>
</CommentsCorrections>

<CommentsCorrections>
<ErratumIn>
<RefSource>Eur Respir J 2002 Feb;19(2):384</RefSource>
<Note>Correction of dosage error in abstract.</Note>
</ErratumIn>
</CommentsCorrections>

<CommentsCorrections>
<RetractionOf>
<RefSource>Gut. 2001 Mar;48 Suppl 1:A1-124</RefSource>
<PMID>11286195</PMID>
<Note>abstract no. 071 only</Note>
</RetractionOf>
</CommentsCorrections>

<CommentsCorrections>
<RetractionIn>
<RefSource>Gut. 2001 Jun;48(6):873</RefSource>
<PMID>11411464</PMID>
<Note>abstract no. 071 only</Note>
</RetractionIn>
</CommentsCorrections>

<CommentsCorrections>
<ErratumIn>
<RefSource>Mol Pharmacol 1997 Mar;51(3):533</RefSource>
</ErratumIn>
<RetractionIn>
<RefSource>Wu D, Yang CM, Lau YT, Chen JC. Mol Pharmacol. 1998 Feb;53(2):346</RefSource>
</RetractionIn>
</CommentsCorrections>

<CommentsCorrections>
<SummaryForPatientsIn>
<RefSource>Ann Intern Med. 2002 Jun 18;136(12):I-56</RefSource>
<PMID>12069573</PMID>
</SummaryForPatientsIn>
</CommentsCorrections>

<CommentsCorrections>
<OriginalReportIn>
<RefSource>Ann Intern Med. 2002 Jun 18;136(12):884-7</RefSource>
<PMID>12069562</PMID>
</OriginalReportIn>
</CommentsCorrections>

<CommentsCorrections>
<CommentOn>
<RefSource>Ann Intern Med. 2001 Apr 17;134(8):663-94</RefSource>
<PMID>11304107</PMID>
</CommentOn>
</CommentsCorrections>

<CommentsCorrections>
<CommentIn>
<RefSource>Ann Intern Med. 2002 Jun 18;136(12):926-7; discussion 926-7</RefSource>
<PMID>12069567</PMID>
</CommentIn>
<CommentIn>
<RefSource>Ann Intern Med. 2002 Jun 18;136(12):926-7; discussion 926-7</RefSource>
<PMID>12069568</PMID>
</CommentIn>
</CommentsCorrections>

<CommentsCorrections>
<ErratumIn>
<RefSource>Antivir Ther. 2007;12(7):1145</RefSource>
<Note>Colatigli, Manuela [corrected to Colafigli, Manuela]; Cattani, Paola [added]; Pannetti, Carmen [corrected to Pinnetti, Carmen]</Note>
</ErratumIn>
</CommentsCorrections>

<CommentsCorrections>
<ErratumIn>
<RefSource>Surg Endosc. 2007 Aug;21(8):1473</RefSource>
<Note>Francesco, M [removed]; Moccia, F [added]</Note>
</ErratumIn>
</CommentsCorrections>

NLM uses the data in these elements for display purposes, after the source pagination, as follows. For each <CommentsCorrections> element that exists in the record, take the tag name and make it a constant label; for example, <RetractionIn> produces the label 'Retraction in' (a space is entered between the words of the tag name and only the first word has an initial capital letter). Follow each label with a colon, a space, and the content of the associated <RefSource> tag, then the contents of the associated <Note>, if present. Exception to the label rule: for <RepublishedIn>, instead of the label 'Republished in:' create 'Corrected and republished in:', and similarly for <RepublishedFrom> create 'Corrected and republished from:'. If a label occurs more than once, repeat the literal label for each occurrence and separate the occurrences with a period and a space.

Examples are:

Retraction in: Gut. 2001 Jun;48(6):873. abstract no. 071 only
Retraction in: Wu D, Yang CM, Lau YT, Chen JC. Mol Pharmacol. 1998 Feb;53(2):346
Original report in: Ann Intern Med. 2002 Jun 18;136(12):884-7
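
The labelling rule can be sketched in Python as follows. This is an illustration rather than NLM's implementation: the function names are hypothetical, the label for <RepublishedIn> is assumed to be 'Corrected and republished in' by analogy with the <RepublishedFrom> label, and the period and space placed between <RefSource> and <Note> are taken from the first display example above.

```python
import re
import xml.etree.ElementTree as ET

# Exceptions to the tag-name rule; the RepublishedIn wording is an assumption.
LABEL_EXCEPTIONS = {
    "RepublishedIn": "Corrected and republished in",
    "RepublishedFrom": "Corrected and republished from",
}


def label_for(tag):
    """Turn a tag name such as 'RetractionIn' into the label 'Retraction in'."""
    if tag in LABEL_EXCEPTIONS:
        return LABEL_EXCEPTIONS[tag]
    words = re.findall(r"[A-Z][a-z]*", tag)  # split the tag name at capitals
    return " ".join([words[0]] + [word.lower() for word in words[1:]])


def display_comments_corrections(xml_text):
    """Build the display text for one <CommentsCorrections> element."""
    root = ET.fromstring(xml_text)
    entries = []
    for child in root:
        # Collapse stray whitespace and line breaks inside the element text.
        ref = " ".join((child.findtext("RefSource") or "").split())
        note = " ".join((child.findtext("Note") or "").split())
        entry = label_for(child.tag) + ": " + ref
        if note:
            entry += ". " + note
        entries.append(entry)
    return ". ".join(entries)


sample = """<CommentsCorrections>
<RetractionIn>
<RefSource>Gut. 2001 Jun;48(6):873</RefSource>
<PMID>11411464</PMID>
<Note>abstract no. 071 only</Note>
</RetractionIn>
</CommentsCorrections>"""

print(display_comments_corrections(sample))
```

For the sample record this reproduces the first display example above, "Retraction in: Gut. 2001 Jun;48(6):873. abstract no. 071 only".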

Additional information/background:
For records in the OLDMEDLINE subset (<CitationSubset> = OM): <CommentIn> is the only <CommentsCorrections> element currently found on OLDMEDLINE records, although other <CommentsCorrections> elements may be used in the future.

35. <GeneSymbolList>: not currently input
<GeneSymbol> contains the "symbol" or abbreviated form of gene names as reported in the literature. This element resides in records processed at NLM from 1991 through 1995. <GeneSymbol> has a maximum length of 72 characters, although few, if any, gene symbols reach the maximum length. Up to 25 occurrences per record may appear. NLM entered the symbols used by authors; there was no authority list or effort to standardize the data. <GeneSymbolList> is always complete; there is no attribute to indicate completeness.

Examples are:

<GeneSymbolList>
<GeneSymbol>pyrB</GeneSymbol>
<GeneSymbol>Ghox-lab</GeneSymbol>
<GeneSymbol>pulC</GeneSymbol>
</GeneSymbolList>

Additional information/background:
In the gene symbol field, SGML is used to designate Greek characters, superscripts, and subscripts that may appear as part of the gene symbol. The ampersand (&) and semicolon (;) are the respective beginning and ending delimiters for Greek characters, with specified alphabetic codes to designate the appropriate letter and whether it is upper or lower case. The less than/greater than signs are used to define superscripted and subscripted regions. The beginning of a superscripted region is designated by <up>, while </up> signals its end. Similarly, <down> indicates the beginning of a subscripted region, while </down> indicates its end. A table, originally published on page 32 of the Sep - Oct 1990 NLM Technical Bulletin, displays the code designations for the Greek characters and may be found at http://www.nlm.nih.gov/bsd/license/greek_characters.html.

36. <MeshHeadingList>
NLM's controlled vocabulary, Medical Subject Headings (MeSH®), is used to characterize the content of the articles represented by MEDLINE citations. Records in MedlineCitation status = MEDLINE contain current MeSH headings. Records in the OLDMEDLINE subset now also contain <MeshHeadingList>. See Additional Information. The only MEDLINE records that do not contain MeSH headings are retractions of publications (see Fact Sheet). Of the various MeSH headings assigned to a record, those representing the most significant points are identified with the MajorTopic attribute set to Y for Yes. It is under those major descriptors that the citation can be located in Index Medicus. The remaining descriptors are used to identify concepts which have also been discussed in the item, but that are not the primary topics. See http://www.nlm.nih.gov/mesh/meshhome.html and http://www.nlm.nih.gov/pubs/factsheets/mesh.html for information about MeSH. Only records in Completed MedlineCitation Status contain MeSH headings. Each <MeshHeading> in <MeshHeadingList> contains <DescriptorName> and often <QualifierName>. <MeshHeadingList> is always complete; there is no attribute to indicate completeness.

The MajorTopic attribute for <DescriptorName> is set to Y (for YES) when the MeSH Heading alone is a central concept of the article (without a QualifierName).

<QualifierName>, along with its MajorTopic attribute, indicates when the combination of that <QualifierName> with its associated <DescriptorName> is a central concept of the article. Qualifiers are also known as subheadings.

The presentation of <DescriptorName> is alphabetical. The <QualifierName> associated with a <DescriptorName> is also in alphabetical order, disregarding presence of the MajorTopic attribute.

Examples are:

<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Adult</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Cardiovascular Diseases</DescriptorName>
<QualifierName MajorTopicYN="N">etiology</QualifierName>
<QualifierName MajorTopicYN="Y">mortality</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">English Abstract</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Fetal Growth Retardation</DescriptorName>
<QualifierName MajorTopicYN="N">complications</QualifierName>
<QualifierName MajorTopicYN="Y">physiopathology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Human</DescriptorName>
</MeshHeading>
</MeshHeadingList>

In the above example, the mortality aspect of cardiovascular diseases and the physiopathology aspect of fetal growth retardation are the central concepts of the article. Note that the MeSH Heading English Abstract (also present in above example) means that a substantive English language abstract is present in the journal or written by one of NLM's collaborating data producers. The abstract may or may not be present in the MEDLINE citation as the input policy changed over the years. There are many older non-English language citations without abstracts in MEDLINE but with the MeSH Heading English Abstract; this indicates that an English abstract is present in the journal, even if not a part of the online record.

<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Animal</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Dogs</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="Y">Myocardial Contraction</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Myocardium</DescriptorName>
<QualifierName MajorTopicYN="Y">metabolism</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="Y">Oxygen Consumption</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Surface Tension</DescriptorName>
</MeshHeading>
</MeshHeadingList>

In the above example, myocardial contraction, the metabolism aspect of myocardium, and oxygen consumption are the central concepts of the article.
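
The reading of the MajorTopicYN attributes illustrated by the two examples above can be expressed as a short Python sketch. The function name central_concepts and the "Descriptor/qualifier" output notation are assumptions chosen for illustration; the element and attribute names are those shown in the examples.

```python
import xml.etree.ElementTree as ET


def central_concepts(mesh_heading_list_xml):
    """List the central concepts of one <MeshHeadingList> (hypothetical helper)."""
    root = ET.fromstring(mesh_heading_list_xml)
    concepts = []
    for heading in root.findall("MeshHeading"):
        descriptor = heading.find("DescriptorName")
        if descriptor is None:
            continue
        name = (descriptor.text or "").strip()
        # The descriptor alone is a central concept.
        if descriptor.get("MajorTopicYN") == "Y":
            concepts.append(name)
        # The descriptor/qualifier combination is a central concept.
        for qualifier in heading.findall("QualifierName"):
            if qualifier.get("MajorTopicYN") == "Y":
                concepts.append(name + "/" + (qualifier.text or "").strip())
    return concepts


sample = """<MeshHeadingList>
<MeshHeading><DescriptorName MajorTopicYN="N">Dogs</DescriptorName></MeshHeading>
<MeshHeading><DescriptorName MajorTopicYN="Y">Myocardial Contraction</DescriptorName></MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Myocardium</DescriptorName>
<QualifierName MajorTopicYN="Y">metabolism</QualifierName>
</MeshHeading>
<MeshHeading><DescriptorName MajorTopicYN="Y">Oxygen Consumption</DescriptorName></MeshHeading>
</MeshHeadingList>"""

print(central_concepts(sample))
```

For this shortened version of the second example the result is the list Myocardial Contraction, Myocardium/metabolism, Oxygen Consumption, which matches the central concepts named in the text.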

Additional information/background:
For records in the OLDMEDLINE subset (<CitationSubset> = OM): NLM has undertaken an OLDMEDLINE <Keyword>-to-<MeSH Heading> mapping project. This project maps the original subject headings assigned to the citations when they appeared in the older print indexes to the current MeSH vocabulary. The original subject headings reside in <KeywordList> and the current MeSH Headings to which they map reside in <MeshHeadingList>. At the beginning of 2007, approximately 72% of the OLDMEDLINE records have all of their Keywords mapped to current MeSH and approximately 93% of all OLDMEDLINE records have at least one current MeSH Heading. Additional mappings will occur as resources permit.

All MeSH Headings assigned to an OLDMEDLINE record have the MajorTopic attribute set to Y for Yes.

37. <NumberOfReferences>
This element appears on records indexed with Review as a <PublicationType>; these data are not input for 'regular' journal articles. <NumberOfReferences> contains the number of bibliographic references listed in the review article.

An example is:

<NumberOfReferences>21</NumberOfReferences>

Additional information/background:
When collaborating partners recorded the number of references for non-review articles, these data are found in the <GeneralNote> element.

38. <PersonalNameSubjectList>
Individuals' names appear in <PersonalNameSubject> for citations that contain a biographical note or obituary, or are entirely about the life or work of an individual or individuals. Data is entered in the same format as author names in <AuthorList> including <LastName>, <ForeName>, <Suffix>, and <Initials>. See <AuthorList> for details of format. <PersonalNameSubjectList> is always complete; there is no attribute to indicate completeness.

An example is:

<PersonalNameSubjectList>
<PersonalNameSubject>
<LastName>Koop</LastName>
<ForeName>C Everett</ForeName>
<Initials>CE</Initials>
</PersonalNameSubject>
</PersonalNameSubjectList>

Additional information/background:
An anonymous biography or obituary has the person's name in this element but the <AuthorList> is absent.

39. <OtherID>
<OtherID> may reside on a record owned by a collaborating partner or on an NLM-owned record to which a collaborating partner added information not originally included by NLM. <OtherID> and its Source attribute identify a) the organization responsible for the information on the citation or the document where the information originated, and b) a unique number for that citation or document. If a partner created the record, the <OtherID> is also the internal tracking number for the source document located at the partner's site. A partner may also add data, for example <Keyword> data, to an NLM-owned record; in that case the <OtherID> contains that partner's internal document number for the journal article. The element may occur multiple times.

Each <OtherID> carries a Source attribute whose value, taken from the list below, identifies the collaborating partner. Some of the values on this list are not currently in use and some may never be used.

NASA - National Aeronautics and Space Administration
KIE - Kennedy Institute of Ethics, Georgetown University
PIP - Population Information Program, Johns Hopkins School of Public Health; not currently used
POP - former NLM POPLINE database; not currently used
ARPL - Annual Review of Population Law; not currently used
CPC - Carolina Population Center; not currently used
IND - Population Index; not currently used
CPFH - Center for Population and Family Health Library/Information Program; not currently used
CLML - Current List of Medical Literature
IM - Index Medicus; reserved for future use (intended for pre-1966 publications)
QCICL - Quarterly Cumulative Index to Current Literature; reserved for future use (intended for pre-1966 publications)
QCIM - Quarterly Cumulated Index Medicus; reserved for future use (intended for pre-1966 publications)
SGC - Surgeon General's Catalog; reserved for future use
NRCBL - National Reference Center for Bioethics Literature (for the KIE Reference Library shelving location)

Examples are:

<OtherID Source="KIE">101133</OtherID>
<OtherID Source="NRCBL">14.1</OtherID>
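
Because <OtherID> can occur more than once on a record, it can be convenient to group the values by their Source attribute. The sketch below does this with Python's standard xml.etree.ElementTree. The enclosing <MedlineCitation> wrapper in SAMPLE is added only so that the fragment parses as one document, and the helper name other_ids_by_source is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two examples above, wrapped in a parent element for parsing.
SAMPLE = """
<MedlineCitation>
<OtherID Source="KIE">101133</OtherID>
<OtherID Source="NRCBL">14.1</OtherID>
</MedlineCitation>
"""


def other_ids_by_source(citation):
    """Map each Source value to the list of <OtherID> values that carry it."""
    grouped = defaultdict(list)
    for other_id in citation.findall("OtherID"):
        grouped[other_id.get("Source")].append((other_id.text or "").strip())
    return dict(grouped)


ids = other_ids_by_source(ET.fromstring(SAMPLE))
print(ids)
```

The values are kept as strings: identifiers such as 14.1 (a shelving location) or the colon-separated CLML numbers are not numeric quantities.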

Additional information/background:
The Prefix attribute for OtherID is not used with MEDLINE/PubMed data.

For records in the OLDMEDLINE subset (<CitationSubset> = OM): This element, currently occurring only on 1950-1959 data, is for internal use at NLM. CLML is currently the Source attribute for all OLDMEDLINE citations containing <OtherID>; the Source IM may also be used on a limited basis. Other Sources that may be defined for future use with OLDMEDLINE records are QCICL and QCIM.

An example is:

<OtherID Source="CLML">5834:20412:395</OtherID>

40. <OtherAbstract>
NLM creates MEDLINE records without <Abstract> when the source journal article does not contain an abstract. Whether or not there is an <Abstract>, a collaborating partner (identified in the <OtherAbstract> Type attribute) may create an <OtherAbstract> for that record. If a partner creates and provides an abstract for <OtherAbstract>, the internal tracking number of the source document used by the collaborating partner resides in <OtherID>. On a very small number of records, <CopyrightInformation> is associated with <OtherAbstract> to indicate authorship or editing of the abstract.

The element <OtherAbstract> can have one or more of the Type attributes listed below:
AAMC - Association of American Medical Colleges; not currently used
AIDS - Special HIV/AIDS publications with abstracts written by someone other than the author
KIE - Kennedy Institute of Ethics, Georgetown University
PIP - Population Information Program, Johns Hopkins School of Public Health
NASA - National Aeronautics and Space Administration
Consumer - Abstracts written for consumers; reserved for future use

41. <KeywordList>
<KeywordList> contains controlled terms in <Keyword> that describe the content of the article. Except for OLDMEDLINE subset records (CitationSubset = OM), Keywords are assigned by a collaborating data producer. Not all MEDLINE data producers supply Keywords; those that do use their own list of specialized terms which may change during the year. Keywords on OLDMEDLINE subset records are the original subject headings found in the old printed indexes used to create the records (see 'Additional information/background' note below).

Keywords, when present, appear in addition to MeSH Headings, except that some OLDMEDLINE status records do not yet contain current MeSH Headings. The same Keyword may exist in more than one <KeywordList>.

<KeywordList> is always complete; there is no attribute to indicate completeness. The element <KeywordList> can have one or more of the Owner attributes listed in <MedlineCitation>, which identifies the organization that assigned the subject terms. Other than OLDMEDLINE subset records, relatively few records in the database contain this element and the only Owner attributes currently in use with <KeywordList> are:

NASA - National Aeronautics and Space Administration
PIP - Population Information Program, Johns Hopkins School of Public Health
KIE - Kennedy Institute of Ethics, Georgetown University
NLM - National Library of Medicine (used for the OLDMEDLINE records)

Examples:

<KeywordList Owner="NASA">
<Keyword MajorTopicYN="N">NASA Discipline Space Human Factors</Keyword>
<Keyword MajorTopicYN="N">Non-NASA Center</Keyword>
</KeywordList>

<KeywordList Owner="KIE">
<Keyword MajorTopicYN="N">Birth Rate</Keyword>
<Keyword MajorTopicYN="N">Doe v. Bolton</Keyword>
<Keyword MajorTopicYN="N">Empirical Approach</Keyword>
<Keyword MajorTopicYN="N">New York</Keyword>
<Keyword MajorTopicYN="N">Roe v. Wade</Keyword>
<Keyword MajorTopicYN="Y">United States</Keyword>
</KeywordList>

<KeywordList Owner="NLM">
<Keyword MajorTopicYN="Y">DIABETIC DIET</Keyword>
<Keyword MajorTopicYN="Y">DIET, REDUCING</Keyword>
<Keyword MajorTopicYN="Y">HEART DISEASES/nutrition and diet</Keyword>
</KeywordList>
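
Since a record can carry more than one <KeywordList>, each with its own Owner, a reader of the data may want the Keywords grouped by Owner together with their MajorTopicYN flag. The following Python sketch (standard library only) shows one way to do that. The helper name keywords_by_owner is hypothetical, and SAMPLE combines shortened versions of the KIE and NLM examples above under a <MedlineCitation> wrapper purely for illustration; it does not represent a real record.

```python
import xml.etree.ElementTree as ET

# Synthetic fragment built from the examples above (not a real record).
SAMPLE = """
<MedlineCitation>
<KeywordList Owner="KIE">
<Keyword MajorTopicYN="N">Birth Rate</Keyword>
<Keyword MajorTopicYN="Y">United States</Keyword>
</KeywordList>
<KeywordList Owner="NLM">
<Keyword MajorTopicYN="Y">DIABETIC DIET</Keyword>
<Keyword MajorTopicYN="Y">HEART DISEASES/nutrition and diet</Keyword>
</KeywordList>
</MedlineCitation>
"""


def keywords_by_owner(citation):
    """Map each KeywordList Owner to a list of (term, is_major) pairs."""
    result = {}
    for keyword_list in citation.findall("KeywordList"):
        terms = result.setdefault(keyword_list.get("Owner"), [])
        for keyword in keyword_list.findall("Keyword"):
            is_major = keyword.get("MajorTopicYN") == "Y"
            terms.append(((keyword.text or "").strip(), is_major))
    return result


keywords = keywords_by_owner(ET.fromstring(SAMPLE))
print(keywords)
```

Keeping the Owner as the dictionary key makes it straightforward to treat Owner="NLM" lists (the original OLDMEDLINE subject headings) separately from partner-assigned terms.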

Additional information/background:
For records in the OLDMEDLINE subset (<CitationSubset> = OM): NLM has undertaken an OLDMEDLINE <Keyword>-to-<MeSH Heading> mapping project. This project maps the original subject headings assigned to the citations when they appeared in the older print indexes to the current MeSH vocabulary. OLDMEDLINE Keywords were first mapped to current MeSH in preparation for NLM's 2006 production year. At the beginning of 2007, approximately 72% of the OLDMEDLINE records had all of their Keywords mapped to current MeSH and approximately 93% of all OLDMEDLINE records had at least one current MeSH Heading. Additional mappings will occur as resources permit.

The following are characteristics of OLDMEDLINE subset Keywords:
* <KeywordList> contains the original MeSH Headings assigned at the time the articles were first indexed and included in one of the indexes printed prior to 1966. These data have not been updated, and may not match current MeSH vocabulary.
* Fewer subject headings (approximately two to six per citation) were usually assigned and check tags were generally not used.
* There are no subheadings on records published in the 1950 and 1951 CLML and the 1963 through 1965 CIM; Keywords from other years may contain MeSH Heading/subheading combinations.
* All Keywords are flagged as Major Topic (although that cannot be utilized for searching this data in PubMed).
* The KeywordList Owner attribute is "NLM".

42. <SpaceFlightMission>
<SpaceFlightMission> resides on MEDLINE citations created by one of our collaborating data producers, the National Aeronautics and Space Administration (NASA). This element contains the space flight mission name and/or number when results of research conducted in space are covered in a publication. In cases where there are multiple space flight missions, the mission name is not directly linked to the descriptive values manned/unmanned or long/short duration that also reside in the <SpaceFlightMission> element in the MEDLINE record. For records containing more than one Space Flight Mission name, see the Space Flight Mission Summary Table at http://www.nlm.nih.gov/bsd/space_flight.html that provides the manned/unmanned status and duration of each mission.

Examples are:

<SpaceFlightMission>Flight Experiment</SpaceFlightMission>
<SpaceFlightMission>STS Shuttle Project</SpaceFlightMission>
<SpaceFlightMission>manned</SpaceFlightMission>
<SpaceFlightMission>short duration</SpaceFlightMission>

<SpaceFlightMission>Biosatellite 2 Project</SpaceFlightMission>
<SpaceFlightMission>Flight Experiment</SpaceFlightMission>
<SpaceFlightMission>Project Gemini 11</SpaceFlightMission>
<SpaceFlightMission>manned</SpaceFlightMission>
<SpaceFlightMission>short duration</SpaceFlightMission>
<SpaceFlightMission>unmanned</SpaceFlightMission>
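
The second example illustrates why the descriptive values cannot be tied to a particular mission: both manned and unmanned appear alongside two mission names. A minimal Python sketch of separating the descriptive values from everything else follows. The set DESCRIPTORS contains only the values named in the text above ("long duration" is an assumed spelling, inferred from "short duration"), the <MedlineCitation> wrapper is added for parsing, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Descriptive values named in the description; "long duration" is assumed.
DESCRIPTORS = {"manned", "unmanned", "short duration", "long duration"}

# Second example above, wrapped in a parent element for parsing.
SAMPLE = """
<MedlineCitation>
<SpaceFlightMission>Biosatellite 2 Project</SpaceFlightMission>
<SpaceFlightMission>Flight Experiment</SpaceFlightMission>
<SpaceFlightMission>Project Gemini 11</SpaceFlightMission>
<SpaceFlightMission>manned</SpaceFlightMission>
<SpaceFlightMission>short duration</SpaceFlightMission>
<SpaceFlightMission>unmanned</SpaceFlightMission>
</MedlineCitation>
"""


def split_space_flight_values(citation):
    """Return (other_values, descriptors), each in document order."""
    values = [
        (element.text or "").strip()
        for element in citation.findall("SpaceFlightMission")
    ]
    descriptors = [value for value in values if value in DESCRIPTORS]
    others = [value for value in values if value not in DESCRIPTORS]
    return others, descriptors


others, descriptors = split_space_flight_values(ET.fromstring(SAMPLE))
print(others)
print(descriptors)
```

The sketch deliberately does not try to pair a descriptor with a mission name; for that, the Space Flight Mission Summary Table mentioned above is the source to consult.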

43. <InvestigatorList>

Historically, <InvestigatorList> resided only on MEDLINE citations created or maintained by one of the NLM collaborating data producers, the National Aeronautics and Space Administration (NASA). In this context, <InvestigatorList> identifies the NASA funded principal investigator(s) who conducted the research discussed in the article cited (but are not necessarily the authors).

Beginning with the 2008 production year, <InvestigatorList> is also used to contain personal names of individuals (e.g., collaborators and investigators) who are not authors of a paper but rather are listed in the paper as members of a collective/corporate group that is an author of the paper. For records containing more than one collective/corporate group author, <InvestigatorList> does not indicate to which group author each personal name belongs. In this context, the names are entered in the order in which they are published; a name listed multiple times is repeated because NLM cannot assume that those names refer to the same person. Also see the Collaborator Names section of the NLM Fact Sheet: Authorship in MEDLINE.

Data is entered in the same format as author names in <AuthorList> including <LastName>, <ForeName>, <Suffix>, and <Initials>. This element may contain <Affiliation> that identifies the organization that the researcher was affiliated with at the time the article was written. Unlike <Affiliation> associated with Author names, this affiliation does not include detailed address information. <InvestigatorList> is always complete; there is no attribute to indicate completeness.

Examples are:

<InvestigatorList>
<Investigator>
<LastName>Mortley</LastName>
<ForeName>D G</ForeName>
<Initials>DG</Initials>
<Affiliation>Tuskegee U, AL</Affiliation>
</Investigator>
</InvestigatorList>

<InvestigatorList>
<Investigator>
<LastName>Nabel</LastName>
<ForeName>Elizabeth</ForeName>
<Initials>E</Initials>
</Investigator>
<Investigator>
<LastName>Rossouw</LastName>
<ForeName>Jacques</ForeName>
<Initials>J</Initials>
</Investigator>
</InvestigatorList>
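
A short Python sketch of reading <InvestigatorList> follows, using only the standard library. It keeps the names in published order and does not merge repeated names, in line with the policy described above. The helper name investigators is hypothetical, and SAMPLE places one investigator from each example above in a single list purely for illustration.

```python
import xml.etree.ElementTree as ET

# Synthetic list combining one entry from each example above.
SAMPLE = """
<InvestigatorList>
<Investigator>
<LastName>Mortley</LastName>
<ForeName>D G</ForeName>
<Initials>DG</Initials>
<Affiliation>Tuskegee U, AL</Affiliation>
</Investigator>
<Investigator>
<LastName>Nabel</LastName>
<ForeName>Elizabeth</ForeName>
<Initials>E</Initials>
</Investigator>
</InvestigatorList>
"""

FIELDS = ("LastName", "ForeName", "Suffix", "Initials", "Affiliation")


def investigators(investigator_list):
    """Return one dict per <Investigator>, in document order, no de-duplication."""
    rows = []
    for investigator in investigator_list.findall("Investigator"):
        row = {}
        for field in FIELDS:
            text = investigator.findtext(field)
            # Strip stray whitespace; keep None when the element is absent.
            row[field] = text.strip() if text is not None else None
        rows.append(row)
    return rows


rows = investigators(ET.fromstring(SAMPLE))
print(rows)
```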

44. <GeneralNote>
<GeneralNote> contains supplemental or descriptive information related to the document cited in the MEDLINE record. It is a 'catchall' for various types of information included by NLM's collaborating producers.

<GeneralNote> can have one or more of the Owner attributes listed below, although some are not currently in use and some may never be used:

NLM - National Library of Medicine; not used
NASA - National Aeronautics and Space Administration
PIP - Population Information Program, Johns Hopkins School of Public Health
KIE - Kennedy Institute of Ethics, Georgetown University
HSR - National Information Center on Health Services Research and Health Care Technology, National Library of Medicine
HMD - History of Medicine Division, National Library of Medicine
SIS - Division of Specialized Information Services, National Library of Medicine; not used
NOTNLM - not used by NLM; reserved for use by licensees

Examples are:

<GeneralNote Owner="KIE">42 refs.</GeneralNote>
<GeneralNote Owner="KIE">Approved by the ACP Board of Regents on 23 Mar 1992 and by the IDSA Council on 21 Mar 1993.</GeneralNote>
<GeneralNote Owner="KIE">Adopted by the APHA Governing Council 3 Oct 1990.</GeneralNote>
<GeneralNote Owner="KIE">KIE BoB Subject Heading: INFORMED CONSENT/INSTITUTIONALIZED PERSONS/MENTALLY ILL</GeneralNote>
<GeneralNote Owner="KIE">KIE Bib: allowing to die/legal aspects; euthanasia/legal aspects; suicide</GeneralNote>
**Note: BoB Subject Headings are controlled subject vocabulary terms found in the Kennedy Institute of Ethics' Bioethics Thesaurus under which citations print in their publication, Bibliography of Bioethics. The current format of these data in MEDLINE is reflected in the second example beginning with "KIE Bib".
<GeneralNote Owner="KIE">118 fn.</GeneralNote>
**Note: "fn" is an abbreviation for "footnotes"
<GeneralNote Owner="KIE">Full author name: Weber, James</GeneralNote>
<GeneralNote Owner="KIE">Broden, Melodie S; Agresti, Albert A</GeneralNote>
<GeneralNote Owner="NASA">Grant numbers: NSG 155-61, NAS-95637.</GeneralNote>
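
Because <GeneralNote> is free text, any structured reading of it is heuristic. As one hedged illustration, the Python sketch below looks for a reference count first in <NumberOfReferences> and then in a <GeneralNote> of the form "42 refs." shown in the first example. The pattern REF_NOTE is an assumption based on that single example and may not match every partner's wording; the function name and the two sample fragments are also hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed note format, modelled on the "42 refs." example above.
REF_NOTE = re.compile(r"(\d+) refs?\.?")

REVIEW = """
<MedlineCitation>
<NumberOfReferences>21</NumberOfReferences>
</MedlineCitation>
"""

NON_REVIEW = """
<MedlineCitation>
<GeneralNote Owner="KIE">Full author name: Weber, James</GeneralNote>
<GeneralNote Owner="KIE">42 refs.</GeneralNote>
</MedlineCitation>
"""


def reference_count(citation):
    """Return the reference count as an int, or None if none is recorded."""
    number = citation.findtext("NumberOfReferences")
    if number is not None:
        return int(number.strip())
    for note in citation.findall("GeneralNote"):
        match = REF_NOTE.fullmatch((note.text or "").strip())
        if match:
            return int(match.group(1))
    return None


review_count = reference_count(ET.fromstring(REVIEW))
non_review_count = reference_count(ET.fromstring(NON_REVIEW))
print(review_count, non_review_count)
```

Using fullmatch rather than a substring search keeps the heuristic conservative: a longer note that merely mentions references is ignored.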

45. <DeleteCitation>
PMIDs in <DeleteCitation> are for records previously distributed in any status and subsequently determined to be outside MEDLINE's or PubMed's coverage or to be duplicate citations. It is possible for a deleted citation to be distributed without a previous version ever having gone out. This happens infrequently, when the creation and completion of a new record (or the error resolution of a previously created record) and the deletion action occur on the same day.

Coverage deletes: Most of the citation data in PubMed is submitted to NLM electronically by journal publishers. The publishers are instructed not to submit certain types of records because they are not covered by MEDLINE or PubMed. Examples of these are: book reviews, software or equipment reviews, announcements, erratum notices without additional substantive content, and papers to appear in forthcoming issues. Contrary to instructions, publishers do sometimes submit citations outside of NLM's coverage which are distributed to licensees in In-Data-Review status. As a result of routine checks, these record types are subsequently discovered and deleted.

Duplicate deletes: Publishers may accidentally submit citations that they have already submitted, or new records can otherwise be created for already existing citations, creating duplicate records which are also deleted as soon as they are discovered. Occasionally a publisher submits an issue electronically using incorrect volume, issue, or publication date data. These incorrect files are often deleted and resent correctly later rather than NLM trying to edit the files.
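
For licensees maintaining a local copy of the data, the practical consequence of the first paragraph is that a deletion may name a PMID that was never received. The Python sketch below applies deletions to an in-memory store and tolerates such PMIDs. It assumes that <DeleteCitation> holds one <PMID> child per deleted record and sits inside a <MedlineCitationSet> root; the dictionary store, the PMID values, and the function name apply_deletes are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed layout: <DeleteCitation> with one <PMID> child per deleted record.
SAMPLE = """
<MedlineCitationSet>
<DeleteCitation>
<PMID>100</PMID>
<PMID>200</PMID>
</DeleteCitation>
</MedlineCitationSet>
"""


def apply_deletes(citation_set, store):
    """Remove deleted PMIDs from store; return PMIDs that were never held."""
    never_received = []
    for pmid_element in citation_set.findall("DeleteCitation/PMID"):
        pmid = (pmid_element.text or "").strip()
        if pmid in store:
            del store[pmid]
        else:
            # Not an error: the record may have been created and deleted
            # on the same day, so no earlier version was distributed.
            never_received.append(pmid)
    return never_received


# Hypothetical local store keyed by PMID.
local_store = {"100": "citation A", "300": "citation B"}
never_received = apply_deletes(ET.fromstring(SAMPLE), local_store)
print(local_store, never_received)
```

Returning the unknown PMIDs, rather than raising an error, lets the caller log them without interrupting the update.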

Last updated: 08 September 2008
First published: 12 December 2005