PubMed Central Tagging Guidelines


General Tagging Practice

DTD

2.3

The XML should conform to the NLM Journal Publishing DTD, version 2.3 (http://dtd.nlm.nih.gov/publishing/2.3/index.html).

The DTD is available on the web: http://dtd.nlm.nih.gov/publishing/2.3/journalpublishing.dtd

The complete Tag Library is available on the web: http://dtd.nlm.nih.gov/publishing/tag-library/2.3/index.html

All of the files are available by FTP: ftp://ftp.ncbi.nih.gov/pub/archive_dtd/publishing

3.0

The XML should conform to the NLM Journal Publishing DTD, version 3.0 (http://dtd.nlm.nih.gov/publishing/3.0/index.html).

The DTD is available on the web: http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd

The complete Tag Library is available on the web: http://dtd.nlm.nih.gov/publishing/tag-library/3.0/index.html

All of the files are available by FTP: ftp://ftp.ncbi.nih.gov/pub/archive_dtd/publishing

Capitalization

For PMC, use title-case capitalization, particularly in <article-title> and <subject>.

Continuous Makeup Articles

When more than one article starts on the same page, treat each as an individual article. Tag each in its own file with a unique filename. Articles that start on the same page will have the same <fpage>. Use the @seq attribute on <fpage> to assign sequence letters so that each article has a unique combination of first page and sequence.
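The sequence-letter rule can be illustrated with a minimal Python sketch. The function names, the input shape (a list of filename and first-page pairs in print order), the use of lowercase letters, and the choice to omit @seq when a page has only one article are all assumptions made for illustration; they are not prescribed by these guidelines.

```python
from collections import defaultdict
from string import ascii_lowercase


def assign_seq(articles):
    """Map each filename to (fpage, seq).

    `articles` is a list of (filename, fpage) pairs in print order.
    seq is a letter when several articles share a first page, else None.
    """
    by_page = defaultdict(list)
    for filename, fpage in articles:
        by_page[fpage].append(filename)

    result = {}
    for fpage, names in by_page.items():
        for index, name in enumerate(names):
            # Only pages shared by several articles need a sequence letter.
            seq = ascii_lowercase[index] if len(names) > 1 else None
            result[name] = (fpage, seq)
    return result


def fpage_xml(fpage, seq):
    """Render the <fpage> element, adding @seq only when a letter is given."""
    if seq is None:
        return f"<fpage>{fpage}</fpage>"
    return f'<fpage seq="{seq}">{fpage}</fpage>'
```

For example, two articles that both start on page 12 would receive `<fpage seq="a">12</fpage>` and `<fpage seq="b">12</fpage>`. The sketch does not handle more than 26 articles on one page.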

Embargo Delay

Generally speaking, all articles for a journal in PMC have the same delay time between the publication date and the time they are available in PMC. In some cases, however, these delays need to be set at the article level rather than at the journal level. See "Release Delay" under Processing Instructions for details on how to tag embargo delays for individual articles.

Empty Elements

Do not use empty elements for the purpose of formatting.

All required elements should have content.
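As a rough pre-submission check, the following Python sketch lists elements that have no text and no children. It assumes that elements carrying attributes (for example, a graphic reference) are empty by design and skips them; that rule, the function name, and the sample markup are illustrative assumptions, and the result is only a list of candidates for manual review.

```python
import xml.etree.ElementTree as ET


def empty_elements(xml_text):
    """Return the tag names of elements with no attributes, text, or children."""
    root = ET.fromstring(xml_text)
    empty = []
    for el in root.iter():  # document order
        has_text = bool((el.text or "").strip())
        if not el.attrib and len(el) == 0 and not has_text:
            empty.append(el.tag)
    return empty


# Trimmed sample, not a valid article.
sample = '<sec><title/><p>Text</p><p> </p><graphic href="f1.tif"/></sec>'
print(empty_elements(sample))
```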

Formatted Text

As a rule, use formatted text (<bold>, <italic>, <sc>, etc.) only to set off a piece of information. Do not set entire elements in formatted text. For example, if a <title> is set completely in boldface, do not tag it with <bold>. However, if a title has a word or some words set in boldface for emphasis, tag those words using <bold>. This applies mainly to <title>, <p> in <abstract>, and <label>. It may also come up in <aff> and "special" sections, like <ack>.
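One way to look for elements that are set entirely in formatted text is sketched below in Python. The set of formatting tags checked and the function name are assumptions for illustration. A flagged element still needs a human decision, since some content (a title consisting only of a species name, say) may legitimately be italic throughout.

```python
import xml.etree.ElementTree as ET

# Formatting elements named in the guideline; extend as needed (assumption).
FORMAT_TAGS = {"bold", "italic", "sc"}


def wholly_formatted(el):
    """True if the element's entire content is a single formatting child."""
    if (el.text or "").strip():
        return False  # text before the first child
    if len(el) != 1:
        return False  # no children, or more than one
    child = el[0]
    trailing = (child.tail or "").strip()  # text after the child
    return child.tag in FORMAT_TAGS and not trailing


whole = ET.fromstring("<title><bold>Methods</bold></title>")
partial = ET.fromstring("<title>Growth of <italic>E. coli</italic></title>")
print(wholly_formatted(whole), wholly_formatted(partial))
```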

Languages

Based on the agreement between the publisher and NLM, PubMed Central may accept non-English articles and/or English articles with non-English parts (titles, abstracts, etc.).

Non-English content must be identified with @xml:lang. Unlike nearly all other attributes in XML, the value of @xml:lang is inherited. This means that all elements inside the one with the language attribute (its descendants) are assumed to be in the same language, unless they explicitly set their own @xml:lang attribute.

In general, the rule for tagging language is that the main language of the article should be set in the @xml:lang on <article>. Any item within the article that is in a different language should be tagged with an @xml:lang to identify the language of that piece.

Note: English is the default value for @xml:lang on <article>, <response>, and <sub-article> and does not need to be set explicitly at these levels.

See the multiple-language examples in <abstract> for tagging details.
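The inheritance rule can be made concrete with a short Python sketch that computes the effective language of every element. The function name and the sample markup are assumptions for illustration; the default of "en" on the root follows the note above about <article>.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_langs(xml_text, default="en"):
    """Return (tag, language) for each element, applying xml:lang inheritance."""
    root = ET.fromstring(xml_text)
    result = []

    def walk(el, inherited):
        # An explicit xml:lang overrides the inherited value for this subtree.
        lang = el.get(XML_LANG, inherited)
        result.append((el.tag, lang))
        for child in el:
            walk(child, lang)

    walk(root, default)
    return result


# Trimmed sample, not a valid article.
sample = (
    '<article xml:lang="es">'
    "<abstract><p>Resumen</p></abstract>"
    '<trans-abstract xml:lang="en"><p>Summary</p></trans-abstract>'
    "</article>"
)
for tag, lang in effective_langs(sample):
    print(tag, lang)
```

In this sample the <p> inside <abstract> inherits "es" from <article>, while the <p> inside <trans-abstract> inherits "en" from its parent.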

Links

Tag all links to items within the document (e.g. tables, figures, display formulas) with the <xref> element and include the appropriate @ref-type. See <xref>.

Tag all external links with <ext-link> and include the appropriate @ext-link-type. See <ext-link>.

Tag all links to related articles (e.g. from a correction to the corrected article) with <related-article> and include the appropriate @ext-link-type, @related-article-type, and citation information. See <related-article>.

Math

Tag all display formulas with MathML mixed markup. Tag inline formulas with MathML mixed markup when they cannot be represented by regular article elements and Unicode characters. MathML 2.0 is included in the DTD. Each <mml:math> should have an @id.

Do not set math as <tex-math>.

Use <mml:math> to tag mathematical expressions only. Do not use it to tag single characters!
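The two mechanical parts of these rules (an identifier on every <mml:math>, and no <mml:math> wrapping a single character) can be checked with a Python sketch like the one below. The function name, the problem labels, and the sample markup are assumptions for illustration; judging whether an inline formula could have been tagged with regular elements is left to the tagger.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"


def check_math(xml_text):
    """Return (index, problem) pairs for each <mml:math> that breaks a rule."""
    root = ET.fromstring(xml_text)
    problems = []
    for index, math in enumerate(root.iter(f"{{{MML}}}math")):
        if math.get("id") is None:
            problems.append((index, "missing-id"))
        # Concatenate all text inside the expression, ignoring markup.
        content = "".join(math.itertext()).strip()
        if len(content) <= 1:
            problems.append((index, "single-character"))
    return problems


# Trimmed sample: the second expression has no id and holds a single character.
sample = (
    '<p xmlns:mml="http://www.w3.org/1998/Math/MathML">'
    '<mml:math id="M1"><mml:mi>x</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:math>'
    " and "
    "<mml:math><mml:mi>&#x3B1;</mml:mi></mml:math>"
    "</p>"
)
print(check_math(sample))
```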

Publication Dates

Article publication dates are based on the publishing model of the journal in which the articles are published.

There are two basic classes of publication: issue-based and article-based.

For specific examples of various publishing models and their corresponding date types tagged in the NLM Journal Publishing DTD, see <pub-date>.

Punctuation

Do not tag trailing and extraneous punctuation for the purpose of formatting.

See the <ref-list> examples in the Fully-Tagged Samples.

Series Articles

Sometimes articles are part of a series. The series may be a group of articles within one issue or a set of articles (like a recurring column) spanning several issues. Where appropriate, use <series-title> to give the title of the series that an article belongs to.

Special Characters

Tag special characters as numeric character references using the Unicode hexadecimal value (for example, &#x263B; for ☻). For accented characters that cannot be represented by a single Unicode value, use the base character followed by a character from Combining Diacritical Marks (x0300 to x036F) or Combining Diacritical Marks for Symbols (x20D0 to x20E3).

Do not use values from the Private Use Areas: xE000–xF8FF, xF0000–xFFFFD, and x100000–x10FFFD.

Do not use Unicode values designated as control codes. These ranges include, but are not limited to, x0000–x0020 and x0080–x009F.
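The character rules above lend themselves to a small Python sketch: one helper writes a character as a hexadecimal character reference, and another reports code points from the ranges listed. All names are hypothetical. As an assumption, the sketch treats tab, line feed, carriage return, and the space character (x0020) as permitted, because ordinary XML text needs them, so only x0000–x001F and x0080–x009F are checked as control codes.

```python
PRIVATE_USE = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]
CONTROL = [(0x0000, 0x001F), (0x0080, 0x009F)]
ALLOWED_WHITESPACE = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return


def char_ref(ch):
    """Write one character as a hexadecimal numeric character reference."""
    return f"&#x{ord(ch):04X};"


def encode_non_ascii(text):
    """Replace every non-ASCII character with its character reference."""
    return "".join(ch if ord(ch) < 128 else char_ref(ch) for ch in text)


def disallowed(text):
    """Return (position, code point, reason) for each disallowed character."""
    bad = []
    for pos, ch in enumerate(text):
        cp = ord(ch)
        if cp in ALLOWED_WHITESPACE:
            continue
        if any(lo <= cp <= hi for lo, hi in PRIVATE_USE):
            bad.append((pos, f"x{cp:04X}", "private use"))
        elif any(lo <= cp <= hi for lo, hi in CONTROL):
            bad.append((pos, f"x{cp:04X}", "control"))
    return bad


print(char_ref("\u263b"))            # the smiley used as the example above
print(encode_non_ascii("x\u0304"))   # base character plus combining macron
print(disallowed("ok\ue000\x85"))
```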

Subjects

PMC uses subjects (under <article-categories>) to sort the issue contents and build the Table of Contents. Subjects can be hierarchical. They may describe the content of the article (for example, Physical Sciences) or indicate the type of article (for example, Erratum).