United States National Library of Medicine, National Institutes of Health

MEDLINE®/PubMed® Characters

The MEDLINE/PubMed database is in English and uses standard Latin characters, but it contains diacritical marks in author names, article titles, vernacular titles, full journal titles, and abstracts. The current MEDLINE/PubMed database supports the following nine non-spacing diacritical marks, only in combination with Latin small letters: diaeresis, breve, cedilla, acute, ring above, macron, circumflex, tilde, and grave. MEDLINE also supports the uppercase and lowercase o with stroke (Ø, ø) and l with stroke (Ł, ł).
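The rule above can be expressed as a small check. The sketch below is illustrative only: the table and the function name `is_supported_accented` are hypothetical, not part of any NLM tool. It maps each of the nine marks to its standard Unicode combining character, then decomposes a character with NFD and accepts it only if it is a Latin small letter followed by exactly one supported mark, or one of the four stroke letters (which have no Unicode decomposition).

```python
import unicodedata

# The nine supported non-spacing marks, keyed by their Unicode combining characters.
SUPPORTED_MARKS = {
    "\u0308": "diaeresis",    # COMBINING DIAERESIS
    "\u0306": "breve",        # COMBINING BREVE
    "\u0327": "cedilla",      # COMBINING CEDILLA
    "\u0301": "acute",        # COMBINING ACUTE ACCENT
    "\u030a": "ring above",   # COMBINING RING ABOVE
    "\u0304": "macron",       # COMBINING MACRON
    "\u0302": "circumflex",   # COMBINING CIRCUMFLEX ACCENT
    "\u0303": "tilde",        # COMBINING TILDE
    "\u0300": "grave",        # COMBINING GRAVE ACCENT
}

# O/o with stroke and L/l with stroke; these do not decompose under NFD.
STROKE_LETTERS = {"\u00d8", "\u00f8", "\u0141", "\u0142"}


def is_supported_accented(ch: str) -> bool:
    """Hypothetical check: is `ch` an accented letter allowed by the rule as stated?"""
    if ch in STROKE_LETTERS:
        return True
    decomposed = unicodedata.normalize("NFD", ch)
    if len(decomposed) != 2:  # plain letter, or more than one mark
        return False
    base, mark = decomposed
    # Only Latin small letters may carry a supported mark.
    return "a" <= base <= "z" and mark in SUPPORTED_MARKS
```

For example, `is_supported_accented("\u00e7")` (c with cedilla) is accepted, while `"\u0151"` (o with double acute, a mark not in the list) is rejected.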

The XML file extracts of MEDLINE data use UTF-8 encoding (defined in ISO/IEC 10646 and the Unicode Standard; see Unicode for more information on Unicode and UTF-8 encoding). The UTF-8 encoded data is in Unicode Normalization Form C (NFC; see Unicode Technical Report #15), which represents an accented letter as a single precomposed (composite) character wherever such a character exists. This approach is consistent with the direction of the World Wide Web Consortium as described in Character Model for the World Wide Web. The characters that can appear in a MEDLINE XML file under this approach are listed in MEDLINE Characters. Full journal titles may contain some additional characters.
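The following is a minimal sketch, using Python's standard `unicodedata` module (Python 3.8+ for `is_normalized`), of what NFC means for data like this: a letter plus a combining mark and the corresponding precomposed character compare unequal until the text is normalized. The helper names `to_nfc_utf8` and `all_text_is_nfc` are hypothetical, and the XML fragment's element names are illustrative only, not taken from the MEDLINE DTD.

```python
import unicodedata
import xml.etree.ElementTree as ET


def to_nfc_utf8(text: str) -> bytes:
    """Compose text to Unicode Normalization Form C, then encode it as UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


def all_text_is_nfc(xml_bytes: bytes) -> bool:
    """Return True if every text node in a UTF-8 XML document is already in NFC."""
    root = ET.fromstring(xml_bytes)  # the parser decodes using the declared encoding
    return all(unicodedata.is_normalized("NFC", t) for t in root.itertext())


decomposed = "Mu\u0308ller"   # 'u' followed by U+0308 COMBINING DIAERESIS
precomposed = "M\u00fcller"   # U+00FC LATIN SMALL LETTER U WITH DIAERESIS

print(decomposed == precomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(to_nfc_utf8(decomposed))                                  # b'M\xc3\xbcller'

# Hypothetical record fragment; element names are for illustration only.
record = b'<?xml version="1.0" encoding="UTF-8"?><Author><LastName>M\xc3\xbcller</LastName></Author>'
print(all_text_is_nfc(record))                                  # True
```

Normalizing any search strings to NFC before comparing them against MEDLINE text avoids false mismatches between decomposed and precomposed spellings of the same name.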

Last updated: 21 December 2005
First published: 27 September 1999
Permanence level: Permanence Not Guaranteed