Vocabularies in the UMLS Metathesaurus

What vocabularies are available in the UMLS Metathesaurus?

The UMLS Metathesaurus contains more than 100 source vocabularies, some in multiple languages. They span several types, including clinical vocabularies, classifications, administrative code sets, and thesauri used to index and retrieve scientific literature.

The current appendix to the Metathesaurus license agreement always includes a complete list, in alphabetic order by the versionless source abbreviation used in the Metathesaurus files. A categorized list of the English-language vocabularies is also available. So is a table showing which of the following are present in the Metathesaurus: HIPAA-mandated administrative code sets, and clinical terminologies either designated as target U.S. government-wide standards by the Consolidated Health Informatics (CHI) eGov initiative or recommended by the NCVHS.

Are there plans to add other vocabularies?

Yes, a list of planned additions is available.

How are vocabularies chosen for inclusion in the UMLS Metathesaurus? Can I suggest that a vocabulary be added?

Anyone can recommend a vocabulary for inclusion in the UMLS Metathesaurus.

NLM is committed to including all the code sets required for administrative transactions under the Health Insurance Portability and Accountability Act of 1996 (HIPAA), as well as all terminologies recommended as U.S. government-wide standards by the Consolidated Health Informatics (CHI) eGov initiative and/or as core terminology standards by the National Committee on Vital and Health Statistics (NCVHS). These receive the highest priority.

Recommendations from UMLS users with current production applications that would benefit from the addition of a particular terminology also carry significant weight. NLM considers several evaluation questions when reviewing recommendations for additions to the Metathesaurus. Send recommendations to custserv@nlm.nih.gov, and address the evaluation questions in your message.

What is involved in adding a vocabulary to the UMLS Metathesaurus?

There are four major steps in adding a vocabulary to the Metathesaurus:

  • Analysis and Inversion
  • Insertion
  • Human Editing
  • Quality Assurance

Analysis and Inversion:

After NLM has obtained an electronic version of a vocabulary to be added to the UMLS Metathesaurus, its explicit and implied semantics and structure are carefully analyzed. To avoid unnecessary work later, considerable care must be taken in the initial analysis, e.g.:

  • to find systematic naming patterns that may differ from those used to name the same concepts in other sources
  • to determine whether terms labeled as synonyms by the source reflect a strict interpretation of synonymy or are actually closely related "entry terms"
  • to develop algorithms for assigning default semantic types to various categories of terms in the vocabulary
  • to identify certain term types (such as short forms or non-standard abbreviations) that should be flagged as suppressible synonyms in the Metathesaurus
  • to determine whether there may be considerable undetected synonymy within the vocabulary
  • to determine whether there is intentional use of the same name for different concepts
  • etc.
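
Two of the checks listed above (undetected synonymy, and the same name used for different concepts) both start from finding names shared by more than one source code. The sketch below is only an illustration: the (code, name) input layout, the normalize function, and the sample terms are assumptions, not NLM's actual analysis tooling.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Crude normalization (an assumption): lowercase, collapse whitespace."""
    return " ".join(name.lower().split())


def name_collisions(terms):
    """Return normalized names that are attached to two or more source codes.

    `terms` is an iterable of (code, name) pairs. A collision is only a
    candidate: an analyst still has to decide whether it is undetected
    synonymy or an intentional reuse of the same name.
    """
    codes_by_name = defaultdict(set)
    for code, name in terms:
        codes_by_name[normalize(name)].add(code)
    return {n: sorted(c) for n, c in codes_by_name.items() if len(c) > 1}


# Hypothetical sample terms, invented for illustration.
sample = [
    ("A01", "Cold"),
    ("B07", "cold"),
    ("C12", "Myocardial  infarction"),
    ("C13", "Heart attack"),
]
collisions = name_collisions(sample)
print(collisions)
```

Exact-name matching of this kind cannot find synonyms that are worded differently ("Myocardial infarction" and "Heart attack" above), which is one reason the analysis still depends on human review.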

Once the semantics and structure of the vocabulary are clearly understood, it is "inverted" into a standard Metathesaurus input format, with unique Metathesaurus “atomic” identifiers generated and assigned to all of its strings or concept names.
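
As a rough sketch of what inversion produces, the example below turns source records into uniform rows and gives each string its own identifier. The row fields, the "DEMO" source abbreviation, and the "X0000001" identifier pattern are all hypothetical; the actual Metathesaurus input format and identifier scheme are not described here.

```python
from itertools import count


def invert(source_abbrev, records, start=1):
    """Convert (code, term_type, string) records into uniform rows.

    Every string receives its own unique identifier, even when several
    strings share one source code. The ID format is an assumption.
    """
    next_id = count(start)
    rows = []
    for code, term_type, string in records:
        rows.append({
            "atom_id": f"X{next(next_id):07d}",  # hypothetical ID pattern
            "source": source_abbrev,
            "code": code,
            "term_type": term_type,
            "string": string,
        })
    return rows


# Hypothetical source content: one code with a preferred term and a synonym.
rows = invert("DEMO", [
    ("100", "PT", "Myocardial infarction"),
    ("100", "SY", "Heart attack"),
])
for row in rows:
    print(row["atom_id"], row["code"], row["term_type"], row["string"])
```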


Insertion:

Once inverted into a standard, explicit format, the new terminology is inserted into the Metathesaurus Maintenance system. This involves developing a set of rules, or "recipe," for which types of merges with existing Metathesaurus content should or should not be allowed. For example, SNOMED CT includes previous SNOMED and National Health Service (NHS) Clinical Terms identifiers, and most of these previous identifiers were already in the Metathesaurus, so rules were developed for when to merge based on these identifiers despite other conflicting information. There will be one or more "test" insertions before the "real" insertion so that the recipe can be adjusted to minimize unnecessary work by human subject experts.
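
A merge recipe can be pictured as an ordered list of rules in which earlier rules take precedence. The sketch below is a simplified assumption: the Entry fields and the name and semantic-type rules are invented for illustration. Only the first rule, merging on a shared earlier identifier despite conflicting information, mirrors the SNOMED CT example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    semantic_type: str
    legacy_ids: frozenset = frozenset()  # earlier identifiers, if any


def decide_merge(new: Entry, existing: Entry) -> str:
    """Apply the rules in order and return 'merge', 'review', or 'no-merge'."""
    # Rule 1: a shared earlier identifier wins, even if other data conflicts.
    if new.legacy_ids & existing.legacy_ids:
        return "merge"
    same_name = new.name.lower() == existing.name.lower()
    # Rule 2 (hypothetical): same name but different semantic type is
    # suspicious, so block the automatic merge and queue it for an editor.
    if same_name and new.semantic_type != existing.semantic_type:
        return "review"
    # Rule 3 (hypothetical): same name and same semantic type may merge.
    if same_name:
        return "merge"
    return "no-merge"


a = Entry("Cold", "Disease", frozenset({"L1"}))
b = Entry("Common cold", "Finding", frozenset({"L1"}))
print(decide_merge(a, b))
```

Running the rules against a test insertion and counting the "review" results is one way to see how much editing work a given recipe would create before it is used for the real insertion.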

Human Editing:

Once a new vocabulary is inserted, human experts with the requisite clinical, chemical, or basic science expertise review and edit Metathesaurus entries affected by the automated insertion routines. Editors ensure that the Metathesaurus accurately reflects the meanings present in its source vocabularies and that value-added information (e.g., semantic types) is applied correctly. When two source vocabularies differ in their views of synonymy, the editors determine the view (perhaps a third alternative) that will be reflected in the Metathesaurus concept structure.

Quality Assurance:

Both standard and source-specific quality assurance queries are run to identify potential errors of commission and omission. Entries identified by these queries are reviewed by human experts and edited as necessary.
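
The two kinds of error can be illustrated with a pair of simple checks. This is a minimal sketch: the concept layout (a dict with "names" and "semantic_types") and both checks are assumptions chosen for illustration, not NLM's actual queries.

```python
def qa_report(concepts):
    """Return (concept_id, issue) pairs for editors to review.

    `concepts` maps a concept id to a dict with "names" and
    "semantic_types" lists (a hypothetical layout).
    """
    issues = []
    for cid, concept in sorted(concepts.items()):
        # Possible omission: value-added information was never assigned.
        if not concept["semantic_types"]:
            issues.append((cid, "omission: no semantic type"))
        # Possible commission: the same name was entered more than once.
        names = concept["names"]
        if len(set(names)) != len(names):
            issues.append((cid, "commission: duplicate name"))
    return issues


# Hypothetical concepts, invented for illustration.
demo = {
    "K1": {"names": ["Heart attack"], "semantic_types": []},
    "K2": {"names": ["Cold", "Cold"], "semantic_types": ["Disease"]},
    "K3": {"names": ["Fever"], "semantic_types": ["Finding"]},
}
report = qa_report(demo)
for cid, issue in report:
    print(cid, issue)
```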

What happens when a new version of a vocabulary that is already in the Metathesaurus is released?

When a new version of a vocabulary present in the Metathesaurus is published, the process of Analysis and Inversion, Insertion, Human Editing, and Quality Assurance (see above) is repeated for the new and changed content in the vocabulary. Determining exactly what has changed from the previous version is a key part of the Analysis step, because the electronic versions of vocabularies often do not include explicit data about what has changed.
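
When a source supplies no change data, the differences have to be derived by comparing the two versions. The sketch below assumes each version can be reduced to a mapping from source code to name, which is a simplification; real vocabularies also change in their relationships, attributes, and hierarchy.

```python
def diff_versions(old, new):
    """Compare two versions given as {code: name} dicts.

    Returns sorted lists of codes that were added, removed, or renamed.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(c for c in old.keys() & new.keys() if old[c] != new[c])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical versions of a small vocabulary.
v1 = {"100": "Heart attack", "200": "Cold", "300": "Fever"}
v2 = {"100": "Myocardial infarction", "200": "Cold", "400": "Cough"}
delta = diff_versions(v1, v2)
print(delta)
```

Only the codes in "added" and "changed" would need to go through the later steps again. Codes in "removed" still need a decision about how the retired content is handled.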

Last reviewed: 06 August 2008
Last updated: 06 August 2008
First published: 26 March 2004
Permanence level: Permanence Not Guaranteed