United States National Library of Medicine National Institutes of Health

Fact Sheet
UMLS® Metathesaurus®


Introduction

The Metathesaurus is a very large, multi-purpose, and multi-lingual vocabulary database that contains information about biomedical and health-related concepts, their various names, and the relationships among them. Designed for use by system developers, the Metathesaurus is built from the electronic versions of many different thesauri, classifications, code sets, and lists of controlled terms used in patient care, health services billing, public health statistics, indexing and cataloging of biomedical literature, and basic, clinical, and health services research. These are referred to as the "source vocabularies" of the Metathesaurus. The term Metathesaurus draws on the third definition of the prefix "Meta" in Webster's Dictionary, i.e., "more comprehensive, transcending." In a sense, the Metathesaurus transcends the specific thesauri, vocabularies, and classifications it encompasses.

Properties of the Metathesaurus

The Metathesaurus reflects and preserves the meanings, concept names, and relationships from its source vocabularies. When two different source vocabularies use the same name for differing concepts, the Metathesaurus represents both of the meanings and indicates which meaning is present in which source vocabulary. When the same concept appears in different hierarchical contexts in different source vocabularies, the Metathesaurus includes all the hierarchies. When conflicting relationships between two concepts appear in different source vocabularies, both views are included in the Metathesaurus. Although specific concept names or relationships from some source vocabularies may be idiosyncratic and lack face validity, they are still included in the Metathesaurus.
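The idea that one name can carry different meanings in different source vocabularies, with each meaning traced back to its source, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the concept identifiers (`X0001`, `X0002`), and the source labels (`SRC_A`, `SRC_B`) are hypothetical and do not reflect the actual Metathesaurus file formats or contents.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    concept_id: str  # hypothetical concept identifier
    name: str        # concept name as it appears in the source
    source: str      # hypothetical source vocabulary label


# Toy data: "Cold" names two different concepts in two sources.
RECORDS = [
    NameRecord("X0001", "Cold", "SRC_A"),         # meaning: low temperature
    NameRecord("X0002", "Cold", "SRC_B"),         # meaning: the common cold
    NameRecord("X0002", "Common cold", "SRC_A"),
]


def meanings_by_name(records):
    """Map each lowercased name to {concept_id: set of sources using it}."""
    index = defaultdict(lambda: defaultdict(set))
    for rec in records:
        index[rec.name.lower()][rec.concept_id].add(rec.source)
    return index


index = meanings_by_name(RECORDS)

# Names that point to more than one concept, with the sources for each meaning.
ambiguous = {
    name: {cid: sorted(sources) for cid, sources in concepts.items()}
    for name, concepts in index.items()
    if len(concepts) > 1
}
print(ambiguous)
```

Both meanings of the shared name are kept side by side, and each one records which source vocabulary uses it, rather than one meaning overwriting the other.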

In other words, the Metathesaurus does not represent a comprehensive NLM-authored ontology of biomedicine or a single consistent view of the world (except at the high level of the semantic types assigned to all its concepts). The Metathesaurus preserves the many views of the world present in its source vocabularies because these different views may be useful for different tasks.

The scope of the Metathesaurus is determined by the combined scope of its source vocabularies. Many relationships (primarily synonymous), concept attributes, and some concept names are added by the NLM during Metathesaurus creation and maintenance, but essentially all the concepts themselves come from one or more of the source vocabularies. With very few exceptions, if none of the source vocabularies contains a concept, that concept will not appear in the Metathesaurus.

Because it is a multi-purpose resource that includes concepts and terms from many different source vocabularies developed for very different purposes, the Metathesaurus must be customized for effective use in most specific applications. Your decisions about what to include in your customized subset(s) of the Metathesaurus will have a significant effect on its utility in your systems. Vocabulary sources that are essential for some purposes, e.g., LOINC for standard exchange of laboratory data, may be detrimental for others, such as natural language processing. It can also be important to exclude a subset of the concept names found in a vocabulary source that is otherwise useful, e.g., non-standard abbreviations or shortened forms that lack face validity or produce spurious results in natural language processing.
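A minimal sketch of the two customization decisions described above (choosing which source vocabularies to keep, and excluding individual names from a source that is otherwise useful) might look like the following. The row layout, the function name `build_subset`, and all identifiers, names, and source labels are hypothetical; they are not taken from the Metathesaurus or from any UMLS tool.

```python
# Each row is (concept_id, name, source); all values are hypothetical.
ROWS = [
    ("X0300", "Serum sodium measurement", "SRC_LAB"),
    ("X0300", "Na", "SRC_LAB"),        # shortened form, ambiguous in free text
    ("X0301", "Hyponatremia", "SRC_CLIN"),
    ("X0302", "Sodium", "SRC_OTHER"),
]


def build_subset(rows, keep_sources, excluded_names=()):
    """Keep rows from the chosen sources, minus explicitly excluded names."""
    excluded = {name.lower() for name in excluded_names}
    return [
        (cid, name, src)
        for cid, name, src in rows
        if src in keep_sources and name.lower() not in excluded
    ]


# A subset for a text-processing task: two sources, without the short form "Na".
nlp_subset = build_subset(
    ROWS,
    keep_sources={"SRC_LAB", "SRC_CLIN"},
    excluded_names={"Na"},
)
print(nlp_subset)
```

The same function with a different `keep_sources` set and no exclusions would produce a subset suited to a different application, which is the point of customizing rather than using the full resource as is.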

Applying the Metathesaurus

The Metathesaurus supplies information that computer programs can use to create standard data, interpret user inquiries, interact with users to refine their questions, and convert the users' terms into the vocabulary used in relevant information sources. The Metathesaurus is used in a wide range of applications, including: linking between different clinical or biomedical vocabularies; information retrieval from databases with human-assigned subject index terms and from free-text information sources; linking patient records to related information in bibliographic, full-text, or factual databases; natural language processing and automated indexing research; and structured data entry. In many cases, the utility of the Metathesaurus is enhanced when it is used in combination with the SPECIALIST Lexicon, the lexical programs, and the UMLS Semantic Network.

To obtain coherent, comparable results in data creation applications, such as patient data entry, it is necessary to define which Metathesaurus concepts and terms can be included in the records being created. This may be done by selecting one or more of the many Metathesaurus source vocabularies that provide the most appropriate concepts and terms for the specific data being created. Other Metathesaurus concepts and terms will then provide synonyms and related terms that can help lead users to the vocabularies selected for a particular data creation application.
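As a rough illustration of converting a user's term into the vocabulary chosen for a data creation application, the sketch below looks up the concept behind the user's wording and returns that concept's names from the selected source. The function name `to_target_terms`, the row layout, and all identifiers and source labels (`SRC_LAY`, `SRC_CLIN`) are hypothetical, and matching is a simple case-insensitive comparison rather than the lexical processing a real system would likely need.

```python
# Each row is (concept_id, name, source); all values are hypothetical.
ROWS = [
    ("X0100", "Heart attack", "SRC_LAY"),
    ("X0100", "Myocardial infarction", "SRC_CLIN"),
    ("X0200", "Stroke", "SRC_LAY"),
    ("X0200", "Cerebrovascular accident", "SRC_CLIN"),
]


def to_target_terms(user_term, rows, target_source):
    """Return names in target_source for the concept(s) the user's term names."""
    wanted = user_term.strip().lower()
    # Step 1: find every concept that has this name in any source.
    concept_ids = {cid for cid, name, _ in rows if name.lower() == wanted}
    # Step 2: collect the names those concepts have in the selected source.
    return sorted(
        {name for cid, name, src in rows
         if cid in concept_ids and src == target_source}
    )


print(to_target_terms("heart attack", ROWS, "SRC_CLIN"))
print(to_target_terms("unknown term", ROWS, "SRC_CLIN"))
```

Names from the non-selected source act only as entry points: they help a user reach the concept, while the recorded term always comes from the selected vocabulary.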

Obtaining the UMLS Metathesaurus

The Metathesaurus (and other UMLS products) is available free to both U.S. and international users. Users must complete a Web-based License Agreement for the Use of UMLS Metathesaurus. Licensees are responsible for complying with the restrictions on use of the contents of the UMLS Metathesaurus that are detailed in the agreement. Although much of the content of the Metathesaurus may be used with minimal restrictions, some uses of some Metathesaurus source vocabularies require separate agreements, which may involve fees, with the individual vocabulary producers.

The UMLS Metathesaurus is available to licensees via download, via a Web interface, and via an application programming interface (API) from the UMLS Knowledge Source Server. It is also available on DVD to UMLS licensees on request. A complete description of the Knowledge Sources and their distribution formats can be found in the UMLS Documentation.

Other Fact Sheets in the UMLS series: Unified Medical Language System, UMLS Semantic Network, SPECIALIST Lexicon, UMLS Knowledge Source Server, and UMLS MetamorphoSys.

For additional information, send an email to custserv@nlm.nih.gov or call 1-888-FINDNLM.


A complete list of NLM Fact Sheets is available at:
(alphabetical list): http://www.nlm.nih.gov/pubs/factsheets/factsheets.html
(subject list): http://www.nlm.nih.gov/pubs/factsheets/factsubj.html

Or write to:

FACT SHEETS
Office of Communications and Public Liaison
National Library of Medicine
8600 Rockville Pike
Bethesda, Maryland 20894

Phone: (301) 496-6308
Fax: (301) 496-4450
email: publicinfo@nlm.nih.gov

Last updated: 28 March 2006
First published: 28 March 2006