To address the complex problems of relating user inquiries to the content of biomedical information sources and of aggregating comparable data derived from disparate databases, the NLM assembled a multidisciplinary in-house research group and also awarded a series of research contracts to a number of primarily academic investigators. The first several years of UMLS research were devoted to studying user needs, developing research tools, identifying required capabilities, exploring alternative methods for delivering these capabilities, and defining in general terms the new knowledge sources that would be needed to support integrated use of information from disparate electronic biomedical sources. Based on the results of this early work, the conception of UMLS components as "middleware" designed for use by system developers emerged.
Since 1990, NLM has issued annual editions of UMLS Knowledge Sources and associated lexical programs. Over the past decade, these resources have grown and developed, the methodology of creating them has matured, and their utility has been demonstrated in many different information systems. Today more than 1,000 individuals and institutions worldwide license the UMLS resources, which are free of charge. The majority of the licensees use one or more of the UMLS components in information systems, often in creative and innovative undertakings.
The NLM itself uses UMLS components to enhance retrieval from a number of its information services, including the MEDLINE database available via PubMed, the ClinicalTrials.gov database of ongoing clinical trials sponsored by the National Institutes of Health and other organizations, and the NLM Gateway, which provides a single point of entry to a number of different NLM databases. The Library also relies heavily on the UMLS resources in its natural language processing and digital library research programs.
The Metathesaurus and the Semantic Network will be discussed in this article. Those interested in the SPECIALIST lexicon and its related natural language processing resources should consult the readings at the end of this article.
The Metathesaurus is a database of information on concepts whose names appear in one or more of a number of different controlled vocabularies and classifications used in the field of biomedicine. In general, the scope of the Metathesaurus is determined by the combined scope of its source vocabularies. The Metathesaurus preserves the meanings, hierarchical connections, and other relationships between terms present in its source vocabularies, while adding certain basic information about each of its concepts and establishing new relationships between concepts and terms from different source vocabularies.
The Metathesaurus contains concepts and concept names from more than 60 vocabularies and classifications, some in multiple editions. The 2000 edition of the Metathesaurus includes approximately 730,000 concepts and 1.5 million concept names. Most of the source vocabularies are included in their entirety. Some material from the UMLS Metathesaurus is from copyrighted sources.
The Metathesaurus is organized by concept. All words and phrases that mean the same thing form a distinct concept or synonym class in the Metathesaurus. Each separate meaning appears as its own concept, together with links (represented relationships) to other concepts in the Metathesaurus. These relationships to the other concepts serve to define the semantic neighborhood of the concept. A user of the Metathesaurus, whether human or program, can navigate within this semantic neighborhood to find the names for the concept sought.
Multiple meanings of the same term are dealt with by separating the meanings and presenting them in different semantic neighborhoods. If you look up the word "corn" in a dictionary you will see one entry with multiple definitions. If you search for "corn" in the Metathesaurus you will find that it names two separate concepts. One concept is the plant used for food and the other one the anatomical abnormality. Each of these meanings has differing relationships within the Metathesaurus.
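The organization by concept described above can be illustrated with a minimal sketch. The identifiers below are hypothetical placeholders, not real CUIs, and the name lists are illustrative rather than taken from the Metathesaurus files; the function names are likewise assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical identifiers and illustrative names (not real Metathesaurus data).
CONCEPT_NAMES = {
    "C0000001": ["Zea mays", "Corn", "Indian Corn", "Maize"],  # the plant
    "C0000002": ["Corn", "Clavus"],                            # the skin lesion
}


def build_index(concept_names):
    """Map each lowercased name to the set of concepts it can name."""
    index = defaultdict(set)
    for cui, names in concept_names.items():
        for name in names:
            index[name.lower()].add(cui)
    return index


INDEX = build_index(CONCEPT_NAMES)


def concepts_for(term):
    """Return the sorted identifiers of every concept named by `term`."""
    return sorted(INDEX.get(term.lower(), set()))
```

Here `concepts_for("corn")` returns both placeholder identifiers, one per meaning, while `concepts_for("maize")` returns only the concept for the plant.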
Each concept in the Metathesaurus has a unique concept identifier (CUI), which itself has no intrinsic meaning. This unique identifier is represented in the Metathesaurus by the letter C followed by 7 digits (e.g., C0010028). This identifier remains the same across versions of the Metathesaurus, irrespective of the term designated as the preferred name of the concept. This facilitates file maintenance and management, as well as tracking how the meanings assigned to a given term change over time. It is "the name [of a concept] that never changes."
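A program that handles CUIs might check their shape before using them. The following is a minimal sketch based only on the format stated above (the letter C followed by 7 digits); the function name `is_cui` is an assumption made for this example, and the check says nothing about whether an identifier is actually assigned to a concept.

```python
import re

# The letter C followed by exactly 7 digits, as described in the text.
CUI_PATTERN = re.compile(r"C\d{7}")


def is_cui(value):
    """Return True if `value` has the shape of a concept unique identifier."""
    return CUI_PATTERN.fullmatch(value) is not None
```

For example, `is_cui("C0010028")` is true, while `is_cui("C123")` and `is_cui("D003313")` are false.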
Nine types of relationships exist in the Metathesaurus. They are listed in Table 2. Relationships are reciprocal, so that, for example, where one concept is broader than another, the other is noted as being narrower than the first. Relationship attributes, describing the exact nature of a relationship, may be assigned to a given relationship. These attributes are drawn from the set of permissible relationships within the Semantic Network (discussed below), with the additional relationship of "mapped_to".
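The reciprocity of relationships can be sketched as follows. The broader/narrower pairing comes from the example above; the remaining pairings (parent/child, AQ/QB, and treating sibling, "other related," and "like" as their own inverses) are assumptions inferred from the definitions in Table 2. The identifiers are hypothetical placeholders.

```python
# Inverse label for each relationship type.
# RB/RN follows the article's example; the other pairings are assumptions.
INVERSE = {
    "RB": "RN", "RN": "RB",
    "PAR": "CHD", "CHD": "PAR",
    "AQ": "QB", "QB": "AQ",
    "SIB": "SIB", "RO": "RO", "RL": "RL",
}


def add_relationship(store, cui_a, rel, cui_b):
    """Record a relationship and its reciprocal in `store`, a set of triples."""
    store.add((cui_a, rel, cui_b))
    store.add((cui_b, INVERSE[rel], cui_a))
    return store


# Hypothetical placeholder identifiers, for illustration only.
relationships = set()
add_relationship(relationships, "C0000001", "RB", "C0000003")
```

After the call, `relationships` holds the RB triple and the matching RN triple with the two identifiers swapped.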
Many of the source vocabularies included in the Metathesaurus place the concepts they include in some context. These contexts are in general hierarchical arrangements, for some organizational or classification purpose. NLM endeavors to preserve all of these different contexts in the Metathesaurus, so a single concept in the Metathesaurus can appear in multiple different hierarchies. As an example, several of the contexts in which the concept "fruit" appears are shown in Table 3. There is no attempt to merge or combine the different contextual views into one coherent hierarchical arrangement for the Metathesaurus. Given the different perspectives and purposes of the many UMLS source vocabularies, this would be an essentially impossible task.
Of particular note is that there may be narrative description(s) of the meaning of the concept. The majority of these definitions come from MeSH, but there are also definitions from a number of other sources. A few definitions are created specifically for the Metathesaurus when they are needed to distinguish among different meanings of the same string.
The Semantic Network can be visualized as a diagram where the types make up nodes within a network. The top of the network has two nodes, "Entity" and "Event". The remaining types each appear in only one location within the network. Figure 1 includes a portion of the UMLS Semantic Network.
The primary link between the nodes is the "isa" link. This establishes the hierarchy of types within the Network and is used for deciding on the most specific semantic type available for assignment to a Metathesaurus concept. In addition, a set of non-hierarchical relations between the types has been identified. These are grouped into five major categories, which are themselves relations: "physically related to," "spatially related to," "temporally related to," "functionally related to," and "conceptually related to."
The relations are stated between semantic types and do not necessarily apply to all instances of concepts that have been assigned to those semantic types. That is, the relation may or may not hold between any particular pair of concepts. So, although "treats" is one of several valid relations between the semantic types "Pharmacologic Substance" and "Disease or Syndrome," a particular pharmacologic substance (e.g., penicillin) may not treat a particular disease (e.g., AIDS).
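The distinction between a relation permitted at the level of semantic types and a relation that actually holds between two concepts can be sketched as below. The data structures and function names are assumptions made for this example, and the single asserted fact (penicillin treats syphilis) is included only as an illustration, not as content drawn from the Metathesaurus.

```python
# Relations permitted between semantic types (from the example in the text).
ALLOWED_TYPE_RELATIONS = {
    ("Pharmacologic Substance", "treats", "Disease or Syndrome"),
}

# Semantic type assigned to each concept in this toy example.
SEMANTIC_TYPE = {
    "penicillin": "Pharmacologic Substance",
    "AIDS": "Disease or Syndrome",
    "syphilis": "Disease or Syndrome",
}

# Concept-level relationships that are actually asserted (illustrative only).
ASSERTED = {("penicillin", "treats", "syphilis")}


def is_permitted(concept_a, relation, concept_b):
    """True if the relation is valid between the two concepts' semantic types."""
    triple = (SEMANTIC_TYPE[concept_a], relation, SEMANTIC_TYPE[concept_b])
    return triple in ALLOWED_TYPE_RELATIONS


def is_asserted(concept_a, relation, concept_b):
    """True only if the relation is recorded for this particular pair."""
    return (concept_a, relation, concept_b) in ASSERTED
```

In this sketch `is_permitted("penicillin", "treats", "AIDS")` is true because the types allow it, yet `is_asserted("penicillin", "treats", "AIDS")` is false because no such concept-level relationship has been recorded.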
Once a machine-readable version of a vocabulary is made available to the NLM it is converted into a "normal" or canonical form. This "inversion" process requires careful consideration of how the source represents its meanings and attempts to make all of this representation explicit. Each source is then added to the existing Metathesaurus. Terms from different sources which are lexically similar to each other or to existing terms in the Metathesaurus, or which appear from other indications to be semantically identical to concepts in the Metathesaurus, are brought together (merged) as proposed synonyms located in a single Metathesaurus concept.
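The idea of proposing merges for lexically similar terms can be sketched with a deliberately simple normalization: lowercase the term, drop punctuation, and sort the words. This is a stand-in chosen for illustration, not the lexical programs NLM actually uses, and the function names are assumptions.

```python
import re
from collections import defaultdict


def normalize(term):
    """Lowercase, strip punctuation, and sort the words of a term."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(sorted(words))


def propose_merges(terms):
    """Group terms whose normalized forms match; return groups of two or more."""
    groups = defaultdict(list)
    for term in terms:
        groups[normalize(term)].append(term)
    return [group for group in groups.values() if len(group) > 1]
```

With this sketch, "Indian Corn" and "Corn, Indian" share the normalized form "corn indian" and are proposed as synonyms, while "Maize" is not; linking it to the same concept would rely on other indications or on editorial review.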
After this merging, the results are reviewed by editors, largely to assess whether each proposed concept merge is appropriate. Editors also may add information such as additional relationships and semantic types. This human review is expedited by computational assistance, as is the quality assurance which takes place after the editing.
In categorizing the concepts, editors are encouraged to consider the most specific semantic type available. If the concept is broad or not represented by a more specific type, a broad category in the semantic type hierarchy is used. For example, a sub-tree under the node "Physical Object" is "Manufactured Object." It has only two child nodes, "Medical Device" and "Research Device." It is clear that there are manufactured objects other than medical devices and research devices. Rather than proliferate the number of semantic types to encompass multiple additional subcategories for these objects, concepts that are neither medical devices nor research devices are simply assigned the more general semantic type "Manufactured Object."
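The choice of the most specific applicable semantic type can be sketched over a small fragment of the "isa" hierarchy. The fragment uses the nodes named above; placing "Physical Object" directly under "Entity" is an assumption made for this example, as are the function names. Because each type appears in only one location, a single parent per type suffices.

```python
# Child -> parent "isa" links for a fragment of the Semantic Network.
# The link from "Physical Object" to "Entity" is an assumption of this sketch.
ISA = {
    "Physical Object": "Entity",
    "Manufactured Object": "Physical Object",
    "Medical Device": "Manufactured Object",
    "Research Device": "Manufactured Object",
}


def ancestors(sem_type):
    """Return the chain of broader types, nearest first."""
    chain = []
    while sem_type in ISA:
        sem_type = ISA[sem_type]
        chain.append(sem_type)
    return chain


def most_specific(candidates):
    """Among applicable types, pick the one deepest in the hierarchy."""
    return max(candidates, key=lambda t: len(ancestors(t)))
```

A manufactured object that is neither a medical device nor a research device would have only "Physical Object" and "Manufactured Object" as applicable types, so `most_specific` returns "Manufactured Object".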
Periodically, various types of quality assurance efforts are performed. The most important of these are efforts to ensure that there is no missed synonymy, i.e., no terms meaning exactly the same thing in different concepts. Another important effort is made to ensure that every concept is linked to others by some relationship.
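The second check, that every concept is linked to at least one other, can be sketched as a simple set difference. The representation of relationships as (concept, relationship, concept) triples, the function name, and the placeholder identifiers are all assumptions made for this example.

```python
def unlinked_concepts(cuis, relationships):
    """Return the concepts that appear in no relationship triple."""
    linked = set()
    for cui_a, _rel, cui_b in relationships:
        linked.add(cui_a)
        linked.add(cui_b)
    return sorted(set(cuis) - linked)


# Hypothetical placeholder identifiers, for illustration only.
all_cuis = ["C0000001", "C0000002", "C0000003"]
triples = [("C0000001", "RB", "C0000003"), ("C0000003", "RN", "C0000001")]
orphans = unlinked_concepts(all_cuis, triples)
```

Here `orphans` contains only "C0000002", the one concept with no relationship to any other.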
In preparation for the release of the next version of the Metathesaurus, all new releasable concepts are assigned concept unique identifiers (CUIs), while concepts previously present in the Metathesaurus retain their CUIs. The Metathesaurus is released as a set of relational tables.
Licensed users may ftp the UMLS Knowledge Sources or access them interactively from the UMLS Knowledge Source Server. On request, CD-ROMs are provided to users who do not have adequate connectivity to ftp the large files.
McCray AT, Razi AM, Bangalore AK, Browne AC, Stavri PZ. The UMLS Knowledge Source Server: a versatile Internet-based research tool. Proc AMIA Annu Fall Symp. 1996:164-8.
Campbell KE, Oliver DE, Spackman KA, Shortliffe EH. Representing thoughts, words, and things in the UMLS. J Am Med Inform Assoc. 1998 Sep-Oct;5(5):421-31.
Humphreys BL, Lindberg DA, Schoolman HM, Barnett GO. The Unified Medical Language System: an informatics research collaboration. J Am Med Inform Assoc. 1998 Jan-Feb;5(1):1-11.
Divita G, Browne AC, Rindflesch TC. Evaluating lexical variant generation to improve information retrieval. Proc AMIA Annu Fall Symp. 1998:775-9.
McCray AT, Browne AC. Discovering the modifiers in a terminology data set. Proc AMIA Annu Fall Symp. 1998:780-4.
McCray AT. The nature of lexical knowledge. Methods Inf Med. 1998 Nov;37(4-5):353-60.
McCray AT, Loane RF, Browne AC, Bangalore AK. Terminology issues in user access to Web-based medical information. Proc AMIA Annu Fall Symp. 1999:107-11.
Tuttle MS, Cole WG, Sherertz DD, Nelson SJ. Navigating to knowledge. Methods Inf Med. 1995 Mar;34(1-2):214-31.
McCray AT, Nelson SJ. The representation of meaning in the UMLS. Methods Inf Med. 1995 Mar;34(1-2):193-201.
Hole WT, Srinivasan S. Discovery of missed synonymy in a large concept-oriented Metathesaurus. Proc AMIA Annu Fall Symp. 2000:354-8.
McCray AT, Hole WT. The scope and structure of the first version of the UMLS Semantic Network. Proc Annu Symp Comput Appl Med Care. 1990:126-3
Tuttle MS, Nelson SJ, Fuller LF, Sherertz DD, Erlbaum MS, Sperzel WD, Olson NE, Suarez-Munist ON. The semantic foundations of the UMLS metathesaurus. Medinfo 1992:1506-11.
TERM | SOURCE | TERM TYPE | CODE |
---|---|---|---|
Zea mays | MTH | PN | NOCODE |
Zea mays | SNMI98 | PT | L-DB941 |
Zea mays | CSP2000 | ET | 2340-8793 |
ZEA MAYS | NDDF00 | IN | 006695 |
Corn <1> | MTH | MM | U000077 |
Corn | MSH2000 | MH | D003313 |
Corn | SNMI98 | SY | L-DB941 |
Corn | LCH90 | PT | U001161 |
corn | AOD99 | DE | 0000013135 |
corn | CSP2000 | PT | 2340-8793 |
Indian Corn | MSH2000 | EP | D003313 |
Corn, Indian | MSH2000 | PM | D003313 |
Maize | MSH2000 | EP | D003313 |
Maize | SNMI98 | SY | L-DB941 |
maize | AOD99 | NP | 0000026523 |
maize | CSP2000 | ET | 2340-8793 |
Material drawn from the 2000 UMLS Metathesaurus. The meaning of the Source Abbreviations and Term Types can be obtained by reviewing the UMLS documentation.
RELATIONSHIP | DEFINITION |
---|---|
Broader (RB) | Has a meaning which includes that of the concept. |
Narrower (RN) | Has a meaning which is included in that of the concept. |
Other related (RO) | Has a relationship other than synonymous, narrower, or broader. |
LIKE (RL) | The two concepts are similar or "alike". In the current edition of the Metathesaurus, most relationships with this attribute link MeSH supplementary concepts which are largely chemicals. Many of the concepts linked by this relationship may be synonymous and will be in a single concept identifier in future editions of the Metathesaurus. Source-specific mappings from one vocabulary to another also have this relationship, along with the label for the relationship attribute of "mapped_to". |
Parent (PAR) | Is a parent in a hierarchy of a Metathesaurus source vocabulary. |
Child (CHD) | Is a child in a hierarchy of a Metathesaurus source vocabulary. |
Sibling (SIB) | Shares a parent in a hierarchy in a Metathesaurus source vocabulary. |
AQ | Is an allowed qualifier for a concept in a Metathesaurus source vocabulary. |
QB | Can be qualified by a concept in a Metathesaurus source vocabulary. |
Concept name bolded for readability.
Contexts drawn from 2000 UMLS Metathesaurus.
+ Indicates that the concept has children not shown.
Last reviewed: 18 May 2006
Last updated: 18 May 2006
First published: 08 January 2001