United States National Library of Medicine National Institutes of Health

Section 2
Metathesaurus

2.0 Introduction

The Metathesaurus is a very large, multi-purpose, and multi-lingual vocabulary database that contains information about biomedical and health-related concepts, their various names, and the relationships among them. Designed for use by system developers, the Metathesaurus is built from the electronic versions of various thesauri, classifications, code sets, and lists of controlled terms used in patient care, health services billing, public health statistics, indexing and cataloging biomedical literature, and/or basic, clinical, and health services research. These are referred to as the "source vocabularies" of the Metathesaurus. The term Metathesaurus draws on Webster's Dictionary third definition for the prefix "meta," i.e., "more comprehensive, transcending." In a sense, the Metathesaurus transcends the specific thesauri, vocabularies, and classifications it encompasses.

The Metathesaurus is organized by concept or meaning. In essence, it links alternative names and views of the same concept and identifies useful relationships between different concepts.

The Metathesaurus is linked to other UMLS Knowledge Sources. All concepts in the Metathesaurus are assigned to at least one Semantic Type from the Semantic Network. This provides consistent categorization of all concepts in the Metathesaurus at the relatively general level represented in the Semantic Network. Many of the words and multi-word terms that appear in concept names or strings in the Metathesaurus also appear in the SPECIALIST Lexicon. The lexical tools are used to generate the word, normalized word, and normalized string indexes to the Metathesaurus. MetamorphoSys is the software tool for customizing the Metathesaurus for specific purposes.

MetamorphoSys is also the installation program for all of the UMLS resources. Users can obtain a DVD of the UMLS Knowledge Sources or download them from the UMLS Knowledge Source Server. To ensure proper functionality you should download and extract all UMLS data and zip files to the same directory.

2.0.1 Scope of the Metathesaurus

The scope of the Metathesaurus is determined by the combined scope of its source vocabularies. Many relationships (primarily synonymous), concept attributes, and some concept names are added by the NLM during Metathesaurus creation and maintenance, but essentially all the concepts themselves come from one or more of the source vocabularies. Generally, if a concept does not appear in any of the source vocabularies, it will also not appear in the Metathesaurus.

2.0.2 Preservation of Content and Meaning from Source Vocabularies

The Metathesaurus reflects and preserves the meanings, concept names, and relationships from its source vocabularies. When two different source vocabularies use the same name for differing concepts, the Metathesaurus represents both of the meanings and indicates which meaning is present in which source vocabulary. When the same concept appears in different hierarchical contexts in different source vocabularies, the Metathesaurus includes all the hierarchies. When conflicting relationships between two concepts appear in different source vocabularies, both views are included in the Metathesaurus. Although specific concept names or relationships from some source vocabularies may be idiosyncratic and lack face validity, they are still included in the Metathesaurus.

In other words, the Metathesaurus does not represent a comprehensive NLM-authored ontology of biomedicine or a single consistent view of the world (except at the high level of the semantic types assigned to all its concepts). The Metathesaurus preserves the many views of the world present in its source vocabularies because these different views may be useful for different tasks.

Although it preserves all the meanings and content in its source vocabularies, the Metathesaurus stores this information in a single common format. The native format of each vocabulary is carefully studied and then "inverted" into the common Metathesaurus format. For some vocabularies, this involves representing implied information in a more explicit format. For example, if a source vocabulary stores its preferred concept name as the first occurrence in a list of alternative concept names, that first name is explicitly tagged as the preferred name for that source in the Metathesaurus.
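The "first name in the list is the preferred name" case can be sketched as follows. This is only an illustration of making implied information explicit: the function name, the record layout, and the source label EXAMPLE_SRC are assumptions, not part of the Metathesaurus format.

```python
def invert_names(source, names):
    """Return one explicit record per name for a source concept.

    The source's convention (assumed here) is that the first name in the
    list is its preferred name; the output states that explicitly.
    """
    return [
        {"source": source, "name": name, "preferred": index == 0}
        for index, name in enumerate(names)
    ]


# Hypothetical source entry: an ordered list of alternative names.
rows = invert_names("EXAMPLE_SRC", ["Atrial Fibrillation", "Auricular Fibrillation"])
for row in rows:
    print(row)
```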

2.0.3 Need to Customize the Metathesaurus

Because it is a multi-purpose resource that includes concepts and terms from many different source vocabularies developed for very different purposes, the Metathesaurus must be customized for effective use in most specific applications. Your decisions about what to include in your customized subset(s) of the Metathesaurus will have a significant effect on its utility in your systems. Vocabulary sources that are essential for some purposes, e.g., LOINC for standard exchange of laboratory data, may be detrimental for others, such as Natural Language Processing (NLP). It can also be important to exclude a subset of the concept names found in a vocabulary source that is otherwise useful, e.g., non-standard abbreviations or shortened forms that lack face validity or produce spurious results in NLP.

The Metathesaurus contains source vocabularies produced by many different copyright holders. The majority of the content of the Metathesaurus is available for use under the basic (and quite open) terms described in Sections 1-11 and 13-16 of the Metathesaurus license. However, some vocabulary producers place additional restrictions on the use of their content as distributed within the Metathesaurus. The various levels of additional restrictions are described in Section 12 of the license. The level that applies to individual vocabularies is recorded in the Appendix to the license, in Appendix B.4 to this documentation, and in the MetamorphoSys installation and customization program. If you already have a separate license for use of one of the source vocabularies, your existing license also applies to that source as distributed within the Metathesaurus. In some cases, you may have to request permission or negotiate a separate license with a vocabulary producer in order to use that vocabulary in a production system. There may be a charge associated with these separate permissions or license agreements.

The Metathesaurus is designed to facilitate customization. All information in the Metathesaurus is labeled as to its source(s), so it is possible to determine which concept names, attributes, and relationships come from which source vocabularies and which attributes and relationships were added during Metathesaurus construction. The labels allow you to subset the Metathesaurus by excluding information from specific source vocabularies, including those for which you do not have necessary licenses or permissions. It is also easy to exclude all source vocabularies that have particular restriction levels or all information in particular languages. In addition to identifying the source(s), restriction levels, and language of the information it contains, the Metathesaurus includes various more specific concept name flags and relationship labels that can help you to exclude content that is not relevant or helpful for particular applications.
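As a rough sketch of this kind of label-based subsetting (not how MetamorphoSys itself is implemented), the following filters concept-name rows by source, restriction level, and language. The NameRow fields, the language codes, the restriction level values, and the source label SRC_X are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRow:
    cui: str
    language: str           # assumed language code, e.g. "ENG"
    source: str             # root source abbreviation
    restriction_level: int  # made-up values for this example
    name: str


def make_subset(rows, excluded_sources=(), excluded_levels=(), languages=None):
    """Keep rows whose source, restriction level, and language are all allowed."""
    kept = []
    for row in rows:
        if row.source in excluded_sources:
            continue
        if row.restriction_level in excluded_levels:
            continue
        if languages is not None and row.language not in languages:
            continue
        kept.append(row)
    return kept


# Illustrative rows only; the restriction levels are not real license data.
rows = [
    NameRow("C0004238", "ENG", "MSH", 0, "Atrial Fibrillation"),
    NameRow("C0004238", "ENG", "PSY", 3, "Auricular Fibrillation"),
    NameRow("C0004238", "SPA", "SRC_X", 0, "nombre de ejemplo"),
]
english_unrestricted = make_subset(rows, excluded_levels={3}, languages={"ENG"})
print([row.source for row in english_unrestricted])
```

Because every row carries its own labels, each filter is a simple per-row test and the filters can be combined freely.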

MetamorphoSys, the installation and customization program distributed with the UMLS, makes it easy to generate custom subsets. MetamorphoSys also includes default settings that generate subsets that may be generally useful. MetamorphoSys can also be used to change the default preferred names of concepts (explained in Section 2.2.6); to change the default character set (from 7-bit ASCII to Unicode UTF-8); and to include versioned vocabulary source abbreviations in every Metathesaurus file (see Section 2.1).

MetamorphoSys also generates special subsets referred to as Content Views. A content view may specify any pre-defined subset of the Metathesaurus that is useful for some specific purpose. The definition of a content view can take one of several forms: (1) a list of Metathesaurus UIs maintained over time; (2) a list of sources that participate in the view; or (3) a complex query that identifies particular sets of data.

A Content View Flag (CVF) consists of an arbitrary bit field, with each bit representing membership in a particular Content View; each Content View is documented in MRDOC.RRF. The first Content View available in the 2005AA release, the MetaMap NLP View, identifies terms that are useful for Natural Language Processing. The CVF in rows with these terms carries the value "1" in the "256" bit. MetamorphoSys users who wish to use this special subset should choose File Menu, Enable/Disable Views to implement this feature.
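Testing membership in a Content View is a bitwise operation on the CVF value. The sketch below assumes that "the 256 bit" means the bit whose integer value is 256, and that a row with no CVF is represented as None; both are interpretations rather than documented behavior.

```python
# Assumption: the MetaMap NLP View corresponds to the bit with integer value 256.
METAMAP_NLP_VIEW_BIT = 256


def in_content_view(cvf, view_bit):
    """Return True if the given view bit is set in a row's CVF value."""
    return cvf is not None and (cvf & view_bit) != 0


print(in_content_view(256, METAMAP_NLP_VIEW_BIT))        # only this view's bit set
print(in_content_view(256 | 512, METAMAP_NLP_VIEW_BIT))  # member of two views
print(in_content_view(512, METAMAP_NLP_VIEW_BIT))        # member of another view only
print(in_content_view(None, METAMAP_NLP_VIEW_BIT))       # no CVF recorded
```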

2.0.4 Metathesaurus Release Formats

You may select from two relational formats: the Rich Release Format (RRF), introduced in 2004, and the Original Release Format (ORF). Both are available as output options of MetamorphoSys. All Rich Release Format file names have an extension (.RRF). Original Release Format files have no extension. Both formats are described in this documentation (usually abbreviated as RRF and ORF). There is also a white paper explaining the rationale for the Rich Release Format and a detailed description of the differences between the Rich Release Format files and the Original Format files.

The Rich Release Format has a number of advantages and is the preferred format for new users of the Metathesaurus and for most data creation applications.

2.1 Source Vocabularies

The Metathesaurus contains concepts, concept names, and other attributes from more than 100 terminologies, classifications, and thesauri, some in multiple editions. There is a concept in the Metathesaurus for each source vocabulary itself, which is assigned the Semantic Type "Intellectual Product". A special file (MRSAB.RRF and MRSAB in ORF) stores the version of each source vocabulary present in a particular edition of the Metathesaurus. All other Metathesaurus files that reference source vocabularies use "root" or versionless abbreviations, e.g., ICD9CM, not ICD9CM2003, thus avoiding routine wholesale updates to reflect the new versions. If you prefer versioned vocabulary source abbreviations in your custom Metathesaurus subset files, MetamorphoSys offers this option.

A complete list of the Metathesaurus source vocabularies with their root and versioned source abbreviations appears in Appendix B.4 of this documentation. The list is alphabetized by the abbreviation for each vocabulary source that is used in the Metathesaurus. For each source vocabulary, Appendix B.4 also gives the number of its concept names that are present in the Metathesaurus, the type of hierarchies or contexts it has (if any), and whether it is one of the small number of source vocabularies that are not routinely updated in the Metathesaurus.

The Metathesaurus source vocabularies include terminologies designed for use in patient-record systems; large disease and procedure classifications used for statistical reporting and billing; more narrowly focused vocabularies used to record data related to psychiatry, nursing, medical devices, adverse drug reactions, etc.; disease and finding terminologies from expert diagnostic systems; and some thesauri used in information retrieval. A categorized list of the English-language source vocabularies is available.

2.1.1 Inclusion of U.S. Standard Code Sets and Terminologies

The Metathesaurus includes the code sets mandated for use in electronic administrative transactions in the U.S. under the provisions of the Health Insurance Portability and Accountability Act (HIPAA). With the exception of the National Drug Codes (NDC), the Metathesaurus includes all concepts and terms from these code sets. NDC codes available from the Food and Drug Administration are included as attributes of clinical drug concepts present in the FDA National Drug Code Directory (MTHFDA), which is a source vocabulary.

NLM intends to incorporate all clinical terminologies designated as target U.S. government-wide standards by the Consolidated Health Informatics (CHI) initiative and/or recommended as U.S. standards by the National Committee on Vital and Health Statistics. Several of these (e.g., LOINC, SNOMED CT, RxNorm) are already present in the Metathesaurus.

The fact that a vocabulary has been designated as a HIPAA or CHI standard is included in Appendix B.4.

2.1.2 Inclusion of Languages Other Than English

The Metathesaurus structure can accommodate translations of its source vocabularies into languages other than English. Many translations in many different languages are present in this edition of the Metathesaurus. For some source vocabularies, e.g., NLM’s Medical Subject Headings (MeSH) and the International Classification of Primary Care, the Metathesaurus includes many translations; for others, it includes one or a few; and, in many cases, it includes only the English version. As previously explained, MetamorphoSys makes it easy to create a subset of the Metathesaurus that excludes the languages that are not relevant in a particular application.

2.2 Concepts, Concept Names, and Their Identifiers

The Metathesaurus is organized by concept. One of its primary purposes is to connect different names for the same concept from many different vocabularies. The Metathesaurus assigns several types of unique, permanent identifiers to the concepts and concept names it contains, in addition to retaining all identifiers that are present in the source vocabularies. The Metathesaurus concept structure includes concept names, their identifiers, and key characteristics of these concept names (e.g., language, vocabulary source, name type). The entire concept structure appears in a single file in the Rich Release Format (MRCONSO.RRF). An abbreviated version of the concept structure is split between two files in the Original Format (MRCON and MRSO).

2.2.1 Concepts and Concept Identifiers

A concept is a meaning. A meaning can have many different names. A key goal of Metathesaurus construction is to understand the intended meaning of each name in each source vocabulary and to link all the names from all of the source vocabularies that mean the same thing (the synonyms). This is not an exact science. The construction of the Metathesaurus is based on the assumption that specially trained subject experts can determine synonymy with a high degree of accuracy. Metathesaurus editors decide what view of synonymy to represent in the Metathesaurus concept structure. Please note that each source vocabulary’s view of synonymy is also present in the Metathesaurus, irrespective of whether it agrees or disagrees with the Metathesaurus view.

Each concept or meaning in the Metathesaurus has a unique and permanent concept identifier (CUI). The CUI has no intrinsic meaning. In other words, you cannot infer anything about a concept just by looking at its CUI. In principle, the identifier for a concept never changes, irrespective of changes over time in the names that are attached to it in the Metathesaurus or in the source vocabularies.

A CUI will be removed from the Metathesaurus when it is discovered that two CUIs name the same concept – in other words, when undiscovered synonymy comes to light. In these cases, one of the two CUIs will be retained, all relevant information in the Metathesaurus will be linked to it, and the other CUI will be retired.

Retired CUIs are never re-used. Each edition of the Metathesaurus includes files that detail any such changes from the previous edition. One Metathesaurus file (MRCUI.RRF and MRCUI in ORF) tracks such changes from 1991 to the present, allowing you to check the fate of any CUI that is no longer present in the Metathesaurus.
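Checking the fate of a retired CUI amounts to following merge records until a current CUI is reached. The sketch below assumes the change history has already been loaded into a plain dictionary from retired CUI to surviving CUI; that layout, the function name, and the CUIs used are hypothetical and do not reflect the actual MRCUI file structure.

```python
def resolve_cui(cui, current_cuis, merged_into):
    """Follow merge records from a possibly retired CUI to a current one.

    Returns the current CUI, or None if the history does not lead to one.
    """
    seen = set()
    while cui not in current_cuis:
        if cui in seen or cui not in merged_into:
            return None  # cycle in the data, or no record for this CUI
        seen.add(cui)
        cui = merged_into[cui]
    return cui


# Hypothetical identifiers, chosen only for illustration.
current_cuis = {"C9000002", "C9000004"}
merged_into = {"C9000001": "C9000002", "C9000003": "C9000001"}

print(resolve_cui("C9000003", current_cuis, merged_into))
print(resolve_cui("C9000009", current_cuis, merged_into))
```

The loop handles chains of merges across several editions, since a CUI that absorbed another may itself be retired later.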

2.2.2 Concept Names and String Identifiers

Each unique concept name or string in each language in the Metathesaurus has a unique and permanent string identifier (SUI). Any variation in character set, upper-lower case, or punctuation is a separate string, with a separate SUI. The same string in different languages (e.g., English and Spanish) will have a different string identifier for each language. If the same string, e.g., Cold, has more than one meaning, the string identifier will be linked to more than one concept identifier (CUI).

2.2.3 Atoms and Atom Identifiers

The basic building blocks or "atoms" from which the Metathesaurus is constructed are the concept names or strings from each of the source vocabularies. Every occurrence of a string in each source vocabulary is assigned a unique atom identifier (AUI). If exactly the same string appears twice in the same vocabulary, for example, as both the long name and the short name for the same concept or as an alternate name for two different concepts in the same vocabulary source, a unique AUI is assigned for each occurrence. When the same string appears in multiple source vocabularies, it will have AUIs for every time it appears as a concept name in each of those sources. All of these AUIs will be linked to a single string identifier (SUI), since they represent occurrences of the same string. Unlike string identifiers, a single AUI is always linked to a single concept identifier, because each occurrence of a string in a source can only have one meaning.

AUIs appear in the RRF (.RRF files), but not in the ORF.

2.2.4 Terms and Lexical Identifiers

For English language entries in the Metathesaurus only, each string is linked to all of its lexical variants or minor variations by means of a common term identifier (LUI). (In the Metathesaurus, therefore, an English "term" is the group of all strings that are lexical variants of each other.) English lexical variants are detected using the Lexical Variant Generator (LVG) program, one of the UMLS lexical tools. As similar tools become available for other languages, they may be used to create lexical variant groups in other languages. (In the meantime, the LUI for a non-English string is really another string identifier.)

Like a string identifier, the LUI for an English string may be linked to more than one concept. This occurs when strings that are lexical variants of each other have different meanings. In contrast, each string identifier and each atom identifier can only be linked to a single LUI.

2.2.5 Uses of Concept, String, Atom, and Term Identifiers

In the Metathesaurus, every CUI (concept) is linked to at least one AUI (atom), SUI (string), and LUI (term), but can also be linked to many of each of these. Every AUI (atom) is linked to a single SUI (string), a single LUI (term), and a single CUI (concept). Each SUI (string) can be linked to many AUIs (atoms), to a single LUI (term), and to more than one CUI (concept) – although the typical case is one CUI. Each LUI (term) can be linked to many AUIs (atoms), many SUIs (strings), and more than one CUI (concept) – although the typical case is one CUI.

FIGURE 1.

Levels, from outermost to innermost: Concept (CUI), Terms (LUIs), Strings (SUIs), Atoms (AUIs)*
* RRF Only

C0004238  Atrial Fibrillation (preferred); Atrial Fibrillations; Auricular Fibrillation; Auricular Fibrillations
    L0004238  Atrial Fibrillation (preferred); Atrial Fibrillations
        S0016668  Atrial Fibrillation (preferred)
            A0027665  Atrial Fibrillation (from MSH)
            A0027667  Atrial Fibrillation (from PSY)
        S0016669  Atrial Fibrillations
            A0027668  Atrial Fibrillations (from MSH)
    L0004327  (synonym) Auricular Fibrillation; Auricular Fibrillations
        S0016899  Auricular Fibrillation (preferred)
            A0027930  Auricular Fibrillation (from PSY)
        S0016900  (plural variant) Auricular Fibrillations
            A0027932  Auricular Fibrillations (from MSH)

In the abbreviated example in Figure 1, Atrial Fibrillation appears as an atom in more than one source vocabulary and has a distinct AUI for each occurrence. Since each of these atoms has an identical string or concept name, they are linked to a single SUI. Atrial Fibrillations, the plural of Atrial Fibrillation, has a different string identifier. Since the singular and plural are lexical variants of each other, both are linked to the same LUI. There is a different LUI and different SUIs and AUIs for Auricular Fibrillation and its plural Auricular Fibrillations. Since Atrial Fibrillation and Auricular Fibrillation have been judged to have the same meaning, they are linked to the same CUI.
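The linkage among the four identifier types can be made concrete with a small data structure. The sketch below loads the atoms from Figure 1 as tuples and groups them by level; the tuple layout and the helper function are illustrative choices, not a Metathesaurus file format.

```python
from collections import defaultdict

# One tuple per atom from Figure 1: (AUI, SUI, LUI, CUI, string, source).
ATOMS = [
    ("A0027665", "S0016668", "L0004238", "C0004238", "Atrial Fibrillation", "MSH"),
    ("A0027667", "S0016668", "L0004238", "C0004238", "Atrial Fibrillation", "PSY"),
    ("A0027668", "S0016669", "L0004238", "C0004238", "Atrial Fibrillations", "MSH"),
    ("A0027930", "S0016899", "L0004327", "C0004238", "Auricular Fibrillation", "PSY"),
    ("A0027932", "S0016900", "L0004327", "C0004238", "Auricular Fibrillations", "MSH"),
]


def group(atoms, key_index, value_index):
    """Map each identifier at key_index to the set of identifiers at value_index."""
    groups = defaultdict(set)
    for atom in atoms:
        groups[atom[key_index]].add(atom[value_index])
    return dict(groups)


auis_by_sui = group(ATOMS, 1, 0)  # string -> atoms
suis_by_lui = group(ATOMS, 2, 1)  # term -> strings
luis_by_cui = group(ATOMS, 3, 2)  # concept -> terms

print(sorted(auis_by_sui["S0016668"]))
print(sorted(suis_by_lui["L0004238"]))
print(sorted(luis_by_cui["C0004238"]))
```

Each atom row carries exactly one SUI, one LUI, and one CUI, which matches the rule that an AUI is always linked to a single identifier at every higher level.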

All of these identifiers serve important purposes in building the Metathesaurus, in allowing efficient and accurate customization for specific purposes, and in identifying changes in its concept and concept name coverage over time.

For example, CUIs link all information in the Metathesaurus related to particular concepts. In other words, a CUI can be used to retrieve all the concept names, relationships, and attributes for a particular concept that appear in any Metathesaurus file. CUIs also serve as permanent, publicly available identifiers for biomedical concepts or meanings to which many individual source vocabularies are linked. You are strongly encouraged to incorporate CUIs in your local applications – to support data exchange and linking and to assist migration between the use of individual source vocabularies should that become necessary in the future.

2.2.6 Default Preferred Names for Metathesaurus Concepts

As a convenience for those who build the Metathesaurus, one string from one English term is designated and labeled as the default preferred name of each concept in the Metathesaurus. To avoid laborious selection among alternative terms and strings, selection of the default preferred name for any Metathesaurus concept is based on an order of precedence of all the types of English strings in all the Metathesaurus source vocabularies. Different types of strings, e.g., preferred terms, cross references, and abbreviations from each vocabulary, will have different positions in this order. The factors considered in establishing the default order of precedence include breadth of subject coverage, frequency of update, and the degree to which the source's concept names are used in regular clinical or biomedical discourse. The default order of precedence appears in MRRANK.RRF (MRRANK in ORF) and in Appendix B, Section B.5 of this documentation.

The default order of precedence will not be suitable for all applications of the Metathesaurus. MetamorphoSys can be used to change the selection of preferred names to feature terminology from the source vocabularies most appropriate to particular user populations. For example, concept names from SNOMED CT may be preferred in clinical applications, and terminology from MeSH may be preferred in literature retrieval systems.
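Selection by order of precedence can be sketched as picking the candidate name whose (source, string type) pair ranks highest. The string type labels and both orderings below are invented for illustration; the real default order is the one recorded in MRRANK.

```python
def default_preferred_name(atoms, precedence):
    """Pick the name whose (source, string type) ranks highest.

    atoms: list of (source, string_type, name) tuples for one concept.
    precedence: list of (source, string_type) pairs, highest priority first.
    """
    rank = {key: position for position, key in enumerate(precedence)}
    ranked = [atom for atom in atoms if (atom[0], atom[1]) in rank]
    if not ranked:
        return None
    best = min(ranked, key=lambda atom: rank[(atom[0], atom[1])])
    return best[2]


# Candidate names for one concept; "PREFERRED" and "ENTRY" are made-up type labels.
atoms = [
    ("MSH", "PREFERRED", "Atrial Fibrillation"),
    ("PSY", "PREFERRED", "Auricular Fibrillation"),
    ("MSH", "ENTRY", "Auricular Fibrillations"),
]
default_order = [("MSH", "PREFERRED"), ("PSY", "PREFERRED"), ("MSH", "ENTRY")]
custom_order = [("PSY", "PREFERRED"), ("MSH", "PREFERRED"), ("MSH", "ENTRY")]

print(default_preferred_name(atoms, default_order))
print(default_preferred_name(atoms, custom_order))
```

Changing the preferred names for an application then only requires supplying a different ordering; the atoms themselves are untouched.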

2.2.7 Strings with Multiple Meanings

In some cases, the same name (with or without differences in upper-lower case) may apply to different concepts, usually (but not always) in different Metathesaurus source vocabularies. In the abbreviated example that follows, the string "Cold" is a name for the temperature in one vocabulary. In another vocabulary, "Cold" is an alternate name for the "Common cold". In a third vocabulary, "COLD" is an acronym for "chronic obstructive lung disease". As a result, "Cold" or "COLD" appears as a name of more than one concept in the Metathesaurus.

2.2.7.1 Representation of Ambiguity in the Metathesaurus

Separate Metathesaurus files (AMBIGLUI.RRF and AMBIGSUI.RRF; AMBIG.LUI and AMBIG.SUI in ORF) contain the LUIs and SUIs of all ambiguous terms and strings known to the Metathesaurus.

FIGURE 2.

Levels, from outermost to innermost: Concepts (CUIs), Terms (LUIs), Strings (SUIs), Atoms (AUIs)**
** RRF only

C0009264  cold temperature
    L0215040  cold temperature
        S0288775  cold temperature
            A0318651  cold temperature (from CSP)
    L0009264  Cold; Cold
        S0007170  Cold
            A0016032  Cold (from MTH)
        S0026353  Cold
            A0040712  Cold (from MSH)

C0009443  Common Cold
    L0009443  Common Cold
        S0026747  Common Cold
            A0041261  Common Cold (from MSH)
    L0009264  Cold; Cold
        S0007171  Cold
            A0016033  Cold (from MTH)
        S0026353  Cold
            A0040708  Cold (from COSTAR)

C0024117  Chronic Obstructive Airway Disease
    L0498186  Chronic Obstructive Airway Disease
        S0837575  Chronic Obstructive Airway Disease
            A0896021  Chronic Obstructive Airway Disease (from MSH)
    L0008703  Chronic Obstructive Lung Disease
        S0837576  Chronic Obstructive Lung Disease
            A0896023  Chronic Obstructive Lung Disease (from MSH)
    L0009264  COLD; COLD
        S0829315  COLD
            A0887858  COLD (from MTH)
        S0474508  COLD
            A0539536  COLD (from SNMI)
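An identifier is ambiguous when it is linked to more than one CUI. As a sketch of how that can be detected, the following uses the identifiers from Figure 2; the tuple layout and the function are illustrative and do not describe how the ambiguity files are actually produced.

```python
from collections import defaultdict

# One tuple per atom from Figure 2: (CUI, LUI, SUI, AUI).
ROWS = [
    ("C0009264", "L0215040", "S0288775", "A0318651"),
    ("C0009264", "L0009264", "S0007170", "A0016032"),
    ("C0009264", "L0009264", "S0026353", "A0040712"),
    ("C0009443", "L0009443", "S0026747", "A0041261"),
    ("C0009443", "L0009264", "S0007171", "A0016033"),
    ("C0009443", "L0009264", "S0026353", "A0040708"),
    ("C0024117", "L0498186", "S0837575", "A0896021"),
    ("C0024117", "L0008703", "S0837576", "A0896023"),
    ("C0024117", "L0009264", "S0829315", "A0887858"),
    ("C0024117", "L0009264", "S0474508", "A0539536"),
]


def ambiguous(rows, id_index):
    """Return identifiers in column id_index that are linked to more than one CUI."""
    cuis_by_id = defaultdict(set)
    for row in rows:
        cuis_by_id[row[id_index]].add(row[0])
    return {key: cuis for key, cuis in cuis_by_id.items() if len(cuis) > 1}


ambiguous_luis = ambiguous(ROWS, 1)
ambiguous_suis = ambiguous(ROWS, 2)
print(ambiguous_luis)
print(ambiguous_suis)
```

In this data the term L0009264 is linked to all three concepts, while only the string S0026353 is shared between two of them.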

2.2.8 Concept Names Added During Metathesaurus Construction

Although the majority of concept names present in the Metathesaurus come from one or more of its source vocabularies, some concept names are created during Metathesaurus construction. This occurs in the following circumstances:

  1. A unique name is created for a string with multiple meanings (the case explained in Section 2.2.7)
  2. A more explicit name is created when none of the source vocabulary names for a concept conveys its meaning adequately
  3. An American English variant is generated for a British spelling
  4. An equivalent basic Latin ASCII character set string is generated for a string in an extended character set, such as Unicode

Like all other concept names in the Metathesaurus, names created during Metathesaurus construction are labeled to indicate their source.

2.3 Relationships and Relationship Identifiers

The Metathesaurus includes many relationships between different concepts (in addition to the synonymous relationships in the Metathesaurus concept structure described in Section 2.2). Most of these relationships come from individual source vocabularies. Some are added by NLM during Metathesaurus construction. Some have been contributed by Metathesaurus users to support certain types of applications.

Relationships are expressed in terms of CUIs (in the RRF and ORF) and AUIs (in the RRF only). Metathesaurus relationship files do not include concept names.

In general, the Metathesaurus indicates the author of each relationship, that is, one of the source vocabularies, the Metathesaurus itself, or another supplier. Some relationships added in the early years of Metathesaurus development (less than 6 percent of the current total and declining) are attributed to the Metathesaurus, but actually came from specific source vocabularies.

2.3.1 Basic Categories of Non-Synonymous Relationships

The Metathesaurus contains non-synonymous relationships between concepts from the same source vocabulary (intra-source vocabulary relationships) and between concepts in different vocabularies (inter-source vocabulary relationships). The Metathesaurus does not include all possible non-synonymous relationships between the concepts it contains. It includes all relationships present in its source vocabularies and some additional relationships designed to connect related concepts. In general, the relationships asserted by source vocabularies connect closely related concepts, such as those that share some common property or are related by definition. For example, a member of a class of drugs (e.g., penicillin) will be connected to the name for the class (e.g., antibiotics); a bacterial infection will be connected to the bacterium that causes it.

2.3.1.1 Intra-Source Relationships

The majority of intra-source relationships are asserted or implied by the individual source vocabularies. Such relationships occur in a source vocabulary’s explicit or implied hierarchical arrangements or contexts, cross-reference structures, rules for applying qualifiers, or connections between different types of names for the same concept (e.g., abbreviations and full forms). The primary Metathesaurus relationships file (MRREL.RRF and MRREL in ORF) contains the "distance-1" hierarchical relationships, i.e., immediate parent, immediate child, and immediate sibling relationships, as well as other types of intra-source relationships.
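Immediate child and sibling relationships both follow from a source's parent links. The sketch below derives them for a made-up hierarchy with placeholder labels; it illustrates the idea only and is not the procedure used to build the relationships file.

```python
from collections import defaultdict


def distance_one(parents_of):
    """Derive immediate children and immediate siblings from parent links.

    parents_of maps each concept to the set of its immediate parents.
    """
    children_of = defaultdict(set)
    for child, parents in parents_of.items():
        for parent in parents:
            children_of[parent].add(child)

    siblings_of = {}
    for child, parents in parents_of.items():
        shared = set()
        for parent in parents:
            shared |= children_of[parent]
        siblings_of[child] = shared - {child}  # a concept is not its own sibling
    return dict(children_of), siblings_of


# Hypothetical hierarchy; member_c appears under two parents.
PARENTS_OF = {
    "class": set(),
    "other_class": set(),
    "member_a": {"class"},
    "member_b": {"class"},
    "member_c": {"class", "other_class"},
}
children_of, siblings_of = distance_one(PARENTS_OF)
print(sorted(children_of["class"]))
print(sorted(siblings_of["member_a"]))
```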

A subset of the contextual or hierarchical relationships is also distributed in a special contexts file (MRCXT.RRF and MRCXT in ORF) to facilitate the construction of user displays. A "computable" representation of the complete hierarchies is provided in MRHIER.RRF only. MRHIER.RRF, for example, represents all sibling relationships even when there are thousands of siblings. Appendix B.4 indicates which source vocabularies have hierarchical contexts, which of these allow concepts to appear in multiple hierarchies, and whether sibling relationships are represented in MRCXT.RRF and MRCXT in ORF or only in MRHIER.RRF.

ORF users may omit MRCXT if they do not want these selected, pre-computed contexts.

Some of the intra-source vocabulary relationships are statistical relationships, which are computed by determining the frequency with which concepts in specific vocabularies co-occur in records in a database. For example, there are co-occurrence relationships for the number of times concepts have co-occurred as key topics within the same articles, as evidenced by the Medical Subject Headings assigned to those articles in the MEDLINE database. Co-occurrence relationships have also been computed for different ICD-9-CM diagnosis codes assigned to the same patients as reflected in a discharge summary database. In contrast to the relationships asserted within source vocabularies, the statistical relationships in the Metathesaurus can connect very different concepts, such as diseases and drugs. There are specific Metathesaurus files for the co-occurrence relationships (MRCOC.RRF and MRCOC in ORF).
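A co-occurrence count of this kind can be computed by tallying every unordered pair of concepts that appears in the same record. The sketch below is a minimal version of that tally; the record contents and the placeholder identifiers are invented, and the actual computation behind the co-occurrence files may differ.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count how many records contain each unordered pair of concepts."""
    counts = Counter()
    for concepts in records:
        # Deduplicate and sort so each pair is counted once per record
        # and always appears in the same order.
        for pair in combinations(sorted(set(concepts)), 2):
            counts[pair] += 1
    return counts


# Each inner list stands for the concepts assigned to one record
# (for example, one indexed article); identifiers are placeholders.
records = [
    ["CUI_A", "CUI_B", "CUI_C"],
    ["CUI_B", "CUI_A"],
    ["CUI_C"],
]
counts = cooccurrence_counts(records)
print(counts[("CUI_A", "CUI_B")])
print(counts[("CUI_A", "CUI_C")])
```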

2.3.1.2 Inter-Source Relationships

The primary inter-source relationships in the Metathesaurus are the synonymous relationships represented in the Metathesaurus concept structure. The Metathesaurus also includes some relationships between non-synonymous concepts from different source vocabularies. Some of these inter-source relationships are generated during Metathesaurus construction to connect specific "orphan" concepts (with few or no ancestors, siblings, or children in their own source vocabularies) to the richer contextual information in another source vocabulary. Some are supplied by Metathesaurus users who find "like" or "similar" relationships a useful addition to the Metathesaurus’s relatively strict view of synonymy. In both cases, these relationships are distributed in MRREL.RRF and MRREL in ORF.

Many inter-source relationships between non-synonymous concepts are produced through specific efforts to create a mapping between two different source vocabularies. These mappings may be created by an individual source vocabulary producer, by a third party with a particular need for a mapping, or by NLM or under NLM supervision specifically for distribution within the Metathesaurus. The number of NLM-supervised mappings is expected to increase. There are specific Metathesaurus files for mappings in the RRF (MRMAP.RRF and MRSMAP.RRF). A subset of the mappings appears in MRATX in the ORF. Mappings involving SNOMED CT appear in the RRF only.

2.3.2 Relationship Labels

All relationships (outside the basic concept structure) in the Metathesaurus carry a general label (REL), describing their basic nature, such as Broader, Narrower, Child of, Qualifier of, etc., and are identified by their source. Most of these relationships are either directly asserted in a source vocabulary or are implied by the structure of the source vocabulary. A complete list of the general relationship labels appears in MRDOC.RRF and in Appendix B.3 in this documentation.

About a quarter of the relationships in the Metathesaurus also carry an additional label (RELA), obtained from a source vocabulary, that explains the nature of the relationship more exactly, such as is_a, branch_of, component_of. The Digital Anatomist vocabulary and RxNorm are examples of source vocabularies that include such relationship labels. A complete list of the additional relationship labels appears in MRDOC.RRF and in Appendix B.3 in this documentation.

2.3.3 Relationship Identifiers

Every relationship present in the Metathesaurus has a unique relationship identifier (RUI). The primary purpose of these identifiers is to enable easy detection of changes in relationships across versions of the Metathesaurus. The appearance or disappearance of a relationship identifier indicates a change in the relationships present in the Metathesaurus.
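The change-detection idea can be sketched as a plain set comparison. This is a minimal illustration, assuming the RUIs of two versions have already been collected into Python sets; the function name `diff_ruis` and the identifiers below are hypothetical and do not come from any release.

```python
def diff_ruis(previous, current):
    """Return (added, removed) RUIs between two Metathesaurus versions."""
    added = current - previous      # RUIs that appear only in the newer version
    removed = previous - current    # RUIs that are absent from the newer version
    return added, removed


# Hypothetical identifiers, for illustration only.
old_version = {"R00000001", "R00000002", "R00000003"}
new_version = {"R00000002", "R00000003", "R00000004"}

added, removed = diff_ruis(old_version, new_version)
print(sorted(added))
print(sorted(removed))
```

The same comparison would apply to any identifier that is stable across versions.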

Some source vocabularies have their own relationship identifiers. Where they exist, these identifiers are also present in the Metathesaurus.

2.4 Attributes and Attribute Identifiers

In the Metathesaurus, attributes include every discrete piece of information about a concept, an atom, or a relationship that is not (1) part of the basic Metathesaurus concept structure or (2) distributed in one of the relationship files.

2.4.1 Kinds of Attributes

The Metathesaurus includes concept attributes, atom attributes, and relationship attributes.

Concept attributes are added during Metathesaurus construction and apply to all names of a concept. For example, the Semantic Types "Pathologic Function" and "Finding" are attributes of the concept with the preferred name "Atrial Fibrillation" and are applicable to any atom connected to that concept.

Atom attributes come from a particular source vocabulary. Some of them are of general interest; others are relevant only to a particular source vocabulary. For example, the definition "Disorder of cardiac rhythm characterized by rapid, irregular atrial impulses and ineffective atrial contractions" is an attribute of the atom Atrial Fibrillation that comes from the Medical Subject Headings (MeSH). It may be one of several definitions connected to names of this concept, because the Metathesaurus includes all definitions provided by any of its source vocabularies. Although this particular definition comes from MeSH, it might well be useful in Metathesaurus applications that otherwise do not use MeSH. In contrast, the date an occurrence of a string (an atom) was added to a source vocabulary applies only to that specific atom. The utility of specific atom attributes will vary considerably for different applications of the Metathesaurus.

Relationship attributes come from a particular source vocabulary and describe special characteristics of particular relationships in that source, e.g., refinability.

The majority of attributes are distributed in MRSAT.RRF and MRSAT in the ORF. In these files, each row contains the name of the attribute, the source of the attribute, and the value of the attribute, in addition to all appropriate identifiers. There are separate files for selected attributes such as the Semantic Types (MRSTY.RRF and MRSTY in the ORF) and the definitions (MRDEF.RRF and MRDEF in the ORF).

2.4.2 Attribute Identifiers

Each occurrence of each attribute within the Metathesaurus is assigned a unique attribute identifier (ATUI). The appearance or disappearance of ATUIs signals changes in the content of the Metathesaurus; ATUIs thus assist the efficient production of a complete change set for each new version of the Metathesaurus. ATUIs appear only in the RRF, not in the ORF.

2.5 Data About the Metathesaurus

The Metathesaurus contains a number of files that provide useful metadata, i.e., data about the Metathesaurus itself. The metadata files describe (1) characteristics of the current version of the Metathesaurus; (2) changes between the current version and the previous version; and (3) the history of concept identifiers (CUIs) from 1991 to the present.

2.5.1 Characteristics of the Current Metathesaurus

There are discrete Metathesaurus files for:

MRCOLS, MRDOC, MRSAB, and MRRANK contain data that do not appear in the actual Metathesaurus content files. The others are computable from the Metathesaurus content files. They are pre-computed and provided in separate files as a convenience to users.

2.5.2 Changes Between the Current Metathesaurus and the Previous Version

Each version of the Metathesaurus contains a set of files that summarize changes from the previous version.

CHANGE/MERGEDCUI.RRF in the RRF (CHANGE/MERGED.CUI in the ORF) documents cases in which two discrete concepts in the previous version of the Metathesaurus are now considered to be synonyms.

CHANGE/MERGEDLUI.RRF in the RRF (CHANGE/MERGED.LUI in the ORF) documents cases in which two discrete terms in the previous version of the Metathesaurus are now identified as lexical variants of each other, based on the current version of luinorm (the program used to compute them).

Three files contain the CUIs, LUIs, and SUIs for Metathesaurus concepts, terms, and strings that appeared in the previous version, but are not in the current version (CHANGE/DELETEDCUI.RRF, CHANGE/DELETEDLUI.RRF, CHANGE/DELETEDSUI.RRF in the RRF and CHANGE/DELETED.CUI, CHANGE/DELETED.LUI, CHANGE/DELETED.SUI in the ORF).

Note: Future versions of the Metathesaurus will provide change files for relationships and attributes in the RRF only. The generation of these files depends on the relationship and attribute identifiers (RUI and ATUI) introduced in the 2004AA version of the Metathesaurus.

2.5.3 Historical CUIs

The retired CUI file (MRCUI.RRF in RRF and MRCUI in ORF) includes all CUIs present in any previous version of the Metathesaurus, but not in the current version. In general, the file maps the retired CUI to one or more current CUIs.

2.6 Concept Name Indexes

2.6.0 Introduction

To assist system developers in building applications that retrieve all strings or concept names which include specific words or groups of words, three indexes to the concept names are provided: a Word Index, a Normalized Word Index (for English words only), and a Normalized String Index (for English strings only). The indexes are described in Sections 2.6.1, 2.6.2, and 2.6.3, respectively. To make the distinctions among them clearer, the examples include words or strings that would appear in each index for the following set of Metathesaurus concept names:

Lung Diseases, Obstructive (C0024117, L0024117, S0058463)
Obstructive Lung Diseases (C0024117, L0024117, S0068169)
Lung Disease, Obstructive (C0024117, L0024117, S0058458)
Obstructive Lung Disease (C0024117, L0024117, S0068168)


2.6.1 Word Index

2.6.1.1 Description

The word index connects each individual word in any Metathesaurus string to all its related string, term, and concept identifiers. There are separate word index files for each language in the Metathesaurus.

There is one entry for each word found in each unique string in each language. Each entry has five sub-elements.

  1. LAT - 3-letter abbreviation for language
  2. WD - Word
  3. CUI - concept unique identifier
  4. LUI - term unique identifier
  5. SUI - string unique identifier

Sample Records

ENG|000003|C1273274|L3139159|S3660797|
ENG|000003|C1306276|L3139160|S3660798|

2.6.1.2 Definition of a Word

In this index, a word is defined as a token containing only alphanumeric characters with length one or greater; for more information, see the SPECIALIST Lexicon and tools.

2.6.1.3 Word Index Example

For the four concept names listed in Section 2.6.0, the word index will contain multiple entries for each of the following words: disease, diseases, lung, obstructive. Two of the entries generated for the names Lung Disease, Obstructive and Obstructive Lung Disease are shown below:

ENG|disease|C0024117|L0024117|S0058458|
ENG|disease|C0024117|L0024117|S0068168|
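The construction of these entries can be sketched in a few lines of Python. This is an illustration only, not the program NLM uses: it assumes a word is a maximal run of ASCII letters and digits, and it lowercases words because the sample entries above appear in lowercase. The function name `word_index_rows` is hypothetical.

```python
import re

# The four concept names from Section 2.6.0 with their CUI, LUI, and SUI.
NAMES = [
    ("Lung Diseases, Obstructive", "C0024117", "L0024117", "S0058463"),
    ("Obstructive Lung Diseases", "C0024117", "L0024117", "S0068169"),
    ("Lung Disease, Obstructive", "C0024117", "L0024117", "S0058458"),
    ("Obstructive Lung Disease", "C0024117", "L0024117", "S0068168"),
]


def word_index_rows(names, lat="ENG"):
    """Yield one LAT|WD|CUI|LUI|SUI| row per distinct word in each string."""
    for string, cui, lui, sui in names:
        # Assumption: tokens are runs of ASCII alphanumerics, lowercased.
        words = sorted(set(re.findall(r"[a-z0-9]+", string.lower())))
        for word in words:
            yield f"{lat}|{word}|{cui}|{lui}|{sui}|"


rows = list(word_index_rows(NAMES))
disease_rows = [r for r in rows if r.split("|")[1] == "disease"]
for row in disease_rows:
    print(row)
```

Under these assumptions the two printed rows match the two entries shown above, and the full list contains entries for disease, diseases, lung, and obstructive.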

2.6.2 Normalized Word Index

2.6.2.1 Description

The normalized word index connects each individual normalized English word to all its related string, term, and concept identifiers.

There is one entry for each normalized word found in each unique English string. There are no entries for other languages in this index. Each entry has five sub-elements.

  1. LAT - (always ENG in this edition of the Metathesaurus)
  2. NWD - normalized word
  3. CUI - concept unique identifier
  4. LUI - term unique identifier
  5. SUI - string unique identifier

2.6.2.2 Definition of Normalized Word

The normalization process involves breaking a string into its constituent words, lowercasing each word, and converting it to its uninflected form. Normalized words are generated by uninflecting each word and stripping out a small number of stop words. The uninflected forms are generated using the SPECIALIST Lexicon if the words appear in the lexicon; otherwise they are generated algorithmically.

2.6.2.3 Normalized Word Example

For the four concept names listed in Section 2.6.0, the normalized word index will contain multiple entries for each of the following words: disease, lung, obstructive. Since the normalized word index contains base forms only, it does not contain entries for the plural "diseases". In this index, therefore, all four concept names are linked to the normalized word "disease", as follows:

ENG|disease|C0024117|L0024117|S0058458|
ENG|disease|C0024117|L0024117|S0058463|
ENG|disease|C0024117|L0024117|S0068168|
ENG|disease|C0024117|L0024117|S0068169|

2.6.3 Normalized String Index

2.6.3.1 Description

The normalized string index connects the normalized form of a Metathesaurus string to all its related string, term, and concept identifiers. There is one entry for each unique (non-normalized) English string. There are no entries for other languages in this index. Each entry has five sub-elements.

  1. LAT - (always ENG in this edition of the Metathesaurus)
  2. NSTR - normalized string
  3. CUI - concept unique identifier
  4. LUI - term unique identifier
  5. SUI - string unique identifier

2.6.3.2 Definition of Normalized String

The normalization process involves breaking a string into its constituent words, lowercasing each word, converting each word to its uninflected form, and sorting the words in alphabetic order. Normalized strings are generated by uninflecting each word, leaving out a small number of stop words. The uninflected forms are generated using the SPECIALIST Lexicon if the words appear in the lexicon; otherwise they are generated algorithmically.

2.6.3.3 Normalized String Example

Since the four concept names listed in Section 2.6.0 are composed of the same set of normalized words, the Normalized String Index will contain four entries for a single string: disease lung obstructive, in which the component normalized words appear in alphabetical order. The complete set of Normalized String Index entries generated by the four concept names is as follows:

ENG|disease lung obstructive|C0024117|L0024117|S0058458|
ENG|disease lung obstructive|C0024117|L0024117|S0058463|
ENG|disease lung obstructive|C0024117|L0024117|S0068168|
ENG|disease lung obstructive|C0024117|L0024117|S0068169|
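The normalization steps can be sketched as follows. This is a toy illustration under loud assumptions: the real uninflected forms come from the SPECIALIST Lexicon and the lexical tools, whereas `TOY_BASE_FORMS` and `TOY_STOP_WORDS` below are hypothetical stand-ins that cover only this example.

```python
import re

# Toy stand-ins for SPECIALIST Lexicon data; NOT the real normalization tables.
TOY_BASE_FORMS = {"diseases": "disease"}
TOY_STOP_WORDS = {"of", "the", "and"}


def normalize_string(string):
    """Lowercase, uninflect, drop stop words, and sort the words of a string."""
    words = re.findall(r"[a-z0-9]+", string.lower())
    bases = [TOY_BASE_FORMS.get(w, w) for w in words if w not in TOY_STOP_WORDS]
    return " ".join(sorted(bases))


names = [
    "Lung Diseases, Obstructive",
    "Obstructive Lung Diseases",
    "Lung Disease, Obstructive",
    "Obstructive Lung Disease",
]
normalized = {normalize_string(n) for n in names}
print(normalized)
```

All four names collapse to the single normalized string "disease lung obstructive", which is why the index holds four entries that differ only in their SUI.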

2.6.4 Word Index Programs

The programs that generate these indexes are written in Java. They may be of use to system developers who are developing their own interfaces to the UMLS data or for other purposes. Section 4 includes information about these and other lexical programs provided with the UMLS Knowledge Sources.

2.7 File Formats - Metathesaurus Rich Release Format (RRF) and Original Release Format (ORF)

2.7.0 Introduction

Metathesaurus users may select from two relational formats: the Rich Release Format (RRF), first introduced in 2004, and the Original Release Format (ORF). Both are available as output options of MetamorphoSys, the installation and customization program.

Developers are encouraged to use the RRF, which offers significant advantages in source vocabulary transparency (that is, ability to exactly represent the detailed semantics of each source vocabulary); in the ability to generate complete and accurate change sets between versions of the Metathesaurus; and in more convenient representations of concept name, source, and hierarchical context information. A more complete discussion of the rationale for the RRF and a detailed description of the differences between the two formats are available.

Neither Metathesaurus format is fully normalized. By design, there is duplication of data among different files and within certain files. In particular, relationships between different Metathesaurus concepts appear twice (e.g., from entry A to entry B and from entry B to entry A). Developers will need to make their own decisions about the extent to which this redundancy should be retained, reduced, or increased for their specific applications.

Section 2.7.1 describes the files in the RRF.

Section 2.7.2 describes the files in the ORF.

2.7.1 Metathesaurus Rich Release Format (RRF)

All file names begin with the letters MR (Metathesaurus Relational) and are followed by letters that denote the file contents (e.g., MRREL=relationships, MRSAB=source abbreviations), and then a file extension .RRF.

All files except MRRANK.RRF are sorted by row.

2.7.1.1 Data Files

The data in each Metathesaurus entry may be represented in more than 20 different relations, or files. These files correspond to the four logical groups of data elements described in Sections 2.2 - 2.5 and the indexes described in Section 2.6 as follows:

2.7.1.2 Columns and Rows

Each file or named table of data values has by definition a fixed number of columns; the number of rows depends on the content of a particular version of the Metathesaurus.

A column is a sequence of all the values in a given data element or logical sub-element. In general, columns for longer variable length data elements will appear to the right of columns for shorter and/or fixed length data elements. The information for all columns in the files is described in MRCOLS.RRF and in Appendix B.1.1, Metathesaurus Column Descriptions.

A row contains the values for one or more data elements or logical sub-elements for one Metathesaurus entry. Depending on the nature of the data elements involved, each Metathesaurus entry may have one or more rows in a given file. The values for the different data elements or logical sub-elements represented in the row are separated by vertical bars (|). If an optional element is blank, the vertical bars are still used to maintain the correct positioning of the subsequent elements. Each row is terminated by a vertical bar and line termination.
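A reader for this row layout might look like the sketch below. It is a minimal illustration rather than an official parser; the function name `parse_rrf_row` is hypothetical, and the input is a sample MRSTY.RRF record taken from this section.

```python
def parse_rrf_row(line):
    """Split one RRF row into its field values, keeping blank optional fields."""
    line = line.rstrip("\r\n")
    if not line.endswith("|"):
        raise ValueError("RRF rows are expected to end with a vertical bar")
    # Drop the terminating bar, then split; empty strings mark blank elements.
    return line[:-1].split("|")


fields = parse_rrf_row("C0001175|T047|B2.2.1.2.1|Disease or Syndrome|AT17683839||\n")
print(len(fields))
print(fields)
```

Removing only the final bar before splitting keeps a blank last column (here CVF) as an empty string, so field positions stay aligned with the column list.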

2.7.1.3 Descriptions of Each File

The descriptions of the files appear in the following order:

  1. Key data about the Metathesaurus: Files; Columns or Data Elements; documentation that explains the meaning of abbreviations that appear as values in Metathesaurus data elements and attributes
  2. Concept names and their vocabulary sources
  3. Attributes
  4. Relationships
  5. Other data about the Metathesaurus
  6. Indexes

Each file description lists the columns or data elements that appear in the file and includes sample rows from the file.

2.7.1.3.1 Files (File = MRFILES.RRF)

There is exactly one row in this file for each physical segment of each logical file. Data elements that appear in multiple files, e.g., CUI, AUI, will have multiple rows in this file.


Col. - Description

  1. FIL - Physical FILENAME
  2. DES - Descriptive Name
  3. FMT - Comma separated list of column names (COL), in order
  4. CLS - # of COLUMNS
  5. RWS - # of ROWS
  6. BTS - Size in bytes in this format (ISO/PC or Unix)

Sample Records

MRCOC.RRF|Co-occurringConcepts|CUI1,AUI1,CUI2,AUI2,SAB,COT,COF,COA,CVF|9|13939548|786509996|
MRSTY.RRF|Semantic Types|CUI,TUI,STN,STY,ATUI,CVF|6|1146352|64528811|

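Because FMT lists the column names in order, MRFILES.RRF can drive a generic reader for the other files. The sketch below uses the two sample records above plus a sample MRSTY.RRF record from this section; the function name `load_layouts` is hypothetical, and the CLS consistency check is an added safeguard rather than documented behavior.

```python
MRFILES_ROWS = [
    "MRCOC.RRF|Co-occurringConcepts|CUI1,AUI1,CUI2,AUI2,SAB,COT,COF,COA,CVF|9|13939548|786509996|",
    "MRSTY.RRF|Semantic Types|CUI,TUI,STN,STY,ATUI,CVF|6|1146352|64528811|",
]


def load_layouts(rows):
    """Map each file name to its ordered column names, checking FMT against CLS."""
    layouts = {}
    for row in rows:
        fil, _des, fmt, cls, _rws, _bts = row.rstrip("\n")[:-1].split("|")
        columns = fmt.split(",")
        if len(columns) != int(cls):
            raise ValueError(f"{fil}: FMT lists {len(columns)} columns, CLS says {cls}")
        layouts[fil] = columns
    return layouts


layouts = load_layouts(MRFILES_ROWS)
record = "C0001175|T047|B2.2.1.2.1|Disease or Syndrome|AT17683839||"
named = dict(zip(layouts["MRSTY.RRF"], record[:-1].split("|")))
print(named["STY"])
```

Reading layouts from MRFILES.RRF instead of hard-coding them would let the same code follow column changes between releases.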
2.7.1.3.2 Data Elements (File = MRCOLS.RRF)

There is exactly one row in this file for each column or data element in each file. Data elements that appear in multiple files, e.g., CUI, AUI, will have multiple rows in this file.


Col. - Description

  1. COL - Column or data element name
  2. DES - Descriptive Name
  3. REF - Documentation Section Number
  4. MIN - Minimum Length, Characters
  5. AV - Average Length
  6. MAX - Maximum Length, Characters
  7. FIL - Physical FILENAME in which this field occurs
  8. DTY - SQL-92 data type for this column

Sample Records

AUI|Unique identifier for atom||8|8.00|8|MRCONSO.RRF|char(8)|
CODE|Unique Identifier or code for string in source||1|6.4|21|MRCONSO.RRF|varchar(50)|

2.7.1.3.3 Documentation for Abbreviated Values (File = MRDOC.RRF)

There is exactly one row in this table for each allowed value of selected data elements or attributes that have a finite number of abbreviations as allowed values. Examples of such data elements include TTY, ATN, TS, STT, REL, RELA.


Col. - Description

  1. DOCKEY - Data element or attribute
  2. VALUE - Abbreviation that is one of its values
  3. TYPE - Type of information in EXPL column
  4. EXPL - Explanation of VALUE

Sample Records

ATN|DDF|expanded_form|Drug Doseform|
ATN|DHJC|expanded_form|HCPCS J-code|

*Note: The MRDOC file produced by MetamorphoSys contains metadata about the release itself. Here is an example of the records:

RELEASE|mmsys.build.date|release_info|2006_01_31_17_30_34|
RELEASE|mmsys.version|release_info|7.7|

2.7.1.3.4 Concept Names and Sources (File = MRCONSO.RRF)

There is exactly one row in this file for each atom (each occurrence of each unique string or concept name within each source vocabulary) in the Metathesaurus, i.e., there is exactly one row for each unique AUI in the Metathesaurus. Every string or concept name in the Metathesaurus appears in this file, connected to its language, source vocabularies, and its concept identifier. The values of TS, STT, and ISPREF reflect the default order of precedence of vocabulary sources and term types in MRRANK.RRF.


Col. - Description

  1. CUI - Unique identifier for concept
  2. LAT - Language of term
  3. TS - Term status
  4. LUI - Unique identifier for term
  5. STT - String type
  6. SUI - Unique identifier for string
  7. ISPREF - Atom status: preferred (Y) or not (N) for this string within this concept
  8. AUI - Unique identifier for atom; variable length field, 8 or 9 characters
  9. SAUI - Source asserted atom identifier [optional]
  10. SCUI - Source asserted concept identifier [optional]
  11. SDUI - Source asserted descriptor identifier [optional]
  12. SAB - Abbreviated source name (SAB). Maximum field length is 20 alphanumeric characters. Two source abbreviations are assigned:
        • Root Source Abbreviation (RSAB): short form, no version information; for example, AI/RHEUM, 1993, has an RSAB of "AIR"
        • Versioned Source Abbreviation (VSAB): includes version information; for example, AI/RHEUM, 1993, has a VSAB of "AIR93"
      Official source names, RSABs, and VSABs are included in Appendix B.4.
  13. TTY - Abbreviation for term type in source vocabulary, for example PN (Metathesaurus Preferred Name) or CD (Clinical Drug). Possible values are listed in Appendix B.3.
  14. CODE - Most useful source asserted identifier (if the source vocabulary has more than one identifier), or a Metathesaurus-generated source entry identifier (if the source vocabulary has none)
  15. STR - String
  16. SRL - Source restriction level
  17. SUPPRESS - Suppressible flag. Values = O, E, Y, or N
        O: All obsolete content, whether obsolesced by the source or by NLM. This includes all atoms having obsolete TTYs, and other atoms becoming obsolete that have not acquired an obsolete TTY (e.g., RxNorm SCDs no longer associated with current drugs, LNC atoms derived from obsolete LNC concepts).
        E: Non-obsolete content marked suppressible by an editor. These do not have a suppressible SAB/TTY combination.
        Y: Non-obsolete content deemed suppressible during inversion. These can be determined by a specific SAB/TTY combination explicitly listed in MRRANK.
        N: None of the above
      Default suppressibility as determined by NLM (i.e., no changes at the Suppressibility tab in MetamorphoSys) should be used by most users, but may not be suitable in some specialized applications. See the MetamorphoSys documentation (Section 6) for information on how to change the SAB/TTY suppressibility to suit your requirements. NLM strongly recommends that users not alter editor-assigned suppressibility, and MetamorphoSys cannot be used for this purpose.
  18. CVF - Content View Flag. Bit field used to flag rows included in Content View. This field is a varchar field to maximize the number of bits available for use.

Sample Records

C0001175|ENG|P|L0001175|VO|S0010340|Y|A0019182||M0000245|D000163|MSH|PM|D000163|Acquired Immunodeficiency Syndromes|0|N||
C0001175|ENG|S|L0001842|PF|S0011877|N|A2878223|103840012|62479008||SNOMEDCT|PT|62479008|AIDS|4|N||
C0001175|ENG|P|L0001175|VC|S0354232|Y|A2922342|103845019|62479008||SNOMEDCT|SY|62479008|Acquired immunodeficiency syndrome|4|Y||
C0001175|FRE|P|L0162173|PF|S0226654|Y|A0248753||||INS|MH|D000163|SIDA|3|N||
C0001175|RUS|P|L0904943|PF|S1108760|Y|A1165232||||RUS|MH|D000163|SPID|3|N||
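To show how the columns line up with a record, the sketch below parses three of the sample records into named fields and keeps the English atoms whose SUPPRESS value is N. The names `Atom`, `parse_atom`, and `usable_english` are hypothetical, and the filter is only one possible policy, not an NLM recommendation.

```python
from collections import namedtuple

# Column order as listed in the MRCONSO.RRF description in this section.
Atom = namedtuple(
    "Atom",
    "CUI LAT TS LUI STT SUI ISPREF AUI SAUI SCUI SDUI SAB TTY CODE STR SRL SUPPRESS CVF",
)

SAMPLE = [
    "C0001175|ENG|S|L0001842|PF|S0011877|N|A2878223|103840012|62479008||SNOMEDCT|PT|62479008|AIDS|4|N||",
    "C0001175|ENG|P|L0001175|VC|S0354232|Y|A2922342|103845019|62479008||SNOMEDCT|SY|62479008|Acquired immunodeficiency syndrome|4|Y||",
    "C0001175|RUS|P|L0904943|PF|S1108760|Y|A1165232||||RUS|MH|D000163|SPID|3|N||",
]


def parse_atom(line):
    """Turn one MRCONSO.RRF row into an Atom with named fields."""
    return Atom(*line.rstrip("\n")[:-1].split("|"))


def usable_english(atoms):
    """Keep English atoms that are not flagged obsolete or suppressible."""
    return [a for a in atoms if a.LAT == "ENG" and a.SUPPRESS == "N"]


atoms = [parse_atom(line) for line in SAMPLE]
print([a.STR for a in usable_english(atoms)])
```

Of the three rows, only the SNOMEDCT atom "AIDS" passes both conditions: the second row has SUPPRESS = Y and the third is Russian.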

2.7.1.3.5 Simple Concept and Atom Attributes (File = MRSAT.RRF)

There is exactly one row in this table for each concept, atom, or relationship attribute that does not have a sub-element structure. All Metathesaurus concepts and a minority of Metathesaurus relationships have entries in this file. This file includes all source vocabulary attributes that do not fit into other categories.


Col. - Description

  1. CUI - Unique identifier for concept (if METAUI is a relationship identifier, this will be CUI1 for that relationship)
  2. LUI - Unique identifier for term (optional - present for atom attributes, but not for relationship attributes)
  3. SUI - Unique identifier for string (optional - present for atom attributes, but not for relationship attributes)
  4. METAUI - Metathesaurus atom identifier (will have a leading A) or Metathesaurus relationship identifier (will have a leading R) or blank if it is a concept attribute.
  5. STYPE - The name of the column in MRCONSO.RRF or MRREL.RRF that contains the identifier to which the attribute is attached, e.g., SAUI, SCUI, SRUI, CODE, CUI, AUI. As vocabularies that were added to the Metathesaurus prior to the development of the RRF are updated and brought into complete alignment with the RRF, many attributes currently shown as linked to Metathesaurus AUIs will be linked to one of the source vocabulary identifiers.
  6. CODE - Most useful source asserted identifier (if the source vocabulary contains more than one) or a Metathesaurus-generated source entry identifier (if the source vocabulary has none). Optional - present if METAUI is an AUI.
  7. ATUI - Unique identifier for attribute
  8. SATUI - Source asserted attribute identifier (optional - present if it exists)
  9. ATN - Attribute name. Possible values appear in MRDOC.RRF and are described in Appendix B.2.
  10. SAB - Abbreviated source name (SAB). Maximum field length is 20 alphanumeric characters. Two source abbreviations are assigned:
        • Root Source Abbreviation (RSAB): short form, no version information; for example, AI/RHEUM, 1993, has an RSAB of "AIR"
        • Versioned Source Abbreviation (VSAB): includes version information; for example, AI/RHEUM, 1993, has a VSAB of "AIR93"
      Official source names, RSABs, and VSABs are included in Appendix B.4.
  11. ATV - Attribute value described under specific attribute name in Appendix B.2. A few attribute values exceed 1,000 characters. Many of the abbreviations used in attribute values are explained in MRDOC.RRF and included in Appendix B.3.
  12. SUPPRESS - Suppressible flag. Values = O, E, Y, or N. Reflects the suppressible status of the attribute; not yet in use. See also SUPPRESS in MRCONSO.RRF, MRDEF.RRF, and MRREL.RRF.
  13. CVF - Content View Flag. Bit field used to flag rows included in Content View. This field is a varchar field to maximize the number of bits available for use.

Sample Records

C0001175|L0001175|S0010339|A0019180|AUI|D000163|AT15797077||FX|MSH|AIDS Dementia Complex|N||
C0001175|L0001175|S0354232|A2922342|SAUI|62479008|AT34794876||DESCRIPTIONSTATUS|SNOMEDCT|0|N||
C0001175|L2810384|S3645548|A3814219|SCUI|62479008|AT33494582||CTV3ID|SNOMEDCT|XE0RX|N||
C0001175|L2810384|S3645548|A3814219|SCUI|62479008|AT33652930||ISPRIMITIVE|SNOMEDCT|0|N||
C0001175|||R19334287|SRUI||AT37098279||REFINABILITY|SNOMEDCT|1|N||
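The METAUI column is enough to tell the three kinds of attribute apart. The sketch below applies the rule from the column description to two of the sample records; the function name `attribute_level` is hypothetical.

```python
MRSAT_COLUMNS = [
    "CUI", "LUI", "SUI", "METAUI", "STYPE", "CODE", "ATUI",
    "SATUI", "ATN", "SAB", "ATV", "SUPPRESS", "CVF",
]


def attribute_level(row):
    """Classify an MRSAT.RRF row as a concept, atom, or relationship attribute."""
    fields = dict(zip(MRSAT_COLUMNS, row.rstrip("\n")[:-1].split("|")))
    metaui = fields["METAUI"]
    if metaui == "":
        return "concept"          # blank METAUI: concept attribute
    if metaui.startswith("A"):
        return "atom"             # leading A: atom identifier (AUI)
    if metaui.startswith("R"):
        return "relationship"     # leading R: relationship identifier (RUI)
    raise ValueError(f"unexpected METAUI value: {metaui!r}")


rows = [
    "C0001175|L0001175|S0010339|A0019180|AUI|D000163|AT15797077||FX|MSH|AIDS Dementia Complex|N||",
    "C0001175|||R19334287|SRUI||AT37098279||REFINABILITY|SNOMEDCT|1|N||",
]
levels = [attribute_level(r) for r in rows]
print(levels)
```

The REFINABILITY row also shows why LUI and SUI are blank for relationship attributes: only CUI (as CUI1) and the RUI identify the target.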

2.7.1.3.6 Definitions (File = MRDEF.RRF)

There is exactly one row in this file for each definition in the Metathesaurus. A definition is an attribute of an atom (an occurrence of a string in a source vocabulary). A few definitions approach 3,000 characters in length.


Col. - Description

  1. CUI - Unique identifier for concept
  2. AUI - Unique identifier for atom; variable length field, 8 or 9 characters
  3. ATUI - Unique identifier for attribute
  4. SATUI - Source asserted attribute identifier [optional - present if it exists]
  5. SAB - Abbreviated source name (SAB) of the source of the definition. Maximum field length is 20 alphanumeric characters. Two source abbreviations are assigned:
        • Root Source Abbreviation (RSAB): short form, no version information; for example, AI/RHEUM, 1993, has an RSAB of "AIR"
        • Versioned Source Abbreviation (VSAB): includes version information; for example, AI/RHEUM, 1993, has a VSAB of "AIR93"
      Official source names, RSABs, and VSABs are included in Appendix B.4.
  6. DEF - Definition
  7. SUPPRESS - Suppressible flag. Values = O, E, Y, or N. Reflects the suppressible status of the attribute; not yet in use. See also SUPPRESS in MRCONSO.RRF, MRDEF.RRF, and MRREL.RRF.
  8. CVF - Content View Flag. Bit field used to flag rows included in Content View. This field is a varchar field to maximize the number of bits available for use.

Sample Records

C0001175|A0019180|AT15060425||MSH|An acquired defect of cellular immunity associated with infection by the human immunodeficiency virus (HIV), a CD4-positive T-lymphocyte count under 200 cells/microliter or less than 14% of total lymphocytes, and increased susceptibility to opportunistic infections and malignant neoplasms. Clinical manifestations also include emaciation (wasting) and dementia. These elements reflect criteria for AIDS as defined by the CDC in 1993.|N||

C0001175|A0021048|AT14042185||CSP|one or more indicator diseases, depending on laboratory evidence of HIV infection (CDC); late phase of HIV infection characterized by marked suppression of immune function resulting in opportunistic infections, neoplasms, and other systemic symptoms (NIAID).|N||

C0001175|A0021055|AT18420297||PDQ|Acquired immunodeficiency syndrome. An acquired defect in immune system function caused by human immunodeficiency virus 1 (HIV-1). AIDS is associated with increased susceptibility to certain cancers and to opportunistic infections, which are infections that occur rarely except in individuals with weak immune systems.|N||

2.7.1.3.7 Semantic Types (File = MRSTY.RRF)

There is exactly one row in this file for each Semantic Type assigned to each concept. All Metathesaurus concepts have at least one entry in this file. Many have more than one entry. The TUI, STN, and STY are all direct links to the UMLS Semantic Network (Section 3).


Col. - Description

  1. CUI - Unique identifier of concept
  2. TUI - Unique identifier of Semantic Type
  3. STN - Semantic Type tree number
  4. STY - Semantic Type. The valid values are defined in the Semantic Network.
  5. ATUI - Unique identifier for attribute
  6. CVF - Content View Flag. Bit field used to flag rows included in Content View. This field is a varchar field to maximize the number of bits available for use.

Sample Record

C0001175|T047|B2.2.1.2.1|Disease or Syndrome|AT17683839||


2.7.1.3.8 History (File = MRHIST.RRF)

This file tracks source-asserted history information. It currently includes SNOMED CT history only.


Col. - Description

  1. CUI - Unique identifier for concept
  2. SOURCEUI - Source asserted unique identifier
  3. SAB - Abbreviated source name (SAB). Maximum field length is 20 alphanumeric characters. Two source abbreviations are assigned:
        • Root Source Abbreviation (RSAB): short form, no version information; for example, AI/RHEUM, 1993, has an RSAB of "AIR"
        • Versioned Source Abbreviation (VSAB): includes version information; for example, AI/RHEUM, 1993, has a VSAB of "AIR93"
      Official source names, RSABs, and VSABs are included in Appendix B.4.
  4. SVER - Release date or version number of a source
  5. CHANGETYPE - Source asserted code for type of change
  6. CHANGEKEY - CONCEPTSTATUS (if history relates to a SNOMED CT concept) or DESCRIPTIONSTATUS (if history relates to a SNOMED CT atom)
  7. CHANGEVAL - CONCEPTSTATUS value or DESCRIPTIONSTATUS value after the change took place. Note: The change may have affected something other than the status value.
  8. REASON - Explanation of change if present
  9. CVF - Content View Flag. Bit field used to flag rows included in Content View. This field is a varchar field to maximize the number of bits available for use.

Sample Records

C0000294|108821000|SNOMEDCT|20001101|0|CONCEPTSTATUS|0|||
C0000294|108821000|SNOMEDCT|20020731|2|CONCEPTSTATUS|0|FULLYSPECIFIEDNAME CHANGE||
C0000294|1185494016|SNOMEDCT|20020731|0|DESCRIPTIONSTATUS|0|||
C0000294|1461100014|SNOMEDCT|20030131|0|DESCRIPTIONSTATUS|0|||

2.7.1.3.9 Related Concepts (File = MRREL.RRF)

There is one row in this table for each relationship between concepts or atoms known to the Metathesaurus, with the following exceptions found in other files: co-occurrences found in MRCOC.RRF, and pair-wise mapping relationships between two source vocabularies found in MRMAP.RRF and MRSMAP.RRF.

Note that for asymmetrical relationships there is one row for each direction of the relationship. Note also the direction of REL - the relationship which the SECOND concept or atom (with Concept Unique Identifier CUI2 and Atom Unique Identifier AUI2) HAS TO the FIRST concept or atom (with Concept Unique Identifier CUI1 and Atom Unique Identifier AUI1).


Col. - Description

  1. CUI1 - Unique identifier of first concept
  2. AUI1 - Unique identifier of first atom
  3. STYPE1 - The name of the column in MRCONSO.RRF that contains the identifier used for the first concept or first atom in the source of the relationship.
  4. REL - Relationship of second concept or atom to first concept or atom
  5. CUI2 - Unique identifier of second concept
  6. AUI2 - Unique identifier of second atom
  7. STYPE2 - The name of the column in MRCONSO.RRF that contains the identifier used for the second concept or second atom in the source of the relationship.
  8. RELA - Additional (more specific) relationship label (optional)
  9. RUI - Unique identifier of relationship
  10. SRUI - Source asserted relationship identifier, if present
  11. SAB - Abbreviated source name of the source of relationship. Maximum field length is 20 alphanumeric characters. Two source abbreviations are assigned:
        • Root Source Abbreviation (RSAB): short form, no version information; for example, AI/RHEUM, 1993, has an RSAB of "AIR"
        • Versioned Source Abbreviation (VSAB): includes version information; for example, AI/RHEUM, 1993, has a VSAB of "AIR93"
      Official source names, RSABs, and VSABs are included in Appendix B.4.
  12. SL - Source of relationship labels
  13. RG - Relationship group. Used to indicate that a set of relationships should be looked at in conjunction.
  14. DIR - Source asserted directionality flag. Y indicates that this is the direction of the relationship in its source; N indicates that it is not; a blank indicates that it is not important or has not yet been determined.
  15. SUPPRESS - Suppressible flag. Values = O, Y, E, or N. Reflects the suppressible status of the relationship; not yet in use. See also SUPPRESS in MRCONSO.RRF, MRDEF.RRF, and MRREL.RRF.
  16. CVF - Content View Flag. Bit field used to flag rows included in Content View. This field is a varchar field to maximize the number of bits available for use.

Sample Records

C0002372|A0022284|AUI|RB|C0002371|A0022279|AUI||R01983351||MSH|MSH|||N||
C0002372|A0022284|AUI|SY|C0002372|A0062352|AUI||R18851331||MSH|MSH|||N||
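The direction of REL is easy to misread, so the sketch below turns a row into a (subject, label, object) triple with the SECOND concept as the subject, as the note above describes. The function name `read_relationship` is hypothetical, and the column list assumes RG is its own column between SL and DIR, which agrees with the 16 fields in the sample records.

```python
MRREL_COLUMNS = [
    "CUI1", "AUI1", "STYPE1", "REL", "CUI2", "AUI2", "STYPE2", "RELA",
    "RUI", "SRUI", "SAB", "SL", "RG", "DIR", "SUPPRESS", "CVF",
]


def read_relationship(row):
    """Return (subject, label, object) with the SECOND concept as the subject."""
    f = dict(zip(MRREL_COLUMNS, row.rstrip("\n")[:-1].split("|")))
    # REL is the relationship that CUI2 has to CUI1, so CUI2 comes first.
    return f["CUI2"], f["REL"], f["CUI1"]


row = "C0002372|A0022284|AUI|RB|C0002371|A0022279|AUI||R01983351||MSH|MSH|||N||"
subject, label, obj = read_relationship(row)
print(f"{subject} has relationship {label} to {obj}")
```

For the first sample record this reads as: C0002371 has the relationship RB to C0002372, not the other way around.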

2.7.1.3.10 Co-occurring Concepts (File = MRCOC.RRF)

This file includes statistical aggregations of co-occurrences of meanings in external data sources. These exist at the AUI level. There are two rows in this table for each pair of atoms that co-occur in each information source represented: one for each direction of the relationship. (Note that the COA data may be different for each direction of the relationship.) Many Metathesaurus concepts have no entries in this file. Due to the very large number of co-occurrence relationships, they are distributed in a separate file.


Col.

Description


CUI1

Unique identifier of first concept


AUI1

Unique identifier of first atom


CUI2

Unique identifier of second concept or not present
Note: Where CUI2 is not present and COT is LQ (MeSH topical qualifier), the count of citations of CUI1 with no MeSH qualifiers is reported in COF.


AUI2

Unique identifier of second atom


SAB

Abbreviation of the source of co-occurrence information


COT

Type of co-occurrence


COF

Frequency of co-occurrence, if applicable


COA

Attributes of co-occurrence, if applicable


CVF

Content View Flag. Bit field used to flag rows included in Content View. This field is a varchar field to maximize the number of bits available for use.

Co-occurrences are concepts that occur together in the same entries in some information source. The relationships represented here are obtained from machine-manipulation of the information source. Co-occurrence relationships may exist between similar concepts (e.g., Atrial Fibrillation and Arrhythmia) or between very different concepts that nevertheless have some important connection in the field of biomedicine (e.g., Atrial Fibrillation and Digoxin), or between a primary concept and a qualifier (e.g., Lithotripsy and instrumentation). A co-occurrence relationship can exist between two concepts that have no other apparent relationship, although the frequency of such co-occurrences will be small.

In the current Metathesaurus, there are three sources of co-occurrence data: MEDLINE, AI/RHEUM, and CCPSS. From MEDLINE, co-occurrence data was computed for concepts that were designated as principal or main points in the same journal article, i.e., the co-occurrence counts do not include articles in which either or both of the concepts were present and indexed in MEDLINE but not designated as main points. (A concept is considered to be a main point if the * is attached to the main heading or any of its subheadings.)

Two overall frequencies of MEDLINE co-occurrence are provided: one for recent MEDLINE data (MED) and one for MEDLINE data from a preceding block of years (MBD); see SOC for date ranges in the current edition. Separate counts are provided for the frequencies with which the first concept was qualified by different MeSH qualifiers or by no qualifier at all when it co-occurred with the second concept. There are separate entries for each direction of the co-occurrence relationship. The related subheading occurrence information in each entry belongs to the first concept in the entry and is therefore different for each direction of the relationship.

In addition to the specific qualifier information associated with two co-occurring concepts, entries with LQ and LQB values for type of co-occurrence include totals for the number of times each main concept was qualified by a specific subheading or by no subheading.

The AI/RHEUM co-occurrence data represent the co-occurrence of diseases and findings in the AI/RHEUM knowledge base, i.e., the diseases that co-occur with a particular finding and the findings that co-occur with a particular disease. Each disease/finding pair can co-occur only once in the AI/RHEUM knowledge base.

In CCPSS, the co-occurrence data is extracted from patient records and includes problem-problem co-occurrences within a patient record as well as problem-modifier co-occurrences.

Sample Records

C0000294|A0085139|C0002421|A0022413|MBD|L|1|TU=1||
C0000294|A0085139|C0003968|A0026910|MED|L|1|UR=1||
C0000294|A0085139|C0006463|A0033415|MBD|L|1|<>=1||
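
The following is a minimal sketch, not an NLM-supplied tool, of how a developer might read rows like the sample records above. The names CoOccurrence, parse_coa, and parse_mrcoc_line are hypothetical. The sketch assumes that a COA value holds attribute=count items, that several items (if present) are separated by commas, and that "<>" stands for "no qualifier"; none of these details is stated in the column descriptions, so verify them against your own copy of the file.

```python
from typing import Dict, NamedTuple, Optional


class CoOccurrence(NamedTuple):
    cui1: str
    aui1: str
    cui2: Optional[str]   # None when CUI2 is not present
    aui2: Optional[str]
    sab: str              # source of co-occurrence information, e.g. MED or MBD
    cot: str              # type of co-occurrence
    cof: Optional[int]    # frequency of co-occurrence, if applicable
    coa: Dict[str, int]   # attribute -> count, e.g. {"TU": 1}
    cvf: str


def parse_coa(value: str) -> Dict[str, int]:
    """Parse a COA value such as 'TU=1' or '<>=1'.

    Assumption: several attributes, if present, are comma-separated.
    """
    attributes: Dict[str, int] = {}
    for item in value.split(","):
        if not item:
            continue
        name, _, count = item.rpartition("=")
        attributes[name] = int(count)
    return attributes


def parse_mrcoc_line(line: str) -> CoOccurrence:
    fields = line.rstrip("\r\n").split("|")
    if fields and fields[-1] == "":
        fields.pop()  # every row ends with a vertical bar
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    cui1, aui1, cui2, aui2, sab, cot, cof, coa, cvf = fields
    return CoOccurrence(cui1, aui1, cui2 or None, aui2 or None, sab, cot,
                        int(cof) if cof else None, parse_coa(coa), cvf)


record = parse_mrcoc_line("C0000294|A0085139|C0006463|A0033415|MBD|L|1|<>=1||")
print(record.sab, record.cof, record.coa)
```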

2.7.1.3.11 Computable Hierarchies (File = MRHIER.RRF)

This file contains one row for each hierarchy or context in which each atom appears. If a source vocabulary does not contain hierarchies, its atoms will have no rows in this file. If a source vocabulary is multi-hierarchical (allows the same atom to appear in more than one hierarchy), some of its atoms will have more than one row in this file. MRHIER.RRF provides a complete and compact representation of all hierarchies present in all Metathesaurus source vocabularies. Hierarchical displays can be computed by combining data in this file with data in MRCONSO.RRF. The distance-1 relationships, i.e., immediate parent, immediate child, and sibling relationships, represented in MRHIER.RRF also appear in MRREL.RRF.


Col.

Description


CUI

Unique identifier of concept


AUI

Unique identifier of atom - variable length field, 8 or 9 characters


CXN

Context number (e.g., 1, 2, 3)


PAUI

Unique identifier of atom's immediate parent within this context


SAB

Abbreviated source name (SAB) of the source of atom (and therefore of hierarchical context). Maximum field length is 20 alphanumeric characters.  Two source abbreviations are assigned: 

  • Root Source Abbreviation (RSAB) — short form, no version information, for example, AI/RHEUM, 1993, has an RSAB of "AIR"
  • Versioned Source Abbreviation (VSAB) — includes version information, for example, AI/RHEUM, 1993, has a VSAB of "AIR93"

Official source names, RSABs, and VSABs are included in Appendix B.4.


RELA

Relationship of atom to its immediate parent


PTR

Path to the top or root of the hierarchical context from this atom, represented as a list of AUIs, separated by periods (.). The first one in the list is the top of the hierarchy; the last one in the list is the immediate parent of the atom, which also appears as the value of PAUI.


HCD

Source asserted hierarchical number or code for this atom in this context; this field is only populated when it is different from the code (unique identifier or code for the string in that source).


CVF

Content View Flag. Bit field used to flag rows included in Content View. This field is a varchar field to maximize the number of bits available for use.

Sample Records

C0001175|A2878223|1|A3316611|SNOMEDCT|isa|A3684559.A2880798.A3398606.A3287869.A3316611|||
C0001175|A2878223|2|A3512124|SNOMEDCT|isa|A3684559.A2880798.A3398606.A3287869.A3512124|||
C0001175|A2878223|3|A3696836|SNOMEDCT|isa|A3684559.A2880798.A3398606.A3399957.A3399109.A3144217.A3696836|||
C0001175|A2878223|4|A3512124|SNOMEDCT|isa|A3684559.A2880798.A3398606.A3399957.A3399109.A3512124|||
C0001175|A2878223|5|A3316611|SNOMEDCT|isa|A3684559.A2880798.A3512117.A3082701.A3316611|||
C0001175|A2878223|6|A2888699|SNOMEDCT|isa|A3684559.A2880798.A3512117.A3082701.A3398847.A3398762.A2888699|||
C0001175|A2878223|7|A3316611|SNOMEDCT|isa|A3684559.A2880798.A3512117.A3287869.A3316611|||
C0001175|A2878223|8|A3512124|SNOMEDCT|isa|A3684559.A2880798.A3512117.A3287869.A3512124|||
C0001175|A2988194|1|A2888699|SNOMEDCT|isa|A3684559.A2880798.A3512117.A3082701.A3398847.A3398762.A2888699|||

To find the specific concept names used in a hierarchy, look up the atom identifiers in the AUI and STR data elements in MRCONSO.RRF.
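
As an illustration only, the sketch below splits the PTR value of one MRHIER.RRF row into the ordered list of AUIs from the root of the context down to the atom itself. The function names hierarchy_path and render_path are hypothetical, and the names mapping stands in for an AUI-to-STR lookup that would have to be built separately from MRCONSO.RRF; here it is left empty, so the AUIs themselves are printed.

```python
from typing import List, Mapping


def hierarchy_path(mrhier_line: str) -> List[str]:
    """Return the AUIs from the root of the context down to the atom itself."""
    fields = mrhier_line.rstrip("\r\n").split("|")
    # Column order as documented: CUI, AUI, CXN, PAUI, SAB, RELA, PTR, HCD, CVF
    aui, paui, ptr = fields[1], fields[3], fields[6]
    ancestors = ptr.split(".") if ptr else []
    # The last AUI in PTR is the immediate parent and should equal PAUI.
    if ancestors and ancestors[-1] != paui:
        raise ValueError("last PTR element does not match PAUI")
    return ancestors + [aui]


def render_path(path: List[str], names: Mapping[str, str]) -> str:
    """Join the path for display, falling back to the AUI when no name is known."""
    return " > ".join(names.get(aui, aui) for aui in path)


sample = ("C0001175|A2878223|3|A3696836|SNOMEDCT|isa|"
          "A3684559.A2880798.A3398606.A3399957.A3399109.A3144217.A3696836|||")
path = hierarchy_path(sample)
print(render_path(path, names={}))  # empty mapping: AUIs are shown as-is
```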

For most source vocabularies, the value of RELA (if present) applies up the hierarchy to the top or root. In other words, it also applies to the relationship between the atom's parent and the atom's grandparent, etc. The two exceptions in this version of the Metathesaurus are GO (Gene Ontology) and NIC (Nursing Intervention Classification). Except for GO and NIC atoms, the MRHIER rows for an atom's ancestors (parent, grandparent, etc.) contain no added information except the source-asserted hierarchical number or code (HCD). If this is not of interest, there may be no reason to find MRHIER rows for an atom's ancestors.

To find an atom's siblings in a specific context, find all MRHIER.RRF rows that share its SAB, RELA*, and PTR values.

To find an atom's children in a specific context, append a period (.) and the atom's AUI to its PTR and find all MRHIER.RRF rows with its SAB, RELA*, and the expanded PTR.

*The RELA is needed to retrieve correct siblings and children for University of Washington Digital Anatomist (UWDA) hierarchies. Some UWDA atoms appear in multiple hierarchies that are distinguished ONLY by their RELA values.
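
The sibling and child lookups described above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not a definitive implementation: HierRow, parse_mrhier_line, find_siblings, and find_children are hypothetical names, and the handling of an atom with an empty PTR (treated here as a root whose children list only its AUI) is an assumption.

```python
from typing import Iterable, List, NamedTuple


class HierRow(NamedTuple):
    cui: str
    aui: str
    cxn: str
    paui: str
    sab: str
    rela: str
    ptr: str
    hcd: str
    cvf: str


def parse_mrhier_line(line: str) -> HierRow:
    fields = line.rstrip("\r\n").split("|")
    return HierRow(*fields[:9])  # ignore the empty string after the final bar


def find_siblings(rows: Iterable[HierRow], atom: HierRow) -> List[HierRow]:
    """Rows that share the atom's SAB, RELA, and PTR, excluding the atom itself."""
    return [r for r in rows
            if (r.sab, r.rela, r.ptr) == (atom.sab, atom.rela, atom.ptr)
            and r.aui != atom.aui]


def find_children(rows: Iterable[HierRow], atom: HierRow) -> List[HierRow]:
    """Rows whose PTR equals the atom's PTR extended by a period and its AUI."""
    # Assumption: a root atom has an empty PTR, so its children list only its AUI.
    expanded = f"{atom.ptr}.{atom.aui}" if atom.ptr else atom.aui
    return [r for r in rows
            if (r.sab, r.rela, r.ptr) == (atom.sab, atom.rela, expanded)]


sample_lines = [
    "C0001175|A2878223|6|A2888699|SNOMEDCT|isa|A3684559.A2880798.A3512117.A3082701.A3398847.A3398762.A2888699|||",
    "C0001175|A2878223|8|A3512124|SNOMEDCT|isa|A3684559.A2880798.A3512117.A3287869.A3512124|||",
    "C0001175|A2988194|1|A2888699|SNOMEDCT|isa|A3684559.A2880798.A3512117.A3082701.A3398847.A3398762.A2888699|||",
]
rows = [parse_mrhier_line(line) for line in sample_lines]
print([r.aui for r in find_siblings(rows, rows[0])])
```

For files of realistic size, indexing the rows by the (SAB, RELA, PTR) triple, for example in a dictionary or a database index, would avoid scanning every row for each lookup.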

2.7.1.3.12 Contexts (File = MRCXT.RRF)

This file is no longer created by default. It has been replaced by MRHIER.RRF, which is a correct, complete, and computable representation of hierarchies.  Users who require the MRCXT file will need to create that file after creating a subset. To create the MRCXT file, use the MRCXT Builder application, accessible from the MetamorphoSys Welcome screen. Information on the MRCXT Builder can be found at http://www.nlm.nih.gov/research/umls/mrcxt_help.html. The information below describes the content of the file when produced by the MRCXT Builder.

This very large file contains pre-computed hierarchical context information (including concept names) intended to facilitate the display of hierarchies present in UMLS source vocabularies. All of the information in this file (plus additional sibling relationships) can be computed by joining the MRHIER.RRF file with MRCONSO.RRF. There can be many rows in this file for each occurrence of an atom in a hierarchy in any of the UMLS source vocabularies - a "context" in this discussion. Many Metathesaurus concepts have many atoms with contexts while others may have none. The number of rows per context differs depending on the number of ancestor, sibling, or child terms an atom has in that context. Because some atoms have multiple contexts in the same source, e.g., MeSH, a context number (CXN - e.g., 1, 2, 3) is used to identify all members of the same context. The CXNs are not global but are created as required for each atom. Each distinct context for a single atom can be retrieved with a CUI-AUI-SAB-CXN key. The "distance-1 relationships," i.e., the immediate parent, immediate child, and sibling relationships represented in MRCXT.RRF, are also present in the MRREL.RRF file.


Col.

Description


CUI

Unique identifier of concept


SUI

Unique identifier of string used in this context


AUI

Unique identifier of atom that has this context (variable length field, 8 or 9 characters)


SAB

Abbreviated source name (SAB).  Maximum field length is 20 alphanumeric characters.  Two source abbreviations are assigned: 

  • Root Source Abbreviation (RSAB) — short form, no version information, for example, AI/RHEUM, 1993, has an RSAB of "AIR"
  • Versioned Source Abbreviation (VSAB) — includes version information, for example, AI/RHEUM, 1993, has a VSAB of "AIR93"

Official source names, RSABs, and VSABs are included in Appendix B.4.


CODE

Unique identifier or code for string in that source


CXN

The context number (if the atom has multiple contexts)


CXL

Context member label, i.e., ANC for ancestor of this atom, CCP for the atom itself, SIB for sibling of this atom, CHD for child of this atom


RNK

For rows with a CXL value of ANC, the rank of the ancestors (e.g., a value of 1 denotes the most remote ancestor in the hierarchy)


CXS

String or concept name for context member


CUI2

Concept identifier of context member (may be empty if context member is not yet in the Metathesaurus)


AUI2

Atom identifier of context member


HCD

Source hierarchical number or code of context member (if present)


RELA

Additional relationship label providing further categorization of the CXL, if applicable and known. Valid values listed in Appendix B.3.


XC

A plus (+) sign indicates that the CUI2 for this row has children in this context. If this field is empty, the CUI2 does not have children in this context.


CVF

Content View Flag. Bit field used to flag rows included in Content View.

Sample Records

C0001175|S1911299|A1855909|ICPC2P|B9001|1|ANC|1|ICPC2-Plus|C1140253|A1861145|||||
C0001175|S1911299|A1855909|ICPC2P|B90001|1|ANC|2|BLOOD/BLOOD FORMING ORGANS/IMMUNE MECHANISM|C0847039|A1852564|B||||
C0001175|S1911299|A1855909|ICPC2P|B90001|1|ANC|2|Diagnosis/Diseases Component|C0497531|A0916974|7||||
C0001175|S1911299|A1855909|ICPC2P|B90001|1|ANC|3|HIV-INFECTION|AIDS|C0497169|A1852069|B90||||
C0001175|S1911299|A1855909|ICPC2P|B90001|1|CCP||Acquired Immune-Deficiency Syndrome|C0001175|A1855909|B90001||||

2.7.1.3.13 Mappings (File = MRMAP.RRF)

This file contains sets of mappings between vocabularies. Most mappings are between codes/identifiers (or expressions formed by codes/identifiers) from two different vocabularies. At least one of the vocabularies in each set of mappings is present in the Metathesaurus; usually both of them are. The version of a vocabulary that appears in a set of mappings may be different from the version of that vocabulary that appears in the other Metathesaurus release files. The versions of the vocabularies in a map set are specified by the FROMVSAB and TOVSAB attributes of the map set concept (see below). Users should be aware that the mappings are only valid between the versions of the vocabularies specified in these attributes. The version of the map set itself is specified by the MAPSETVERSION attribute of the map set concept.

The MRMAP.RRF file is complex, to allow for more complicated mappings. Where possible, all mappings are also represented in the simpler MRSMAP.RRF file described below.

Each set of mappings is represented by a map set concept in MRCONSO.RRF (with TTY = ‘XM’) identified by a CUI (MAPSETCUI). Metadata of a map set are found in MRSAT.RRF as attributes of the map set concept. Each map set has three SAB values associated with it: the SAB of the map set itself (MAPSETVSAB), the SAB of the source being mapped (FROMVSAB) and the SAB of the source being mapped to (TOVSAB).  Thus, a single map set asserts mappings from only one source to only one other source.

A subset of the mappings is redundantly represented as mapped_to and mapped_from relationships in MRREL.RRF. These are one-to-one mappings between two vocabularies which are both present in the UMLS. These general relationships are not as precise as the mapping files, since any differences between versions of the vocabularies in the map set and the versions of those vocabularies in the rest of the Metathesaurus files are ignored.  Such differences may affect the validity of the relationships in MRREL.RRF in a small number of cases.

There are three map sets that contain mappings from Metathesaurus concepts (represented by CUIs) to expressions formed by one or more concept names. These were formerly called associated expressions, and all have MAPTYPE='ATX'. This data is derived from earlier mapping efforts and is represented in the MRATX file in ORF.

 

Col.

Description

 

MAPSETCUI

Unique identifier for the UMLS concept which represents the whole map set.

 

MAPSETSAB

Source abbreviation (SAB) for the provider of the map set.

 

MAPSUBSETID

Map subset identifier used to identify a subset of related mappings within a map set. This is used for cases where the FROMEXPR may have more than one potential mapping (optional).

 

MAPRANK

Order in which mappings in a subset should be applied. Used only where MAPSUBSETID is used. (optional)

 

MAPID

Unique identifier for this individual mapping. Primary key of this table to identify a particular row.

 

MAPSID

Source asserted identifier for this mapping (optional).

 

FROMID

Identifier for the entity being mapped from. This is an internal UMLS identifier used to point to an external entity in a source vocabulary (represented by the FROMEXPR). When the source provides such an identifier, it is reused here. Otherwise, it is generated by NLM. The FROMID is only unique within a map set. It is not a pointer to UMLS entities like atoms or concepts. There is a one-to-one correlation between FROMID and a unique set of values in FROMSID, FROMEXPR, FROMTYPE, FROMRULE, and FROMRES within a map set.

 

FROMSID

Source asserted identifier for the entity being mapped from (optional).

 

FROMEXPR

Entity being mapped from - can be a single code/identifier/concept name or a complex expression involving multiple codes/identifiers/concept names, Boolean operators and/or punctuation.

 

FROMTYPE

Type of entity being mapped from.

 

FROMRULE

Machine processable rule applicable to the entity being mapped from (optional)

 

FROMRES

Restriction applicable to the entity being mapped from (optional).

 

REL

Relationship of the entity being mapped from to the entity being mapped to.

 

RELA

Additional relationship label (optional).

 

TOID

Identifier for the entity being mapped to. This is an internal identifier used to point to an external entity in a source vocabulary (represented by the TOEXPR). When the source provides such an identifier, it is reused here. Otherwise, it is generated by NLM. The TOID is only unique within a map set. It is not a pointer to UMLS entities like atoms or concepts. There is a one-to-one correlation between TOID and a unique set of values in TOSID, TOEXPR, TOTYPE, TORULE, TORES within a map set.

 

TOSID

Source asserted identifier for the entity being mapped to (optional).

 

TOEXPR

Entity being mapped to - can be a single code/identifier/concept name or a complex expression involving multiple codes/identifiers/concept names, Boolean operators and/or punctuation.

 

TOTYPE

Type of entity being mapped to.

 

TORULE

Machine processable rule applicable to the entity being mapped to (optional).

 

TORES

Restriction applicable to the entity being mapped to (optional).

 

MAPRULE

Machine processable rule applicable to this mapping (optional).

 

MAPRES

Restriction applicable to this mapping (optional).

 

MAPTYPE

Type of mapping (optional).

 

MAPATN

The name of the attribute associated with this mapping [not yet in use]

 

MAPATV

The value of the attribute associated with this mapping [not yet in use]

 

CVF

The Content View Flag is a bit field used to indicate membership in a content view.

Sample Records:
Map set concepts (in MRCONSO.RRF):

|C1306694|ENG|P|L3139022|PF|S3660621|Y|A3829740||||MTH|XM|1000|MSH Associated Expressions|0|N|0|
|C1321851|ENG|P|L3502396|PF|S4036398|Y|A4363175|100046|||SNOMEDCT|XM|100046|SNOMEDCT mappings to ICD-9-CM|4|N|0|

Map set metadata (in MRSAT.RRF):

|C1321851|L3502396|S4036398|A4363175|CODE|100046|AT58994529||FROMVSAB|MTH|SNOMEDCT_2006_01_31|N|0|
|C1321851|L3502396|S4036398|A4363175|CODE|100046|AT58994530||MAPSETVSAB|MTH|SNOMEDCT_2006_01_31|N|0|
|C1321851|L3502396|S4036398|A4363175|CODE|100046|AT58994531||TOVSAB|MTH|ICD9CM_2006|N|0|

Mappings (in MRMAP.RRF):

|C1306694|MTH||0|AT28307527||C0011764||C0011764|CUI|||RO||2201||<Developmental Disabilities> AND <Writing>|BOOLEAN_EXPRESSION|||||ATX|||0|
|C1306694|MTH||0|AT52620421||C0010700||C0010700|CUI|||RN||1552||<Bladder>/<surgery>|BOOLEAN_EXPRESSION|||||ATX|||0|
|C1321851|SNOMEDCT|0|0|AT31496368||68995007|68995007|68995007|SCUI|||RQ|mapped_to|4751059|4751059|295.22|CODE|||||1|||0|
|C1321851|SNOMEDCT|0|0|AT31496369||404049001|404049001|404049001|SCUI|||RN|mapped_to|4388059|4388059|215.9|CODE|||||2|||0|

2.7.1.3.14 Simple Mappings (File = MRSMAP.RRF)

This file provides a simpler representation of most of the mappings in MRMAP.RRF to serve applications which do not require the full richness of the MRMAP.RRF data structure. Generally, mappings that support rule-based processing need the additional fields of MRMAP.RRF (e.g., MAPRANK, MAPRULE, MAPRES) and will not be represented in MRSMAP.RRF. More specifically, all mappings with non-null values for MAPSUBSETID and MAPRANK are excluded from MRSMAP.RRF.

 

Col.

Description

 

MAPSETCUI

Unique identifier for the UMLS concept which represents the whole map set.

 

MAPSETSAB

Source abbreviation for the map set.

 

MAPID

Unique identifier for this individual mapping. Primary key of this table to identify a particular row.

 

MAPSID

Source asserted identifier for this mapping (optional).

 

FROMEXPR

Entity being mapped from - can be a single code/identifier/concept name or a complex expression involving multiple codes/identifiers/concept names, Boolean operators and/or punctuation.

 

FROMTYPE

Type of entity being mapped from.

 

REL

Relationship of the entity being mapped from to the entity being mapped to.

 

RELA

Additional relationship label (optional).

 

TOEXPR

Entity being mapped to - can be a single code/identifier/concept name or a complex expression involving multiple codes/identifiers/concept names, Boolean operators and/or punctuation.

 

TOTYPE

Type of entity being mapped to.

 

CVF

The Content View Flag is a bit field used to indicate membership in a content view.

Sample Records

|C1306694|MTH|AT28312030||C0009215|CUI|SY||<Codeine> AND <Drug Hypersensitivity>|BOOLEAN_EXPRESSION||
|C1306694|MTH|AT28312033||C0795964|CUI|RU||<Speech Disorders>|BOOLEAN_EXPRESSION||
|C1321851|SNOMEDCT|AT31496368||68995007|SCUI|RQ|mapped_to|295.22|CODE|0|
|C1321851|SNOMEDCT|AT31496369||404049001|SCUI|RN|mapped_to|215.9|CODE|0|
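
As a hedged illustration of how an application might consume these rows, the sketch below builds an in-memory index from (MAPSETCUI, FROMEXPR) to the list of mapping targets. The names split_row and build_map_index are hypothetical. The sketch assumes that the leading vertical bar shown in the sample records is a display artifact; it is dropped if present, so the same code should also accept rows without it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MRSMAP_COLUMNS = ["MAPSETCUI", "MAPSETSAB", "MAPID", "MAPSID", "FROMEXPR",
                  "FROMTYPE", "REL", "RELA", "TOEXPR", "TOTYPE", "CVF"]

Target = Tuple[str, str, str, str]  # (REL, RELA, TOEXPR, TOTYPE)


def split_row(line: str) -> List[str]:
    line = line.strip()
    if line.startswith("|"):  # leading bar as printed in the sample records
        line = line[1:]
    if line.endswith("|"):    # every row ends with a vertical bar
        line = line[:-1]
    return line.split("|")


def build_map_index(lines: Iterable[str]) -> Dict[Tuple[str, str], List[Target]]:
    index: Dict[Tuple[str, str], List[Target]] = defaultdict(list)
    for line in lines:
        values = split_row(line)
        if len(values) != len(MRSMAP_COLUMNS):
            raise ValueError(f"unexpected field count: {len(values)}")
        row = dict(zip(MRSMAP_COLUMNS, values))
        index[(row["MAPSETCUI"], row["FROMEXPR"])].append(
            (row["REL"], row["RELA"], row["TOEXPR"], row["TOTYPE"]))
    return dict(index)


SAMPLE = [
    "|C1306694|MTH|AT28312030||C0009215|CUI|SY||<Codeine> AND <Drug Hypersensitivity>|BOOLEAN_EXPRESSION||",
    "|C1321851|SNOMEDCT|AT31496368||68995007|SCUI|RQ|mapped_to|295.22|CODE|0|",
]
index = build_map_index(SAMPLE)
print(index[("C1321851", "68995007")])
```

Because mappings are only valid between the vocabulary versions named in the map set's FROMVSAB and TOVSAB attributes, an application would normally check those attributes in MRSAT.RRF before relying on such an index.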

2.7.1.3.15 Source Information (File = MRSAB.RRF)

The Metathesaurus has "versionless" or "root" Source Abbreviations (SABs) in the data files. MRSAB.RRF connects the root SAB to fully specified version information for the current release. For example, the released SAB for MeSH is now simply "MSH". In MRSAB.RRF, you will see a current versioned SAB, e.g., MSH2003_2002_10_24. MRSAB.RRF allows all other Metathesaurus files to use versionless source abbreviations, so that all rows with no data change between versions remain unchanged. MetamorphoSys can produce files with either the root or versioned SABs so that either form can be available in custom subsets of the Metathesaurus.

There is one row in this file for every version of every source in the current Metathesaurus; eventually there will also be historical information with a row for each version of each source that has appeared in any Metathesaurus release. Note that the field CURVER has the value Y to identify the version in this Metathesaurus release. Future releases of MRSAB.RRF will also contain historical version information in rows with CURVER value N.

The structure of MRSAB.RRF is as follows:

Field | Full Name | Description
VCUI | CUI | CUI of the versioned SRC concept for a source
RCUI | Root CUI | CUI of the root SRC concept for a source
VSAB | Versioned Source Abbreviation | The versioned source abbreviation for a source, e.g., MSH2003_2002_10_24
RSAB | Root Source Abbreviation | The root source abbreviation for a source, e.g., MSH
SON | Official Name | The official name for a source
SF | Source Family | The source family for a source
SVER | Version | The source version, e.g., 2001
VSTART | Meta Start Date | The date a source became active, e.g., 2001_04_03
VEND | Meta End Date | The date a source ceased to be active, e.g., 2001_05_10
IMETA | Meta Insert Version | The version of the Metathesaurus in which a source first appeared, e.g., 2001AB
RMETA | Meta Remove Version | The version of the Metathesaurus in which the source last appeared, e.g., 2001AC
SLC | Source License Contact | The source license contact information
SCC | Source Content Contact | The source content contact information
SRL | Source Restriction Level | 0, 1, 2, 3, 4 - explained in the License Agreement
TFR | Term Frequency | The number of terms for this source in MRCONSO.RRF, e.g., 12343
CFR | CUI Frequency | The number of CUIs associated with this source, e.g., 10234
CXTY | Context Type | The type of contexts for this source. Values are FULL, FULL-MULTIPLE, FULL-NOSIB, FULL-NOSIB-MULTIPLE, FULL-MULTIPLE-NOSIB-RELA, null.
TTYL | Term Type List | Term type list from source, e.g., MH, EN, PM, TQ
ATNL | Attribute Name List | The attribute name list (from MRSAT.RRF), e.g., MUI, RN, TH
LAT | Language | The language of the terms in the source
CENC | Character Encoding | All UMLS content is provided in Unicode, encoded in UTF-8. MetamorphoSys will allow exclusion of extended characters with some loss of information. Transliteration to other character encodings is possible but not supported by NLM; for further information, see http://www.unicode.org.
CURVER | Current Version | A Y or N flag indicating whether or not this row corresponds to the current version of the named source
SABIN | Source in Subset | A Y or N flag indicating whether or not this row is represented in the current MetamorphoSys subset. Initially always Y where CURVER is Y, but later is recomputed by MetamorphoSys.
SSN | Source Short Name | The short name of a source as used by the NLM Knowledge Source Server
SCIT | Source Citation | Citation information for a source. This is intended to replace the SOS attributes in the SRC concepts.

Sources with contexts have "full" contexts, i.e., all levels of terms may have Ancestors, Parents, Children and Siblings. A full context may also be further designated as Multiple, Nosib (No siblings) or both Multiple and Nosib.

Multiple indicates that a single concept in this source may have multiple hierarchical positions.

No siblings (Nosib) indicates that siblings have not been computed for this source.

Appendix B.4, Source Vocabularies, lists each source in the Metathesaurus and includes information about the type of context, if any, for each source.

Sample Records

C1371270|C1140284|RXNORM_04AB|RXNORM|RXNORM Project, META2004AB|RXNORM|04AB|2004_05_17||2004AB||Stuart Nelson, M.D., Head, MeSH Section; e-mail: nelson@nlm.nih.gov|Stuart Nelson, M.D., Head, MeSH Section; e-mail: nelson@nlm.nih.gov|0|138005|110403||BN,IN,OBD,OCD,SBD,SBDF,SCD,SCDC,SCDF,SY|ORIG_CODE,ORIG_SOURCE|ENG|UTF-8|Y|Y|RxNorm work done by the National Library of Medicine|RxNorm work done by NLM. Bethesda (MD): National Library of Medicine, META2004AB release.||
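
One possible use of MRSAB.RRF is to translate the root SABs found in the other files into the versioned SABs of the current release. The sketch below is illustrative only; current_versions is a hypothetical name, and the field positions are taken from the order in which the fields are listed above (VSAB third, RSAB fourth, CURVER twenty-second), which should be confirmed against the release in use.

```python
from typing import Dict, Iterable

# Zero-based positions, following the field order listed for MRSAB.RRF.
VSAB, RSAB, CURVER = 2, 3, 21


def current_versions(lines: Iterable[str]) -> Dict[str, str]:
    """Map each root SAB to its versioned SAB for rows flagged CURVER = Y."""
    versions: Dict[str, str] = {}
    for line in lines:
        # Strip stray spaces around values before comparing them.
        fields = [value.strip() for value in line.rstrip("\r\n").split("|")]
        if len(fields) <= CURVER:
            continue  # skip blank or truncated lines
        if fields[CURVER] == "Y":
            versions[fields[RSAB]] = fields[VSAB]
    return versions
```

Applied to the sample record above, this returns a dictionary that maps RXNORM to RXNORM_04AB.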

2.7.1.3.16 Concept Name Ranking (File = MRRANK.RRF)

There is exactly one row for each concept name type from each Metathesaurus source vocabulary (each SAB-TTY combination). The RANK and SUPPRESS values in the distributed file are those used in Metathesaurus production. Users are free to change these values to suit their needs and preferences, then change the naming precedence and suppressibility by using MetamorphoSys to create a customized Metathesaurus.


Col.

Description


RANK

Numeric order of precedence, higher value wins


SAB

Abbreviated source name (SAB) for source vocabulary.  Maximum field length is 20 alphanumeric characters.  Two source abbreviations are assigned: 

  • Root Source Abbreviation (RSAB) — short form, no version information, for example, AI/RHEUM, 1993, has an RSAB of "AIR"
  • Versioned Source Abbreviation (VSAB) — includes version information, for example, AI/RHEUM, 1993, has a VSAB of "AIR93"

Official source names, RSABs, and VSABs are included in Appendix B.4.


TTY

Abbreviation for term type in source vocabulary, for example PN (Metathesaurus Preferred Name) or CD (Clinical Drug). Possible values are listed in Appendix B.3.


SUPPRESS

NLM-recommended Source and Term Type (SAB/TTY) Suppressibility. Values = Y or N. Indicates the suppressible status of all atoms (names) with this Source and Term Type (SAB/TTY). Note that changes made in MetamorphoSys at the Suppressible tab are recorded in your configuration file. Status E does not occur here, as it is assigned only to individual cases such as the names (atoms) in MRCONSO.RRF. See also SUPPRESS in MRCONSO.RRF, MRDEF.RRF, and MRREL.RRF.

Sample Records

0210|AIR|SY|N|
0209|ULT|PT|N|
0208|CPT|PT|N|
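
To illustrate the "higher value wins" rule, the sketch below loads the RANK values and picks the highest-ranked SAB/TTY combination from a list of candidates. The names load_ranks and highest_ranked are hypothetical, and this shows only the comparison on RANK; it is not a description of how MetamorphoSys itself selects preferred names.

```python
from typing import Dict, Iterable, Sequence, Tuple

SabTty = Tuple[str, str]


def load_ranks(lines: Iterable[str]) -> Dict[SabTty, int]:
    """Read RANK|SAB|TTY|SUPPRESS| rows into a {(SAB, TTY): rank} dictionary."""
    ranks: Dict[SabTty, int] = {}
    for line in lines:
        rank, sab, tty, _suppress = line.rstrip("\r\n").split("|")[:4]
        ranks[(sab, tty)] = int(rank)  # "0210" becomes 210
    return ranks


def highest_ranked(candidates: Sequence[SabTty], ranks: Dict[SabTty, int]) -> SabTty:
    """Return the candidate with the highest RANK; unknown pairs rank lowest."""
    return max(candidates, key=lambda pair: ranks.get(pair, -1))


ranks = load_ranks(["0210|AIR|SY|N|", "0209|ULT|PT|N|", "0208|CPT|PT|N|"])
print(highest_ranked([("CPT", "PT"), ("AIR", "SY"), ("ULT", "PT")], ranks))
```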

2.7.1.3.17 Ambiguous Term Identifiers (File = AMBIGLUI.RRF)

In the instance that a Lexical Unique Identifier (LUI) is linked to multiple Concept Unique Identifiers (CUIs), there is one row in this table for each LUI-CUI pair. This file identifies those lexical variant classes which have multiple meanings in the Metathesaurus.

In the Metathesaurus, the LUI links all strings within the English language that are identified as lexical variants of each other by the luinorm program found in the UMLS SPECIALIST Lexicon and tools. LUIs are assigned irrespective of the meaning of each string. This table may be useful to system developers who wish to use the lexical programs in their applications to identify and disambiguate ambiguous terms.


Col.

Description


LUI

Lexical Unique Identifier


CUI

Concept Unique Identifier

Sample Records

L0000003|C0010504|
L0000003|C0917995|
L0000032|C0010206|
L0000032|C0010207|
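
A developer who wants to flag ambiguous terms might group these rows by LUI, as in the minimal sketch below. The name group_ambiguous is hypothetical. Stray spaces around values are stripped, and the same approach should apply to AMBIGSUI.RRF with SUIs in place of LUIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def group_ambiguous(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect, for each identifier in column 1, the set of CUIs in column 2."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        fields = [value.strip() for value in line.rstrip("\r\n").split("|")]
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        groups[fields[0]].add(fields[1])
    return dict(groups)


sample = ["L0000003|C0010504|", "L0000003|C0917995|",
          "L0000032|C0010206|", "L0000032|C0010207|"]
for lui, cuis in sorted(group_ambiguous(sample).items()):
    print(lui, sorted(cuis))
```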

2.7.1.3.18 Ambiguous String Identifiers (File = AMBIGSUI.RRF)

In the instance that a String Unique Identifier (SUI) is linked to multiple Concept Unique Identifiers (CUIs), there is one row in this table for each SUI-CUI pair.

This file resides in the META directory. In the Metathesaurus, there is only one SUI for each unique string within each language, even if the string has multiple meanings. This table is only of interest to system developers who use the SUI in their applications or in local data files.


Col.

Description


SUI

String Unique Identifier


CUI

Concept Unique Identifier

Sample Records

S0000176|C0042266|
S0000176|C0546846|
S0000217|C0024817|
S0000217|C0555026|


2.7.1.3.19 Metathesaurus Change Files 

There are six files or relations that identify key differences between entries in the previous and the current edition of the Metathesaurus. Developers can use these special files to determine whether there have been changes that affect their applications.

The usefulness of individual files will depend on how data from the Metathesaurus have been linked or incorporated in a particular application.

Each relation or named table of data has a fixed number of columns and variable number of rows. A column is a sequence of all the values in a given data element. A row contains the values for two or more data elements for one entry. The values for the different data elements in the row are separated by vertical bars (|). Each row ends with a vertical bar and line termination.
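
The row layout just described can be read with a few lines of code. The sketch below is an illustration under stated assumptions rather than a supplied utility: parse_row is a hypothetical name, the caller supplies the column names for the file being read, and the example row is a sample record from the Concept Name Ranking file (Section 2.7.1.3.16).

```python
from typing import Dict, Sequence


def parse_row(line: str, columns: Sequence[str]) -> Dict[str, str]:
    """Split one vertical-bar-delimited row into a {column: value} dictionary."""
    text = line.rstrip("\r\n")
    if not text.endswith("|"):
        raise ValueError("each row is expected to end with a vertical bar")
    values = text[:-1].split("|")  # drop the terminating bar, then split
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(values)}")
    return dict(zip(columns, values))


row = parse_row("0208|CPT|PT|N|\n", ["RANK", "SAB", "TTY", "SUPPRESS"])
print(row["SAB"], row["TTY"])
```

Checking the number of values against the expected column count is a cheap way to detect truncated lines or a mismatch between the file and the column list.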

2.7.1.3.19.1 Deleted Concepts (File = CHANGE/DELETEDCUI.RRF)

Concepts whose meaning is no longer present in the Metathesaurus are reported in this file. There is a row for each concept that existed in the previous release and is not present in the current release. If the meaning exists in the current release, i.e., the missing concept was merged with another current concept, it is reported in the MERGEDCUI.RRF file (section 2.7.1.3.19.2) and not in this file.


Col.

Description


PCUI

Concept Unique Identifier in the previous Metathesaurus


PSTR

Preferred name of this concept in the previous Metathesaurus

2.7.1.3.19.2 Merged Concepts (File = CHANGE/MERGEDCUI.RRF)

There is exactly one row in this table for each released concept in the previous Metathesaurus (CUI1) that was merged into another released concept from the previous Metathesaurus (CUI2). When this merge occurred, the first CUI (CUI1) was retired; this table shows the CUI (CUI2) for the merged concept in this Metathesaurus.

Entries in this file represent concept pairs that were considered to have different meanings in the previous edition, but which are now identified as synonyms.


Col.

Description


PCUI1

Concept Unique Identifier in the previous Metathesaurus


CUI

Concept Unique Identifier in this Metathesaurus in format C#######
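
An application that stores CUIs locally could use this file to replace retired identifiers with their current equivalents. The sketch below is a minimal illustration: load_merges and update_cuis are hypothetical names, and the identifiers in the example are made-up placeholders, not real records from the file.

```python
from typing import Dict, Iterable, List


def load_merges(lines: Iterable[str]) -> Dict[str, str]:
    """Read PCUI1|CUI| rows into a {retired CUI: current CUI} dictionary."""
    merges: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("|")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        merges[fields[0]] = fields[1]
    return merges


def update_cuis(local_cuis: Iterable[str], merges: Dict[str, str]) -> List[str]:
    """Replace each retired CUI with the CUI it was merged into; keep the rest."""
    return [merges.get(cui, cui) for cui in local_cuis]


# Placeholder identifiers for illustration only.
merges = load_merges(["C9999991|C9999992|"])
print(update_cuis(["C9999991", "C9999993"], merges))
```

Because the file covers only the step from the previous release to the current one, data last synchronized with an older release would need the Retired CUI Mapping file (MRCUI.RRF) instead.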

2.7.1.3.19.3 Deleted Terms (File = CHANGE/DELETEDLUI.RRF)

There is exactly one row in this table for each Lexical Unique Identifier (LUI) that appeared in the previous Metathesaurus, but does not appear in this Metathesaurus.

LUIs are assigned by the luinorm program, part of the LVG program in the UMLS SPECIALIST Lexicon and tools; see Section 4.

These entries represent the cases where LUIs identified by the previous release's luinorm program, when used to identify lexical variants in the previous Metathesaurus, are no longer found with this release's luinorm on this release's Metathesaurus. This does not necessarily imply the deletion of a string or a concept from the Metathesaurus.


Col.

Description


PLUI

Lexical Unique Identifier in the previous Metathesaurus


PSTR

Preferred Name of Term in the previous Metathesaurus

2.7.1.3.19.4 Merged Terms (File = CHANGE/MERGEDLUI.RRF)

There is exactly one row in this file for each case in which strings had different Lexical Unique Identifiers (LUIs) in the previous Metathesaurus yet share the same LUI in this Metathesaurus; a LUI present in the previous Metathesaurus is therefore absent from this Metathesaurus.

LUIs are assigned by the luinorm program, part of the LVG program in the UMLS SPECIALIST Lexicon and tools; see Section 4.

These entries represent the cases where separate lexical variants as identified by the previous release's luinorm program version are a single lexical variant as identified by this release's luinorm.


Col.

Description


PLUI

Lexical Unique Identifier in the previous Metathesaurus but not present in this Metathesaurus


LUI

Lexical Unique Identifier into which it was merged in this Metathesaurus

2.7.1.3.19.5 Deleted Strings (File = CHANGE/DELETEDSUI.RRF)

There is exactly one row in this file for each string in each language that was present in an entry in the previous Metathesaurus and does not appear in this Metathesaurus.

Note that this does not necessarily imply the deletion of a term (LUI) or a concept (CUI) from the Metathesaurus. A string deleted in one language may still appear in the Metathesaurus in another language.


Col.

Description


PSUI

String Unique Identifier in the previous Metathesaurus that is not present in this Metathesaurus


PSTR

Preferred Name of Term in the previous Metathesaurus that is not present in this Metathesaurus

2.7.1.3.19.6 Retired CUI Mapping (File = MRCUI.RRF)

There are one or more rows in this file for each Concept Unique Identifier (CUI) that existed in any prior release but is not present in the current release. Where possible, the file maps each retired CUI to a current CUI as synonymous, or to one or more related current CUIs. If a synonymous mapping cannot be found, other relationships between the CUIs can be created. These relationships can be Broader (RB), Narrower (RN), Other Related (RO), Deleted (DEL) or Removed from Subset (SUBX). Rows with the SUBX relationship are added to MRCUI by MetamorphoSys for each CUI that met the exclusion criteria and was consequently removed from the subset. Some CUIs may be mapped to more than one other CUI using these relationships.

CUIs may be retired when (1) two released concepts are found to be synonyms and so are merged, retiring one CUI; (2) the concept no longer appears in any source vocabulary and is not 'rescued' by NLM; or (3) the concept is an acknowledged error in a source vocabulary or determined to be a Metathesaurus production error.

See Sections 2.7.1.3.19.1 through 2.7.1.3.19.5 for files containing changes from the last release only, without mappings.


Col.

Description


CUI1

Unique identifier for first concept - Retired CUI - was present in some prior release, but is currently missing


VER

The last release version in which CUI1 was a valid CUI


REL

Relationship


RELA

Relationship attribute


MAPREASON

Reason for mapping


CUI2

Unique identifier for second concept - the current CUI that CUI1 most closely maps to


MAPIN

Is this map in current subset? Values of Y, N, or null. MetamorphoSys generates the Y or N to indicate whether the CUI2 concept is or is not present in the subset. The null value is for rows where the CUI1 was not present to begin with (i.e., REL=DEL).

Sample Records

C1313903|2004AA|SY|||C0525045|Y|
C1313909|2004AA|RO|||C0476661|Y|
C1321833|2004AA|DEL|||||
C1382264|2004AB|SY|||C0993613|Y|
C1382494|2004AB|DEL|||||
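
As an illustration of how these rows might be consumed, the following Python sketch groups the mappings by retired CUI. It assumes only the column order documented above and the vertical-bar row format; the function names parse_mrcui_row and retired_cui_map are hypothetical and are not part of any NLM tool.

```python
from collections import defaultdict

# Column order as documented for MRCUI.RRF above.
MRCUI_COLUMNS = ["CUI1", "VER", "REL", "RELA", "MAPREASON", "CUI2", "MAPIN"]


def parse_mrcui_row(line):
    """Split one MRCUI.RRF row into a dict keyed by column name."""
    fields = line.rstrip("\r\n").split("|")
    # Each row ends with a vertical bar, so the last split element is empty.
    if fields and fields[-1] == "":
        fields.pop()
    if len(fields) != len(MRCUI_COLUMNS):
        raise ValueError(f"expected {len(MRCUI_COLUMNS)} fields, got {len(fields)}")
    return dict(zip(MRCUI_COLUMNS, fields))


def retired_cui_map(lines):
    """Group mappings by retired CUI: {CUI1: [(REL, CUI2), ...]}."""
    mapping = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        row = parse_mrcui_row(line)
        mapping[row["CUI1"]].append((row["REL"], row["CUI2"]))
    return dict(mapping)


sample = [
    "C1313903|2004AA|SY|||C0525045|Y|",
    "C1313909|2004AA|RO|||C0476661|Y|",
    "C1321833|2004AA|DEL|||||",
]
mapping = retired_cui_map(sample)
print(mapping["C1313903"])  # [('SY', 'C0525045')]
print(mapping["C1321833"])  # [('DEL', '')]
```

A retired CUI with REL=DEL has an empty CUI2, so an application would typically treat such rows as "no replacement available" rather than as a mapping.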

2.7.1.3.19.7 AUI Movements (File = MRAUI.RRF)

This file records the movement of Atom Unique Identifiers (AUIs) from a concept (CUI1) in one version of the Metathesaurus to a concept (CUI2) in the next version (VER) of the Metathesaurus. The file is historical.


Col.

Description


AUI1

Atom unique identifier


CUI1

Concept unique identifier


VER

Version in which this change to the AUI first occurred


REL

Relationship


RELA

Relationship attribute


MAPREASON

Reason for mapping


AUI2

Unique identifier for second atom


CUI2

Unique identifier for second concept - the current CUI that CUI1 most closely maps to


MAPIN

Mapping in current subset: is AUI2 in current subset? Values of Y, N, or null.

Sample Records

A0000039|C0236824|2004AC|||move|A0000039|C1411876|Y|
A0000077|C0003477|2005AB|||move|A0000077|C1510447|Y|
A8177040|C1237728|2005AB|||move|A8177040|C1237732|Y|

2.7.1.3.20 Word Index (File = MRXW_BAQ.RRF, MRXW_DAN.RRF, MRXW_DUT.RRF, MRXW_ENG.RRF, MRXW_FIN.RRF, MRXW_FRE.RRF, MRXW_GER.RRF, MRXW_HEB.RRF, MRXW_HUN.RRF, MRXW_ITA.RRF, MRXW_NOR.RRF, MRXW_POR.RRF, MRXW_RUS.RRF, MRXW_SPA.RRF, MRXW_SWE.RRF)

There is one row in these tables for each word found in each unique Metathesaurus string (ignoring upper-lower case). All Metathesaurus entries have entries in the word index. The entries are sorted in ASCII order.


Col.

Description


LAT

Abbreviation of language of the string in which the word appears


WD

Word in lowercase


CUI

Concept identifier


LUI

Term identifier


SUI

String identifier

Sample Records from MRXW_ENG.RRF

ENG|anaemia|C0002871|L0280031|S0352688|
ENG|anemia|C0002871|L0002871|S0013742|
ENG|anemias|C0002871|L0002871|S0013787|
ENG|blood|C0002871|L0376533|S0500659|

Sample Records from MRXW_FRE.RRF

FRE|ANEMIE|C0002871|L0162748|S0227229|
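
A word index of this kind is typically used to find the strings that contain every word of a query. The Python sketch below shows one way this could be done. It is an illustration under stated assumptions, not NLM software: the function names are hypothetical, and words are lowercased on both sides as a precaution because the French sample record above is in upper case.

```python
from collections import defaultdict


def build_word_index(lines):
    """Map (LAT, word) to the set of (CUI, LUI, SUI) rows containing it."""
    index = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue
        lat, wd, cui, lui, sui = line.rstrip("\r\n").split("|")[:5]
        index[(lat, wd.lower())].add((cui, lui, sui))
    return index


def strings_with_all_words(index, lat, words):
    """Return the SUIs of strings in language `lat` containing every word."""
    sui_sets = [
        {sui for _cui, _lui, sui in index.get((lat, word.lower()), set())}
        for word in words
    ]
    return set.intersection(*sui_sets) if sui_sets else set()


sample = [
    "ENG|anaemia|C0002871|L0280031|S0352688|",
    "ENG|anemia|C0002871|L0002871|S0013742|",
    "ENG|anemias|C0002871|L0002871|S0013787|",
    "ENG|blood|C0002871|L0376533|S0500659|",
    "FRE|ANEMIE|C0002871|L0162748|S0227229|",
]
index = build_word_index(sample)
print(strings_with_all_words(index, "ENG", ["Anemia"]))  # {'S0013742'}
```

Intersecting on SUI rather than CUI keeps the match at the string level; intersecting on CUI instead would find concepts whose words are spread across different strings.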

2.7.1.3.21 Normalized Word Index (File = MRXNW_ENG.RRF)

There is one row in this table for each normalized word found in each unique English-language Metathesaurus string. All English-language Metathesaurus entries have entries in the normalized word index. There are no normalized word indexes for other languages in this edition of the Metathesaurus.


Col.

Description


LAT

Abbreviation of language of the string in which the word appears (always ENG in this edition of the Metathesaurus)


NWD

Normalized word in lowercase (described in Section 2.6.2.1)


CUI

Concept identifier


LUI

Term identifier


SUI

String identifier

Sample Records

ENG|anaemia|C0002871|L0280031|S0352688|
ENG|anemia|C0002871|L0002871|S0013742|
ENG|anemia|C0002871|L0002871|S0013787|
ENG|blood|C0002871|L0376533|S0500659|

2.7.1.3.22 Normalized String Index (File = MRXNS_ENG.RRF)

There is one row in this table for each normalized string found in each unique English-language Metathesaurus string (ignoring upper-lower case). All English-language Metathesaurus entries have entries in the normalized string index. There are no normalized string indexes for other languages in this edition of the Metathesaurus.


Col.

Description


LAT

Abbreviation of language of the string (always ENG in this edition of the Metathesaurus)


NSTR

Normalized string in lowercase (described in Section 2.6.3.1)


CUI

Concept identifier


LUI

Term identifier


SUI

String identifier

Sample Records

ENG|anaemia|C0002871|L0280031|S0352688|
ENG|anaemia unspecified|C0002871|L0696700|S0803315|
ENG|anemia|C0002871|L0002871|S0013787|

2.7.2 Metathesaurus Original Release Format (ORF)

Note: The preferred and more complete format is described above in Section 2.7.1, the Metathesaurus Rich Release Format (RRF).

All files except MRRANK are sorted by row.

2.7.2.1. Data Files

The data in each Metathesaurus entry may be represented in more than 20 different "relations" or files. These files correspond to the four logical groups of data elements described in Sections 2.2 through 2.5 and the indexes described in Section 2.6 as follows:

The AMBIG* files now provide a convenient way to identify all Metathesaurus terms and strings that have more than one meaning in Metathesaurus source vocabularies.

2.7.2.2 Columns and Rows

Each relation or named table of data values has by definition a fixed number of columns; the number of rows depends on the content of a particular version of the Metathesaurus.

A column is a sequence of all the values in a given data element or logical sub-element. In general, columns for longer variable length data elements will appear to the right of columns for shorter and/or fixed length data elements. The information for all columns in the ORF files is described in Appendix B.1.2, ORF Columns or Data Elements.

A row contains the values for one or more data elements or logical sub-elements for one Metathesaurus entry. Depending on the nature of the data elements involved, each Metathesaurus entry may have one or more rows in a given file. The values for the different data elements or logical sub-elements represented in the row are separated by vertical bars (|). If an optional element is blank, the vertical bars are still used to maintain the correct positioning of the subsequent elements. Each row is terminated by a vertical bar and line termination.
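
The row layout described above can be sketched in Python as follows. This is an illustrative sketch only; the helper names split_row and join_row are hypothetical. The point it shows is that only the single terminating bar is removed, so blank optional elements remain as empty strings and later fields keep their positions.

```python
def split_row(line):
    """Split one vertical-bar-delimited row into its field values."""
    text = line.rstrip("\r\n")
    if not text.endswith("|"):
        raise ValueError("row is not terminated by a vertical bar")
    # Remove only the single terminating bar. Blank optional elements are
    # kept as empty strings so later fields stay in their positions.
    return text[:-1].split("|")


def join_row(fields):
    """Rebuild a row from field values, restoring the terminating bar."""
    return "|".join(fields) + "|"


row = split_row("C0002871|RB|C0221016||MTH|MTH||\n")
print(len(row))      # 7
print(row[3] == "")  # True
```

Stripping all trailing bars at once would silently drop empty fields at the end of a row, which is why the sketch removes exactly one.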

2.7.2.3 Descriptions of Each File

The descriptions of the files appear in the following order:

2.7.2.3.1 Files (File = MRFILES)

There is exactly one row in this file for each physical segment of the files in the relational format. The columns or data elements in the file are as follows:  


Col.

Description


FIL

Physical FILENAME


DES

Descriptive name


FMT

Comma separated list of COL, in order


CLS

# of COLUMNS


RWS

# of ROWS


BTS

Size in bytes in this format (ISO/PC or Unix)

Sample Records

MRATX|Associated Expressions|CUI,SAB,REL,ATX|4|7295|442571|
MRCOC|Co-occurring Concepts|CUI1,CUI2,SAB,COT,COF,COA|6|9061980|343331578|
MRCOLS|Attribute Relation|COL,DES,REF,MIN,AV,MAX,FIL, DTY|8|115|5728|
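
Because FMT lists the column names of each file in order, MRFILES can drive a generic reader for the other files. The Python sketch below illustrates the idea. The function names are hypothetical, and whitespace around column names is stripped on the assumption that spacing such as "FIL, DTY" in the sample above is not significant.

```python
def split_row(line):
    """Split a bar-delimited row, dropping the single terminating bar."""
    text = line.rstrip("\r\n")
    return (text[:-1] if text.endswith("|") else text).split("|")


def load_file_formats(mrfiles_lines):
    """Map each file name (FIL) to its ordered list of column names (FMT)."""
    formats = {}
    for line in mrfiles_lines:
        fil, _des, fmt, cls, _rws, _bts = split_row(line)
        columns = [name.strip() for name in fmt.split(",")]
        # CLS states the number of columns; use it as a consistency check.
        if len(columns) != int(cls):
            raise ValueError(f"{fil}: FMT has {len(columns)} columns, CLS says {cls}")
        formats[fil] = columns
    return formats


def row_as_dict(formats, fil, line):
    """Label the fields of one row from file `fil` with its column names."""
    return dict(zip(formats[fil], split_row(line)))


mrfiles = [
    "MRATX|Associated Expressions|CUI,SAB,REL,ATX|4|7295|442571|",
    "MRCOLS|Attribute Relation|COL,DES,REF,MIN,AV,MAX,FIL, DTY|8|115|5728|",
]
formats = load_file_formats(mrfiles)
print(formats["MRATX"])  # ['CUI', 'SAB', 'REL', 'ATX']
```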

2.7.2.3.2 Data Elements (File = MRCOLS)

There is exactly one row in this file for each column or data element in each file in the relational format.


Col.

Description


COL

Column or data element name


DES

Descriptive name


REF

Documentation section number


MIN

Minimum length, characters


AV

Average length


MAX

Maximum length, characters


FIL

Physical FILENAME in which this field occurs


DTY

SQL-92 data type for this column

Sample Records

ATN|Attribute name||2|3.15|7|MRSAT|varchar(20)|
ATV|Attribute value||1|9.71|3634|MRSAT|varchar(4000)|
ATX|Associated expression||5|35.89|242|MRATX|varchar(300)|

2.7.2.3.3 Concept Names (File = MRCON)

There is exactly one row in this file for each meaning of each unique string in the Metathesaurus, i.e., there is exactly one row for each unique CUI-SUI combination in the Metathesaurus. Any difference in upper-lower case, word order, etc., creates a different unique string.


Col.

Description


CUI

Unique identifier for concept


LAT

Language of term


TS

Term status


LUI

Unique identifier for term


STT

String type


SUI

Unique identifier for string


STR

String


LRL

Least restriction level

Sample Records

C0002871|ENG|P|L0002871|PF|S0013742|Anemia|0|
C0002871|ENG|P|L0002871|VP|S0013787|Anemias|0|
C0002871|ENG|P|L0002871|VC|S0352787|ANEMIA|0|
C0002871|ENG|P|L0002871|VC|S0414880|anemia|0|
C0002871|ENG|P|L0002871|VO|S0470197|Anemia, NOS|3|
C0002871|ENG|S|L0280031|PF|S0803242|Anaemia|3|

2.7.2.3.4 Vocabulary Sources (File = MRSO)

This file contains the vocabulary source(s) for a concept, term, and string.

There is exactly one row in this file for each source of each string in the Metathesaurus. All Metathesaurus concepts have entries in this file.


Col.

Description


CUI

Unique identifier for concept


LUI

Unique identifier for term


SUI

Unique identifier for string


SAB

Abbreviated source name (SAB) for source vocabulary.  Maximum field length is 20 alphanumeric characters.  Two source abbreviations are assigned: 

  • Root Source Abbreviation (RSAB): short form with no version information; for example, AI/RHEUM, 1993, has an RSAB of "AIR"
  • Versioned Source Abbreviation (VSAB): includes version information; for example, AI/RHEUM, 1993, has a VSAB of "AIR93"

Official source names, RSABs, and VSABs are included in Appendix B.4.


TTY

Abbreviation for term type in source vocabulary, for example PN (Metathesaurus Preferred Name) or CD (Clinical Drug). Possible values are listed in Appendix B.3.


CODE

Unique identifier or code for string in that source


SRL

Source restriction level

Sample Records

C0002871|L0002871|S0013742|CCS|MD|4.1|0|
C0002871|L0002871|S0013742|ICPCPAE|PT|B82005|3|
C0002871|L0002871|S0013742|LCH|PT|U000235|0|
C0002871|L0002871|S0013742|MSH|MH|D000740|0|
C0002871|L0002871|S0013742|MTH|PT|U000161|0|
C0002871|L0002871|S0013742|MTH|PT|U000164|0|
C0002871|L0002871|S0013742|PSY|PT|02450|3|
C0002871|L0002871|S0013742|RCDAE|PT|XM05A|3|

The information in MRSO can be used in combination with MRCON to determine whether a particular concept, name, or code is present in a particular source, and in what form it appears.

Note: In the RRF, the concept name and vocabulary source information appear in a single file, MRCONSO.RRF.
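
The combination of MRSO and MRCON described above amounts to a join on the CUI, LUI, and SUI columns. The following Python sketch shows one way to list the names that a given source contributes, together with their term types and codes. It assumes the column orders documented above; the function name names_in_source is hypothetical.

```python
MRCON_COLUMNS = ["CUI", "LAT", "TS", "LUI", "STT", "SUI", "STR", "LRL"]
MRSO_COLUMNS = ["CUI", "LUI", "SUI", "SAB", "TTY", "CODE", "SRL"]


def split_row(line):
    """Split a bar-delimited row, dropping the single terminating bar."""
    text = line.rstrip("\r\n")
    return (text[:-1] if text.endswith("|") else text).split("|")


def names_in_source(mrcon_lines, mrso_lines, sab):
    """Return (STR, TTY, CODE) for each string that source `sab` contributes."""
    strings = {}
    for line in mrcon_lines:
        row = dict(zip(MRCON_COLUMNS, split_row(line)))
        strings[(row["CUI"], row["LUI"], row["SUI"])] = row["STR"]

    result = []
    for line in mrso_lines:
        row = dict(zip(MRSO_COLUMNS, split_row(line)))
        key = (row["CUI"], row["LUI"], row["SUI"])
        if row["SAB"] == sab and key in strings:
            result.append((strings[key], row["TTY"], row["CODE"]))
    return result


mrcon = ["C0002871|ENG|P|L0002871|PF|S0013742|Anemia|0|"]
mrso = [
    "C0002871|L0002871|S0013742|MSH|MH|D000740|0|",
    "C0002871|L0002871|S0013742|PSY|PT|02450|3|",
]
print(names_in_source(mrcon, mrso, "MSH"))  # [('Anemia', 'MH', 'D000740')]
```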

2.7.2.3.5 Simple Concept and String Attributes (File = MRSAT)

There is exactly one row in this table for each concept, term and string attribute that does not have a sub-element structure. All Metathesaurus concepts have entries in this file.


Col.

Description


CUI

Unique identifier for concept


LUI

Unique identifier for term (optional)


SUI

Unique identifier for string (optional)


CODE

Unique identifier or code for entry in the source of the attribute, e.g., for all attributes derived from MeSH, the MeSH unique identifier (optional).


ATN

Attribute name. Possible values are all described in Appendix B, Section B.1.2.


SAB

Abbreviated source name (SAB).  Maximum field length is 20 alphanumeric characters.  Two source abbreviations are assigned: 

  • Root Source Abbreviation (RSAB): short form with no version information; for example, AI/RHEUM, 1993, has an RSAB of "AIR"
  • Versioned Source Abbreviation (VSAB): includes version information; for example, AI/RHEUM, 1993, has a VSAB of "AIR93"

Official source names, RSABs, and VSABs are included in Appendix B.4.


ATV

Attribute value described under specific attribute name in Appendix B, Section B.1.2.  A few attribute values exceed 1,000 characters.

Sample Records

C0002871|L0002871|S0013742|D000740|MMR|MSH|19960610|
C0002871|L0002871|S0013742|D000740|MN|MSH|C15.378.71|
C0002871|L0002871|S0013742|D000740|TH|MSH|POPLINE (1994)|
C0002871|L0002871|S0414880|208/04453|SOS|PDQ|secondary related condition|
C0002871|L0002871|S0470197|DC-10010|SIC|SNMI|285.9|

2.7.2.3.6 Definitions (File = MRDEF)

There is exactly one row in this file for each definition in the Metathesaurus. A few definitions approach 3,000 characters in length.


Col.

Description


CUI

Unique identifier for concept


SAB

Abbreviated source name (SAB) of the source of the definition.  Maximum field length is 20 alphanumeric characters.  Two source abbreviations are assigned: 

  • Root Source Abbreviation (RSAB): short form with no version information; for example, AI/RHEUM, 1993, has an RSAB of "AIR"
  • Versioned Source Abbreviation (VSAB): includes version information; for example, AI/RHEUM, 1993, has a VSAB of "AIR93"

Official source names, RSABs, and VSABs are included in Appendix B.4.


DEF

Definition

Sample Records

C0002871|MSH|A reduction in the number of circulating erythrocytes or in the quantity of hemoglobin.|

2.7.2.3.7 Semantic Types (File = MRSTY)

There is exactly one row in this file for each semantic type assigned to each concept. All Metathesaurus concepts have at least one entry in this file. Many have more than one entry.


Col.

Description


CUI

Unique identifier of concept


TUI

Unique identifier of Semantic Type


STY

Semantic Type. The valid values are defined in the Semantic Network.

Sample Record

C0002871|T047|Disease or Syndrome|

2.7.2.3.8 Locators (File = MRLO)

This file has been deleted from the Metathesaurus effective with the 2004AB release. Some of the information was outdated, some duplicated information contained in other Metathesaurus files, and some was easily obtained from other publicly available sources, e.g., PubMed.

2.7.2.3.9 Related Concepts (File = MRREL)

There is one row in this table for each relationship between Metathesaurus concepts known to the Metathesaurus, with the following exceptions found in other files: co-occurrences found in MRCOC and Associated Expressions found in MRATX.

Note that for asymmetrical relationships there is one row for each direction of the relationship. Note also the direction of REL - the relationship which the SECOND concept (with Concept Unique Identifier CUI2) HAS TO the FIRST concept (with Concept Unique Identifier CUI1).  


Col.

Description


CUI1

Unique identifier of first concept


REL

Relationship of SECOND to first concept


CUI2

Unique identifier of second concept


RELA

Relationship attribute


SAB

Abbreviated source name (SAB) of the source of relationship.  Maximum field length is 20 alphanumeric characters.  Two source abbreviations are assigned: 

  • Root Source Abbreviation (RSAB): short form with no version information; for example, AI/RHEUM, 1993, has an RSAB of "AIR"
  • Versioned Source Abbreviation (VSAB): includes version information; for example, AI/RHEUM, 1993, has a VSAB of "AIR93"

Official source names, RSABs, and VSABs are included in Appendix B.4.


SL

Source of relationship labels


MG

Machine-generated and unverified indicator (optional). G indicates 'machine generated'

Sample Records

C0002871|CHD|C0002891|isa|MSH|MSH||
[Anemia, Neonatal (C0002891)
has CHILD REL and isa RELA
to Anemia (C0002871)]

C0002871|RB|C0221016||MTH|MTH||
[Red blood cell disorder, NOS (C0221016)
has broader REL
to Anemia (C0002871)]

C0002871|RL|C0002886|mapped_to|SNMI|SNMI||
[Anemia, Macrocytic (C0002886)
has like relationship
to Anemia (C0002871)]

C0002871|RO|C0002886|clinically_associated_with|CCPSS|CCPSS||
[Megaloblastic anemia due to folate deficiency, NOS (C0151482)
has clinically_associated_with relationship
to Anemia (C0002871)]
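
Because the direction of REL is easy to misread, the following Python sketch spells it out for a single row: the second concept (CUI2) has the relationship to the first (CUI1). It assumes the column order documented above; the function name describe_relationship is hypothetical.

```python
MRREL_COLUMNS = ["CUI1", "REL", "CUI2", "RELA", "SAB", "SL", "MG"]


def describe_relationship(line):
    """State a MRREL row in words, with CUI2 as the subject of REL."""
    text = line.strip()
    fields = (text[:-1] if text.endswith("|") else text).split("|")
    row = dict(zip(MRREL_COLUMNS, fields))
    # RELA is optional; include it only when present.
    label = row["REL"] if not row["RELA"] else f'{row["REL"]} ({row["RELA"]})'
    return f'{row["CUI2"]} has {label} relationship to {row["CUI1"]}'


print(describe_relationship("C0002871|CHD|C0002891|isa|MSH|MSH||"))
# C0002891 has CHD (isa) relationship to C0002871
print(describe_relationship("C0002871|RB|C0221016||MTH|MTH||"))
# C0221016 has RB relationship to C0002871
```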


2.7.2.3.10 Co-occurring Concepts (File = MRCOC)

There are two rows in this table for each pair of concepts that co-occur in each information source represented, one for each direction of the relationship. (Note that the COA data may be different for each direction of the relationship.) Many Metathesaurus concepts have no entries in this file. Due to the very large number of co-occurrence relationships, they are distributed in a separate file.


Col.

Description


CUI1

Unique identifier of first concept 


CUI2

Unique identifier of second concept
Note: Where COT is MeSH topical qualifier (LQ) and CUI2 is not present, the count of citations of CUI1 with no MeSH qualifiers is reported.


SOC

Abbreviation of the source of co-occurrence information if applicable


COT

Type of co-occurrence


COF

Frequency of co-occurrence, if applicable


COA

Attributes of co-occurrence, if applicable

Sample Records

C0002871||MED|LQ|1||
C0002871|C0000530|MBD|L|2|CI=1,EN=1,ME=1,PA=1|
C0002871|C0000727|MBD|L|1|BL=1,ET=1|
C0002871|C0000737|MBD|L|1|ET=1|
C0002871|C0000772|MBD|L|2|CN=2|

Co-occurrences are concepts that occur together in the same entries in some information source. The relationships represented here are obtained from machine-manipulation of the information source. Co-occurrence relationships may exist between similar concepts (e.g., Atrial Fibrillation and Arrhythmia) or between very different concepts that nevertheless have some important connection in the field of biomedicine (e.g., Atrial Fibrillation and Digoxin), or between a primary concept and a qualifier (e.g., Lithotripsy and instrumentation). A co-occurrence relationship can exist between two concepts that have no other apparent relationship, although the frequency of such co-occurrences will be small.

In the current Metathesaurus, there are three sources of co-occurrence data: MEDLINE, AI/RHEUM, and CCPSS. From MEDLINE, co-occurrence data was computed for concepts that were designated as principal or main points in the same journal article, i.e., the co-occurrence counts do not include articles in which either or both of the concepts were present and indexed in MEDLINE but not designated as main points. (A concept is considered to be a main point if the * is attached to the main heading or any of its subheadings.)

Two overall frequencies of MEDLINE co-occurrence are provided: one for recent MEDLINE data (MED) and one for MEDLINE data from a preceding block of years (MBD); see SOC for date ranges in the current edition. Separate counts are provided for the frequencies with which the first concept was qualified by different MeSH qualifiers or by no qualifier at all when it co-occurred with the second concept. There are separate entries for each direction of the co-occurrence relationship. The related subheading occurrence information in each entry belongs to the first concept in the entry and is therefore different for each direction of the relationship.

In addition to the specific qualifier information associated with two co-occurring concepts, in entries with LQ and LQB values for type of co-occurrence, this element also includes totals for the number of times each main concept was qualified by a specific subheading or by no subheading.
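
As an illustration of the qualifier counts described above, the Python sketch below turns a COA value into a dictionary of counts. It assumes the comma-separated NAME=count form seen in the MEDLINE sample records (for example, CI=1,EN=1); COA values from other sources may follow a different layout. The function name parse_coa is hypothetical.

```python
def parse_coa(coa):
    """Parse a COA value such as 'BL=1,ET=1' into {'BL': 1, 'ET': 1}."""
    counts = {}
    if not coa:
        # COA is empty when no attributes are reported for the row.
        return counts
    for item in coa.split(","):
        name, _sep, value = item.partition("=")
        counts[name.strip()] = int(value)
    return counts


fields = "C0002871|C0000530|MBD|L|2|CI=1,EN=1,ME=1,PA=1|".split("|")
print(parse_coa(fields[5]))  # {'CI': 1, 'EN': 1, 'ME': 1, 'PA': 1}
```

Note that in this sample the qualifier counts sum to more than COF, so the sketch makes no attempt to reconcile the two values.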

The AI/RHEUM co-occurrence data represent the co-occurrence of diseases and findings in the AI/RHEUM knowledge base, i.e., the diseases that co-occur with a particular finding and the findings that co-occur with a particular disease. Each disease/finding pair can co-occur only once in the AI/RHEUM knowledge base.

In CCPSS, the co-occurrence data is extracted from patient records and includes problem-problem co-occurrences within a patient record as well as problem-modifier co-occurrences.

2.7.2.3.11 Concept contexts (File = MRCXT)

This file is no longer distributed. To create the MRCXT file, use the new MRCXT Builder application, accessible from the MetamorphoSys Welcome screen. Information on the MRCXT Builder can be found at http://www.nlm.nih.gov/research/umls/mrcxt_help.html. The information below describes the content of the file when produced by the MRCXT Builder.

There are rows in this file for each occurrence of a concept in a hierarchy in any of the UMLS source vocabularies - a "context" in this discussion. Many Metathesaurus concepts have multiple contexts while others may have none. The number of rows per context differs depending on the number of ancestor, sibling, or child terms the concept has in that context. Because some concepts have multiple contexts in the same source (e.g., MeSH), a context number (CXN - e.g., 1, 2, 3) is used to identify all members of the same context. The CXNs are not global but are created as required for each concept. Since some concepts have multiple contexts in the same vocabulary with the same SUI, each distinct context can be retrieved with a CUI-SUI-SAB-CXN key. The "distance-1 relationships" i.e., the immediate parent, immediate child, and sibling relationships, represented in this file are also present in the MRREL file.


Col.

Description


CUI

Unique identifier of concept


SUI

Unique identifier of string used in this context


SAB

Abbreviated source name (SAB).  Maximum field length is 20 alphanumeric characters.  Two source abbreviations are assigned: 

  • Root Source Abbreviation (RSAB): short form with no version information; for example, AI/RHEUM, 1993, has an RSAB of "AIR"
  • Versioned Source Abbreviation (VSAB): includes version information; for example, AI/RHEUM, 1993, has a VSAB of "AIR93"

Official source names, RSABs, and VSABs are included in Appendix B.4.


CODE

Unique identifier or code for string in that source.


CXN

The context number (to distinguish multiple contexts in the same source with the same SUI)


CXL

Context member label, i.e., ANC for ancestor of this concept, CCP for concept, SIB for sibling of this concept, CHD for child of this concept


RNK

For rows with a CXL value of ANC, the rank of the ancestors (e.g., a value of 1 denotes the most remote ancestor in the hierarchy)


CXS

String for context member


CUI2

Unique concept identifier of context member (may be empty if context member is not yet in the Metathesaurus)


HCD

Hierarchical number or code of context member in this source (optional)


RELA

Relationship attribute providing further categorization of the CXL, if applicable and known. Allowed values are listed in Appendix B.3.


XC

A plus (+) sign indicates that the CUI2 for this row has children in this context. If this field is empty, the CUI2 does not have children in this context.

Sample Records

C0002871|S0013742|MSH|D000740|1|ANC|1|MeSH|C0220876||||
C0002871|S0013742|MSH|D000740|1|ANC|2|Diseases (MeSH Category)|C0012674|C|||
C0002871|S0013742|MSH|D000740|1|ANC|3|Hemic and Lymphatic Diseases|C0018981|C15|||
C0002871|S0013742|MSH|D000740|1|ANC|4|Hematologic Diseases|C0018939|C15.378|isa||
C0002871|S0013742|MSH|D000740|1|CCP||Anemia|C0002871|C15.378.71|isa|+|
C0002871|S0013742|MSH|D000740|1|CHD||Anemia, Aplastic|C0002874|C15.378.71.85|isa|+|
C0002871|S0013742|MSH|D000740|1|SIB||Blood Protein Disorders|C0005830|C15.378.147|isa|+|
C0002871|S0013742|MSH|D000740|1|CHD||Anemia, Hemolytic|C0002878|C15.378.71.141|isa|+|
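
The rows of one context can be reassembled into an ancestor path by grouping on the CUI-SUI-SAB-CXN key and ordering the ANC rows by RNK. The Python sketch below illustrates this under the column order documented above; the function name ancestor_paths is hypothetical and the sketch is not part of the MRCXT Builder.

```python
from collections import defaultdict

MRCXT_COLUMNS = ["CUI", "SUI", "SAB", "CODE", "CXN", "CXL",
                 "RNK", "CXS", "CUI2", "HCD", "RELA", "XC"]


def ancestor_paths(lines):
    """Return {(CUI, SUI, SAB, CXN): [ancestor strings, most remote first]}."""
    contexts = defaultdict(list)
    for line in lines:
        text = line.strip()
        if not text:
            continue
        fields = (text[:-1] if text.endswith("|") else text).split("|")
        row = dict(zip(MRCXT_COLUMNS, fields))
        if row["CXL"] == "ANC":
            key = (row["CUI"], row["SUI"], row["SAB"], row["CXN"])
            contexts[key].append((int(row["RNK"]), row["CXS"]))
    # RNK 1 is the most remote ancestor, so ascending order gives the path.
    return {key: [name for _rank, name in sorted(members)]
            for key, members in contexts.items()}


sample = [
    "C0002871|S0013742|MSH|D000740|1|ANC|2|Diseases (MeSH Category)|C0012674|C|||",
    "C0002871|S0013742|MSH|D000740|1|ANC|1|MeSH|C0220876||||",
    "C0002871|S0013742|MSH|D000740|1|CCP||Anemia|C0002871|C15.378.71|isa|+|",
]
paths = ancestor_paths(sample)
print(paths[("C0002871", "S0013742", "MSH", "1")])
# ['MeSH', 'Diseases (MeSH Category)']
```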

2.7.2.3.12 Associated Expressions (File = MRATX)

There is one row in this table for each vocabulary expression (i.e., combination of terms from a specific Metathesaurus source vocabulary) identified as having a relationship to a concept in the Metathesaurus. The majority of Metathesaurus entries have no entries in this table.


Col.

Description


CUI

Unique identifier of concept to which the expression is related


SAB

Abbreviated source name (SAB) of source of terms in expression.  Maximum field length is 20 alphanumeric characters.  Two source abbreviations are assigned: 

  • Root Source Abbreviation (RSAB): short form with no version information; for example, AI/RHEUM, 1993, has an RSAB of "AIR"
  • Versioned Source Abbreviation (VSAB): includes version information; for example, AI/RHEUM, 1993, has a VSAB of "AIR93"

Official source names, RSABs, and VSABs are included in Appendix B.4.


REL

Relationship of meaning of expression to main concept


ATX

Associated expression

Sample Records

C0001207|MSH|S|<Acromegaly> AND <Gigantism>|
C0001296|LCH|U|<Insurance>/<Statistics>|
C0001355|MSH|S|<Kidney Failure, Acute> AND <Kidney Papillary Necrosis>|

2.7.2.3.13 Source Information (File = MRSAB)

The Metathesaurus has "versionless" or "root" Source Abbreviations (SABs) in the data files. MRSAB connects the root SAB to fully specified version information for the current release. For example, the released SAB for MeSH is now simply "MSH". In MRSAB, you will find the current versioned SAB, e.g., MSH2003_2002_10_24. MetamorphoSys can produce files with either the root or versioned SABs so that either form can be utilized by a user.

There is one row in this file for every version of every source in the current Metathesaurus; when complete, there will also be historical information with a row for each version of each source that has appeared in any Metathesaurus release. Note that the field CURVER has the value Y to identify the version in this Metathesaurus release. Future releases of MRSAB will also contain historical version information in rows with CURVER value N.

MRSAB allows all other Metathesaurus files to use versionless source abbreviations, so that rows with no data change between versions also remain unchanged.

The full structure of MRSAB is as follows:

Field
Full Name
Description
VCUI
CUI
CUI of the versioned SRC concept for a source
RCUI
Root CUI
CUI of the root SRC concept for a source
VSAB
Versioned Source Abbreviation
The versioned source abbreviation for a source e.g., MSH2003_2002_10_24
RSAB
Root Source Abbreviation
The root source abbreviation for a source e.g., MSH
SON
Official Name
The official name for a source
SF
Source Family
The source family for a source
SVER
Version
The source version e.g., 2001
VSTART
Valid Start Date For A Source
The date a source became active, e.g., 2004_04_03
VEND
Valid End Date For A Source
The date a source ceased to be active, e.g., 2003_05_10
IMETA
Meta Insert Version
The version of the Metathesaurus in which a source first appeared, e.g., 2001AB
RMETA
Meta Remove Version
The version of the Metathesaurus in which a source last appeared, e.g., 2001AC
SLC
Source License Contact
The source license contact information
SCC
Source Content Contact
The source content contact information
SRL
Source Restriction Level
0, 1, 2, 3, 4 – explained in the License Agreement
TFR
Term Frequency
The number of terms for this source in MRCON/MRSO, e.g., 12343
CFR
CUI Frequency
The number of CUIs associated with this source, e.g., 10234
CXTY
Context Type
The type of contexts for this source. Values are FULL, FULL-MULTIPLE, FULL-NOSIB, FULL-NOSIB-MULTIPLE, FULL-MULTIPLE-NOSIB-RELA, null.
TTYL
Term Type List
Term type list from source, e.g.,  MH, EN, PM, TQ
ATNL
Attribute Name List
The attribute name list (from MRSAT), e.g., MUI, RN, TH
LAT
Language
The language of the source
CENC
Character Encoding
All UMLS content is provided in Unicode, encoded in UTF-8.

MetamorphoSys will allow exclusion of extended characters with some loss of information. Transliteration to other character encodings is possible but not supported by NLM; for further information, see http://www.unicode.org.
CURVER
Current Version
A Y or N flag indicating whether or not this row corresponds to the current version of the named source
SABIN
Source in Subset
A Y or N flag indicating whether or not this row is represented in the current MetamorphoSys subset. Initially always Y where CURVER is Y, but later is recomputed by MetamorphoSys.

Sources with contexts have "full" contexts, i.e., all levels of terms may have Ancestors, Parents, Children and Siblings. A full context may also be further designated as Multiple, Nosib (No siblings) or both Multiple and Nosib.

Multiple indicates that a single concept in this source may have multiple hierarchical positions.

No siblings (Nosib) indicates that siblings have not been computed for this source.

Appendix B.4, Source Vocabularies, lists each source in the Metathesaurus and includes information about the type of context, if any, for each source.

Sample Record

C1371270|C1140284|RXNORM_04AB|RXNORM|RXNORM Project, META2004AB|RXNORM|04AB|2004_05_17||2004AB||Stuart Nelson, M.D., Head, MeSH Section; e-mail: nelson@nlm.nih.gov|Stuart Nelson, M.D., Head, MeSH Section; e-mail: nelson@nlm.nih.gov|0|138005|110403||BN,IN,OBD,OCD,SBD,SBDF,SCD,SCDC,SCDF,SY|ORIG_CODE,ORIG_SOURCE|ENG|UTF-8|Y|Y|

2.7.2.3.14 Concept Name Ranking (File = MRRANK)

There is exactly one row for each concept name type from each Metathesaurus source vocabulary (each SAB-TTY combination). The RANK and SUPPRESS values in the distributed file are those used in Metathesaurus production. Users are free to change these values to suit their needs and preferences, then change the naming precedence and suppressibility (TS in MRCON) by using MetamorphoSys to create a customized Metathesaurus.


Col.

Description


RANK

Numeric order of precedence, higher value wins


SAB

Abbreviated source name (SAB).  Maximum field length is 20 alphanumeric characters.  Two source abbreviations are assigned: 

  • Root Source Abbreviation (RSAB): short form with no version information; for example, AI/RHEUM, 1993, has an RSAB of "AIR"
  • Versioned Source Abbreviation (VSAB): includes version information; for example, AI/RHEUM, 1993, has a VSAB of "AIR93"

Official source names, RSABs, and VSABs are included in Appendix B.4.


TTY

Abbreviation for term type in source vocabulary, for example PN (Metathesaurus Preferred Name) or CD (Clinical Drug). Possible values are listed in Appendix B.3.


SUPPRESS

Flag indicating that this SAB and TTY will create a TS=s MRCON entry; see TS

Sample Records

0210|AIR|SY|N|
0209|ULT|PT|N|
0208|CPT|PT|N|
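
The "higher value wins" rule can be illustrated with the short Python sketch below, which loads the RANK values and picks the best of several candidate names by their SAB-TTY combination. This is a simplified illustration only: the function names and the candidate names are hypothetical, and the sketch does not reproduce the full precedence logic applied by MetamorphoSys (for example, it does not define how ties are broken).

```python
def load_ranks(lines):
    """Map each (SAB, TTY) combination to its numeric RANK."""
    ranks = {}
    for line in lines:
        if not line.strip():
            continue
        rank, sab, tty, _suppress = line.strip().split("|")[:4]
        ranks[(sab, tty)] = int(rank)
    return ranks


def highest_ranked(ranks, candidates):
    """Pick the (SAB, TTY, name) candidate whose SAB-TTY has the top RANK."""
    # Combinations missing from MRRANK sort below every listed one.
    return max(candidates, key=lambda c: ranks.get((c[0], c[1]), -1))


ranks = load_ranks(["0210|AIR|SY|N|", "0209|ULT|PT|N|", "0208|CPT|PT|N|"])
candidates = [
    ("CPT", "PT", "placeholder name A"),  # hypothetical names for illustration
    ("AIR", "SY", "placeholder name B"),
]
print(highest_ranked(ranks, candidates))  # ('AIR', 'SY', 'placeholder name B')
```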

2.7.2.3.15 Ambiguous Term Identifiers (File = AMBIG.LUI)

When a Lexical Unique Identifier (LUI) is linked to multiple Concept Unique Identifiers (CUIs), there is one row in this table for each LUI-CUI pair. This file identifies those lexical variant classes which have multiple meanings in the Metathesaurus.

In the Metathesaurus, the LUI links all strings within the English language that are identified as lexical variants of each other by the luinorm program found in the UMLS SPECIALIST Lexicon and Tools. LUIs are assigned irrespective of the meaning of each string. This table may be useful to system developers who wish to make use of the lexical programs in their applications to identify and disambiguate ambiguous terms.


Col.  Description
LUI   Lexical Unique Identifier
CUI   Concept Unique Identifier

Sample Records

L0000003|C0010504|
L0000003|C0917995|
L0000032|C0010206|
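
A minimal sketch of how a developer might use this table: group the rows by identifier so that every candidate meaning of an ambiguous term can be retrieved at once. The function name `group_cuis_by_identifier` is hypothetical. In the sample excerpt, L0000032 appears only once, so it ends up with a single CUI here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def group_cuis_by_identifier(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect the CUIs listed for each identifier in the first column."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        identifier, cui = line.rstrip("\r\n").split("|")[:2]
        groups[identifier].add(cui)
    return dict(groups)


sample = [
    "L0000003|C0010504|",
    "L0000003|C0917995|",
    "L0000032|C0010206|",
]
meanings = group_cuis_by_identifier(sample)
print(sorted(meanings["L0000003"]))
```

Because AMBIG.SUI (section 2.7.2.3.16) has the same two-column layout, the same function should apply to it with SUIs in place of LUIs.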

2.7.2.3.16 Ambiguous String Identifiers (File = AMBIG.SUI)

When a String Unique Identifier (SUI) is linked to multiple Concept Unique Identifiers (CUIs), there is one row in this table for each SUI-CUI pair.

This file resides in the META directory. In the Metathesaurus, there is only one SUI for each unique string within each language, even if the string has multiple meanings. This table is only of interest to system developers who make use of the SUI in their applications or in local data files.


Col.  Description
SUI   String Unique Identifier
CUI   Concept Unique Identifier

Sample Records

S0063890|C0026667|
S0063890|C1135584|
S0229413|C0008802|

2.7.2.3.17 Metathesaurus Change Files

There are six files or relations that identify key differences between entries in the previous and the current edition of the Metathesaurus. Developers can use these special files to determine whether there have been changes that affect their applications.

The usefulness of individual files will depend on how data from the Metathesaurus have been linked or incorporated in a particular application.

Each relation or named table of data has a fixed number of columns and variable number of rows. A column is a sequence of all the values in a given data element. A row contains the values for two or more data elements for one entry. The values for the different data elements in the row are separated by vertical bars (|). Each row ends with a vertical bar and line termination.
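
The row format described above can be read with a few lines of code. The following is a sketch only: `read_rows` is a hypothetical helper, the column names are supplied by the caller, and it assumes that field values never contain a literal vertical bar. The example row is taken from the Retired CUI Mapping sample records (section 2.7.2.3.17.6).

```python
import io
from typing import Dict, Iterator, Sequence, TextIO


def read_rows(handle: TextIO, columns: Sequence[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per row, keyed by the caller-supplied column names."""
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        if not line.endswith("|"):
            raise ValueError(f"line {line_number}: missing trailing vertical bar")
        # Drop the terminating bar before splitting so no empty extra field appears.
        fields = line[:-1].split("|")
        if len(fields) != len(columns):
            raise ValueError(
                f"line {line_number}: expected {len(columns)} fields, got {len(fields)}"
            )
        yield dict(zip(columns, fields))


data = io.StringIO("C0079167|1997AA|SY|C0010042|N|\n")
rows = list(read_rows(data, ["CUI1", "VER", "CREL", "CUI2", "MAPIN"]))
print(rows[0]["CUI2"])
```

Checking the field count on every row is a cheap way to notice when a file with a different layout has been passed in by mistake.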

2.7.2.3.17.1 Deleted Concepts (File = DELETED.CUI)

Concepts whose meaning is no longer present in the Metathesaurus are reported in this file. There is a row for each concept that existed in the previous release and is not present in the current release. If the meaning exists in the current release, i.e., the missing concept was merged with another current concept, it is reported in the MERGED.CUI file (section 2.7.2.3.17.2) and not in this file.


Col.  Description
CUI   Concept unique identifier in the previous Metathesaurus
STR   Preferred name of this concept in the previous Metathesaurus

2.7.2.3.17.2 Merged Concepts (File = MERGED.CUI)

There is exactly one row in this table for each released concept in the previous Metathesaurus (CUI1) that was merged into another released concept from the previous Metathesaurus (CUI2). When such a merge occurs, the first CUI (CUI1) is retired; this table shows the CUI (CUI2) for the merged concept in this Metathesaurus.

Entries in this file represent concept pairs that were considered to have different meanings in the previous edition but are now identified as synonyms.


Col.  Description
CUI1  Concept unique identifier in the previous Metathesaurus
CUI2  Concept unique identifier in this Metathesaurus in format C#######

2.7.2.3.17.3 Deleted Terms (File = DELETED.LUI)

There is exactly one row in this table for each Lexical Unique Identifier (LUI) that appeared in the previous version of the Metathesaurus, but does not appear in this version.

LUIs are assigned by the luinorm program, part of the LVG program in the UMLS SPECIALIST Lexicon and Tools; see Section 4 in this manual.

These entries represent LUIs that the previous release's luinorm program produced for the previous Metathesaurus but that this release's luinorm no longer produces for this release's Metathesaurus. This does not necessarily imply the deletion of a string or a concept from the Metathesaurus.


Col.  Description
LUI   Lexical unique identifier in the previous Metathesaurus
STR   Preferred name of term in the previous Metathesaurus

2.7.2.3.17.4 Merged Terms (File = MERGED.LUI)

There is exactly one row in this file for each case in which strings had different LUIs in the previous Metathesaurus yet share the same LUI in this Metathesaurus; a LUI present in the previous Metathesaurus is therefore absent from this Metathesaurus.

LUIs are assigned by the luinorm program, part of the LVG program in the UMLS SPECIALIST Lexicon and Tools; see Section 4.

These entries represent the cases where separate lexical variants as identified by the previous release's luinorm program version are a single lexical variant as identified by this release's luinorm.


Col.  Description
LUI1  Lexical unique identifier in the previous Metathesaurus but not present in this Metathesaurus
LUI2  Lexical unique identifier into which it was merged in this Metathesaurus

2.7.2.3.17.5 Deleted Strings (File = DELETED.SUI)

There is exactly one row in this file for each string in each language that was present in an entry in the previous Metathesaurus and does not appear in this Metathesaurus.

Note that this does not necessarily imply the deletion of a term (LUI) or a concept (CUI) from the Metathesaurus. A string deleted in one language may still appear in the Metathesaurus in another language.


Col.  Description
SUI   String unique identifier in the previous Metathesaurus that is not present in this Metathesaurus
LAT   Three-character abbreviation of language of string that has been deleted
STR   Preferred name of term in the previous Metathesaurus that is not present in this Metathesaurus

2.7.2.3.17.6 Retired CUI Mapping (File = MRCUI)

There are one or more rows in this file for each Concept Unique Identifier (CUI) that existed in any prior release but is not present in the current release. Where possible, the file maps each retired CUI to a current CUI as a synonym, or to one or more related current CUIs. If a synonymous mapping cannot be found, other relationships between the CUIs can be created. These relationships can be Broader (RB), Narrower (RN), Other Related (RO), Deleted (DEL), or Removed from Subset (SUBX). Rows with the SUBX relationship are added to MRCUI by MetamorphoSys for each CUI that met the exclusion criteria and was consequently removed from the subset. Some CUIs may be mapped to more than one other CUI using these relationships.

CUIs may be retired when (1) two released concepts are found to be synonyms and so are merged, retiring one CUI; (2) the concept no longer appears in any source vocabulary and is not 'rescued' by NLM; or (3) the concept is an acknowledged error in a source vocabulary or determined to be a Metathesaurus production error.

See the META/CHANGE files, especially MERGED.CUI and DELETED.CUI, for the changes from the last release only, without mappings.


Col.   Description
CUI1   Retired CUI: was present in some prior release, but is currently missing
VER    The last release version in which CUI1 was a valid CUI
CREL   The relationship CUI2 has to CUI1, if present, or DEL if CUI2 is not present. Valid values currently are SY, DEL, RO, RN, RB.
CUI2   The current CUI that CUI1 most closely maps to
MAPIN  Is this map in the current subset? Values of Y, N, or null. MetamorphoSys generates the Y or N to indicate whether the CUI2 concept is or is not present in the subset. The null value is for rows where the CUI1 was not present to begin with (i.e., CREL=DEL).

Sample Records

C0079138|2001AA|DEL||Y|
C0079138|2001AA|RO|C0037440|Y|
C0079151|1993AA|DEL||N|
C0079158|1997AA|SY|C0009081||
C0079167|1997AA|SY|C0010042|N|
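
One plausible use of this file is to bring locally stored CUIs up to date. The sketch below, which is illustrative rather than a prescribed procedure, loads the sample records above and looks up the synonymous replacement for a retired CUI, falling back to related concepts when no synonym exists. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

SAMPLE = """\
C0079138|2001AA|DEL||Y|
C0079138|2001AA|RO|C0037440|Y|
C0079151|1993AA|DEL||N|
C0079158|1997AA|SY|C0009081||
C0079167|1997AA|SY|C0010042|N|
"""

Mappings = Dict[str, List[Tuple[str, str]]]


def load_mrcui(text: str) -> Mappings:
    """Map each retired CUI1 to its list of (CREL, CUI2) pairs."""
    mappings: Mappings = defaultdict(list)
    for line in text.splitlines():
        if not line:
            continue
        cui1, _ver, crel, cui2, _mapin = line.split("|")[:5]
        mappings[cui1].append((crel, cui2))
    return dict(mappings)


def synonymous_replacement(mappings: Mappings, retired_cui: str) -> Optional[str]:
    """Return the current CUI mapped as SY, or None if there is none."""
    for crel, cui2 in mappings.get(retired_cui, []):
        if crel == "SY":
            return cui2
    return None


def related_replacements(mappings: Mappings, retired_cui: str) -> List[str]:
    """Return current CUIs linked by a broader, narrower, or other relationship."""
    return [cui2 for crel, cui2 in mappings.get(retired_cui, [])
            if crel in {"RB", "RN", "RO"}]


mappings = load_mrcui(SAMPLE)
print(synonymous_replacement(mappings, "C0079158"))
print(related_replacements(mappings, "C0079138"))
```

A retired CUI with only a DEL row, such as C0079151 in the sample, yields no replacement from either function, so an application would need its own policy for those cases.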


2.7.2.3.18 Word Index (File = MRXW.BAQ, MRXW.DAN, MRXW.DUT, MRXW.ENG, MRXW.FIN, MRXW.FRE, MRXW.GER, MRXW.HEB, MRXW.HUN, MRXW.ITA, MRXW.NOR, MRXW.POR, MRXW.RUS, MRXW.SPA, MRXW.SWE)

There is one row in these tables for each word found in each unique Metathesaurus string (ignoring upper-lower case). All Metathesaurus entries have entries in the word index. The entries are sorted in ASCII order.


Col.  Description
LAT   Abbreviation of language of the string in which the word appears
WD    Word in lowercase
CUI   Concept identifier
LUI   Term identifier
SUI   String identifier

Sample Records from MRXW.ENG

ENG|anaemia|C0002871|L0280031|S0352688|
ENG|anemia|C0002871|L0002871|S0013742|
ENG|anemias|C0002871|L0002871|S0013787|
ENG|blood|C0002871|L0376533|S0500659|

Sample Records from MRXW.FRE

FRE|ANEMIE|C0002871|L0162748|S0227229|
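
To show how the word index might support a simple search, the sketch below builds an in-memory map from each word to the concepts it appears in, using the MRXW.ENG sample records above, and then intersects the sets for a multi-word query. The function names are hypothetical, and a production system would more likely load the file into a database.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

SAMPLE_ROWS = [
    "ENG|anaemia|C0002871|L0280031|S0352688|",
    "ENG|anemia|C0002871|L0002871|S0013742|",
    "ENG|anemias|C0002871|L0002871|S0013787|",
    "ENG|blood|C0002871|L0376533|S0500659|",
]


def build_word_index(rows: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each word (WD) to the set of CUIs whose strings contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        _lat, word, cui, _lui, _sui = row.rstrip("\r\n").split("|")[:5]
        index[word].add(cui)
    return dict(index)


def concepts_with_all_words(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return CUIs indexed under every word of the query."""
    words = query.lower().split()  # lowercased to match the WD column
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_word_index(SAMPLE_ROWS)
print(concepts_with_all_words(index, "Blood anemia"))
print(concepts_with_all_words(index, "blood pressure"))
```

This matches at the concept level: in the sample, "blood" and "anemia" come from different strings (different SUIs) of the same concept. To require that all words occur in one string, the index could be keyed on SUI instead of CUI.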

2.7.2.3.19 Normalized Word Index (File = MRXNW.ENG)

There is one row in this table for each normalized word found in each unique English-language Metathesaurus string. All English-language Metathesaurus entries have entries in the normalized word index. There are no normalized word indexes for other languages in this edition of the Metathesaurus.


Col.  Description
LAT   Abbreviation of language of the string in which the word appears (always ENG in this edition of the Metathesaurus)
NWD   Normalized word in lowercase (described in Section 2.6.2.1)
CUI   Concept identifier
LUI   Term identifier
SUI   String identifier

Sample Records

ENG|anaemia|C0002871|L0280031|S0352688|
ENG|anemia|C0002871|L0002871|S0013742|
ENG|anemia|C0002871|L0002871|S0013787|
ENG|blood|C0002871|L0376533|S0500659|

2.7.2.3.20 Normalized String Index (File = MRXNS.ENG)

There is one row in this table for each normalized string found in each unique English-language Metathesaurus string (ignoring upper-lower case). All English-language Metathesaurus entries have entries in the normalized string index. There are no normalized string indexes for other languages in this edition of the Metathesaurus.


Col.  Description
LAT   Abbreviation of language of the string (always ENG in this edition of the Metathesaurus)
NSTR  Normalized string in lowercase (described in Section 2.6.3.1)
CUI   Concept identifier
LUI   Term identifier
SUI   String identifier

Sample Records

ENG|anaemia|C0002871|L0280031|S0352688|
ENG|anaemia unspecified|C0002871|L0696700|S0803315|
ENG|anemia|C0002871|L0002871|S0013787|

2.8 Character Sets

The UMLS Knowledge Sources are distributed in Unicode (specifically, in the UTF-8 encoding of the Unicode 4.0 standard [1]) to avoid complexity and information loss.

Unicode is a single unified and interoperable global standard, which includes the characters needed to write in any language (see www.unicode.org). Unicode also includes diacritical marks, ideographs, and scientific and other symbols. Most modern systems already use Unicode; we strongly encourage you to upgrade to Unicode compliant systems and software.

The 7-bit basic ASCII character set is the 'least common denominator' character set: 128 code points, of which 95 are printable characters and symbols, as defined by the ASCII standard. UTF-8 is identical to the ASCII encoding for characters in the 7-bit ASCII range, so that 7-bit ASCII files are automatically a correct subset of UTF-8. This means that sources originally in 7-bit ASCII are unchanged. In the UMLS, the term 'extended characters' refers to all Unicode characters beyond this 7-bit ASCII subset. All other character sets are converted to, and distributed in, UTF-8.

Note that the UMLS LAT - Language of Term(s) - is the language the source declares. Since the world does not speak or write in 7-bit ASCII, sources often include extended characters for symbols or from other languages, for example in eponyms.

The MetamorphoSys default is to output all records and data in standard UTF-8. Checking the option to "Remove records containing extended UTF-8 characters" will exclude from your subset all terms and other data that contain extended characters. Doing so can create gaps in the hierarchy and may remove vocabulary that matters to your application.
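
The distinction between 7-bit ASCII and extended characters is easy to test in code. The sketch below is an approximation for local use, not a description of how MetamorphoSys implements its option; the function names are hypothetical, and the accented row is an invented placeholder rather than real UMLS data.

```python
def has_extended_characters(text: str) -> bool:
    """True if text has any character outside the 7-bit ASCII range."""
    return not text.isascii()  # str.isascii() requires Python 3.7 or later


def ascii_only_rows(rows):
    """Keep only rows made up entirely of 7-bit ASCII characters."""
    return [row for row in rows if not has_extended_characters(row)]


plain = "ENG|anemia|C0002871|L0002871|S0013742|"
# Hypothetical row: the string and identifiers are placeholders, not UMLS data.
accented = "ENG|sjögren|C0000000|L0000000|S0000000|"

# For pure ASCII text, the UTF-8 bytes are identical to the ASCII bytes.
assert plain.encode("utf-8") == plain.encode("ascii")

print(ascii_only_rows([plain, accented]) == [plain])  # prints: True
```

Running a check like this over a subset before filtering gives a rough idea of how many rows would be lost.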

For most English or Spanish sources, i.e., LAT = ENG or SPA, an equivalent 7-bit ASCII string is created for the UMLS to help users of older systems. If you wish to use them, these forms must not be excluded from your subset. These forms are created by the LVG program (see http://umlslex.nlm.nih.gov). This program may be of interest to those who wish to do further conversions; it converts extended characters to an escaped form of the official Unicode character name to ensure that no information is lost. These names may not be "reader friendly" but are useful for some purposes such as indexing.

The initial byte order mark (BOM) character is not present in the UTF-8 encoded Metathesaurus files unless the option "Add UTF-8 BOM characters to output files" is selected on the Output options tab in MetamorphoSys.
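
Because the BOM may or may not be present depending on that option, a reader that tolerates both cases is convenient. The sketch below uses Python's standard `utf-8-sig` codec, which removes a leading BOM when one exists and otherwise behaves like plain UTF-8. The row is taken from the AMBIG.LUI sample records.

```python
import codecs

row = "L0000003|C0010504|\n"
without_bom = row.encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom

# "utf-8-sig" drops a leading BOM when present and is harmless otherwise.
decoded_plain = without_bom.decode("utf-8-sig")
decoded_bom = with_bom.decode("utf-8-sig")

# Plain "utf-8" keeps the BOM as U+FEFF at the start of the first field.
kept = with_bom.decode("utf-8")

print(decoded_plain == decoded_bom == row)
print(kept.startswith("\ufeff"))
```

The same codec name can be passed as the `encoding` argument of the built-in `open()` when reading the files from disk.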

Files will be in byte sort order (for example, with data in UTF-8, standard UNIX sort works as expected). Note that the UMLS data are intended to be manipulated with software tools such as database systems, so the sort order of the files should not matter.
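
Byte sort order means comparing the raw UTF-8 bytes of each line. The following sketch shows one way to check or reproduce that ordering; the helper names are hypothetical. It compares whole lines only, which is an assumption, since the text above does not say whether files are ordered by whole lines or field by field.

```python
def utf8_key(text: str) -> bytes:
    """Sort key that orders strings by their UTF-8 byte sequence."""
    return text.encode("utf-8")


def is_byte_sorted(lines) -> bool:
    """True if each line's UTF-8 bytes are <= those of the next line."""
    keys = [utf8_key(line) for line in lines]
    return all(a <= b for a, b in zip(keys, keys[1:]))


ambig_lui_sample = [
    "L0000003|C0010504|",
    "L0000003|C0917995|",
    "L0000032|C0010206|",
]
print(is_byte_sorted(ambig_lui_sample))

# Extended characters encode to bytes of 0x80 and above, so they sort after
# every 7-bit ASCII character. "\u00e9" is e with an acute accent.
ordered = sorted(["z", "\u00e9", "a"], key=utf8_key)
print(ordered)
```

Locale-aware collation would place the accented letter next to "e"; byte order does not, which is why a locale-neutral sort is the safer comparison when checking these files.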

 



Last reviewed: 10 July 2008
Last updated: 10 July 2008
First published: 20 July 2004