Lister Hill National Center for Biomedical Communications, National Library of Medicine

Unified Medical Language System Metathesaurus

The Metathesaurus is a very large, multipurpose, and multilingual vocabulary database that contains information about biomedical and health-related concepts, their various names, and the relationships among them. It is built from the electronic versions of many different thesauri, classifications, code sets, and lists of controlled terms used in patient care, health services billing, public health statistics, indexing and cataloging biomedical literature, and/or basic, clinical, and health services research.
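To make the three kinds of information concrete, here is a toy in-memory model. It is not the Metathesaurus's actual schema. Each concept is keyed by a concept identifier (CUI), names from several sources and languages attach to the same concept, and relationships link one concept to another. The class and method names, the source labels (`SOURCE_A` and so on), the identifiers, and the relationship label `narrower_than` are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept; names from different sources all attach to the same CUI."""
    cui: str
    # Each name is (source, language, string); all values here are illustrative.
    names: list[tuple[str, str, str]] = field(default_factory=list)


class ToyMetathesaurus:
    """Hypothetical model of concepts, their names, and their relationships."""

    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}
        # cui -> set of (relationship label, related cui)
        self.relations: defaultdict[str, set[tuple[str, str]]] = defaultdict(set)

    def add_name(self, cui: str, source: str, language: str, string: str) -> None:
        concept = self.concepts.setdefault(cui, Concept(cui))
        concept.names.append((source, language, string))

    def relate(self, cui1: str, label: str, cui2: str) -> None:
        self.relations[cui1].add((label, cui2))

    def find(self, string: str) -> list[str]:
        """Return the CUIs of all concepts having a name equal to `string`, ignoring case."""
        target = string.casefold()
        return sorted(
            c.cui for c in self.concepts.values()
            if any(name.casefold() == target for _, _, name in c.names)
        )


meta = ToyMetathesaurus()
meta.add_name("C0000001", "SOURCE_A", "ENG", "Myocardial infarction")
meta.add_name("C0000001", "SOURCE_B", "ENG", "Heart attack")
meta.add_name("C0000001", "SOURCE_C", "SPA", "Infarto de miocardio")
meta.add_name("C0000002", "SOURCE_A", "ENG", "Heart disease")
meta.relate("C0000001", "narrower_than", "C0000002")
print(meta.find("heart attack"))  # prints ['C0000001']
```

The point of the design is that a lookup by any name, in any language or from any source, lands on the same concept, and relationships are stated between concepts rather than between individual names.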

On July 1, 2003, Tommy G. Thompson, Secretary of Health and Human Services, announced an agreement with the College of American Pathologists (CAP) that made SNOMED Clinical Terms (SNOMED CT) available to U.S. users at no cost through the UMLS. The 2004AA UMLS release was the first release to include the active core SNOMED CT content as well as the mapping from SNOMED CT to ICD-9-CM. The Spanish language version of SNOMED CT was added in the 2004AB version of the UMLS Metathesaurus. Obsolete SNOMED CT content has not yet been incorporated, but will be available in future versions of the Metathesaurus.

The Metathesaurus format was revised for 2004AA to accommodate the new data. An expansion of the existing format, the "Rich Release Format" (RRF), supports source transparency: a source can be represented in the Metathesaurus with no loss of information. The RRF introduces many different unique identifiers (UIs) for labeling and tracking information. The old format (the "Original Release Format", or ORF) is also available as an output option in MetamorphoSys, a tool distributed with the UMLS for installation and customization.
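RRF files are plain-text, pipe-delimited tables in which every field, including the last, is followed by a `|`. As an illustration, the sketch below parses one row of MRCONSO.RRF, the concept-names file, into a dictionary keyed by column name, exposing identifiers such as the CUI (concept), LUI (term), SUI (string), and AUI (atom). The column list follows the MRCONSO.RRF layout described in later editions of the UMLS Reference Manual and is an assumption here; each release describes its own files and columns in MRFILES.RRF and MRCOLS.RRF, which should be checked. The function name `parse_rrf_row` and the sample row, including all its identifiers and its code, are hypothetical.

```python
from __future__ import annotations

# Assumed MRCONSO.RRF column order; verify against MRCOLS.RRF in your release.
MRCONSO_COLUMNS = (
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
)


def parse_rrf_row(line: str, columns: tuple[str, ...] = MRCONSO_COLUMNS) -> dict[str, str]:
    """Split one pipe-delimited RRF row into a {column: value} dict."""
    fields = line.rstrip("\r\n").split("|")
    # The trailing '|' after the last field makes split() return one extra
    # empty string; drop it only when it is exactly that extra element.
    if len(fields) == len(columns) + 1 and fields[-1] == "":
        fields.pop()
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    return dict(zip(columns, fields))


# Hypothetical row: empty SAUI, SCUI, SDUI, and CVF fields are still delimited.
sample = "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001||||SNOMEDCT|PT|12345|Heart attack|0|N||"
row = parse_rrf_row(sample)
print(row["CUI"], row["LUI"], row["SUI"], row["AUI"], row["STR"])
# prints C0000001 L0000001 S0000001 A0000001 Heart attack
```

To process a whole file, one would open it with `encoding="utf-8"` and apply the function line by line; iterating rather than reading the file at once keeps memory use flat even for multi-gigabyte tables.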

Between 2003AA and 2004AA, the size of the Metathesaurus increased significantly: 46% more concept names, 16% more concepts, and 25% more relationships. The resulting file sizes required a move to DVDs as the distribution medium. Even split and compressed, the UMLS was over 2.1 GB in size (over 15 GB uncompressed), and some files exceeded 2 GB. MetamorphoSys was significantly enhanced to be portable across multiple platforms, and it now serves as an installation tool for the UMLS in addition to its traditional role as an aid to Metathesaurus customization. The UMLS is also available as a download from the Knowledge Sources Server (KSS) for users with sufficient capacity.

As of 2004AB, the UMLS Metathesaurus is a multilingual resource with data in 17 languages. To represent this information accurately, the Metathesaurus is now encoded in Unicode, using the UTF-8 transformation format. This requires converting data from a variety of native encodings into UTF-8 as it is inserted into the Metathesaurus.
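The conversion step amounts to decoding each source's bytes with that source's native encoding and re-encoding the resulting text as UTF-8. The minimal sketch below assumes the native encoding of every source is known in advance; the source names and encodings in `SOURCE_ENCODINGS` are hypothetical and do not describe NLM's actual sources or pipeline.

```python
# Hypothetical sources and their native encodings.
SOURCE_ENCODINGS = {
    "SOURCE_LATIN1": "latin-1",
    "SOURCE_CP1252": "cp1252",
    "SOURCE_SJIS": "shift_jis",
}


def to_utf8(raw: bytes, source: str) -> bytes:
    """Decode a source's bytes with its native encoding, then re-encode as UTF-8."""
    encoding = SOURCE_ENCODINGS[source]
    # Strict decoding: malformed input raises UnicodeDecodeError instead of
    # silently inserting replacement characters into the database.
    text = raw.decode(encoding, errors="strict")
    return text.encode("utf-8")


native = "neumonía".encode("latin-1")      # b'neumon\xeda' (one byte for 'í')
converted = to_utf8(native, "SOURCE_LATIN1")
print(converted)                           # prints b'neumon\xc3\xada' (two bytes for 'í')
print(converted.decode("utf-8"))           # prints neumonía
```

Decoding strictly is a deliberate choice here: a wrong guess about a source's encoding then fails loudly at insertion time rather than corrupting names that users will later search for.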

The changes reported here will be presented at Medinfo 2004 in a paper and tutorial.