Unified Medical Language System Metathesaurus
The Metathesaurus is a very large, multi-purpose, and
multilingual vocabulary database that contains information about biomedical
and health-related concepts, their various names, and the relationships among
them. It is built from the electronic versions of many different thesauri,
classifications, code sets, and lists of controlled terms used in patient care,
health services billing, public health statistics, indexing and cataloging
biomedical literature, and/or basic, clinical, and health services research.
On July 1, 2003, Tommy G. Thompson, Secretary of Health and Human Services,
announced an agreement with the College of American Pathologists (CAP) that
made SNOMED Clinical Terms (SNOMED CT) available to U.S. users at no cost
through the UMLS. The 2004AA UMLS release was the first release to include
the active core SNOMED CT content as well as the mapping from SNOMED CT to
ICD-9-CM. The Spanish language version of SNOMED CT was added in the 2004AB
version of the UMLS Metathesaurus. Obsolete SNOMED CT content has not yet
been incorporated, but will be available in future versions of the
Metathesaurus.
The Metathesaurus format was revised for 2004AA to accommodate
the new data. The "Rich Release Format" (RRF), an expansion of the
existing format, supports source transparency, i.e., no information is lost
when a source is represented in the Metathesaurus. The RRF introduces
many different unique identifiers (UIs) for labeling and tracking information.
The old format ("Original Release Format" or ORF) is also available as an output option in MetamorphoSys, a
tool distributed with the UMLS for installation and customization.
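
As a rough illustration of how the RRF identifiers appear in practice, the
sketch below parses one pipe-delimited row of the concept names file
(MRCONSO.RRF) into named fields. The column order is an assumption taken from
the published RRF documentation, not from this page, and the sample row uses
made-up placeholder values rather than real Metathesaurus data. The helper name
parse_rrf_line is hypothetical.

```python
# Assumed column order for MRCONSO.RRF (from the RRF documentation, not this page).
MRCONSO_FIELDS = (
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
)


def parse_rrf_line(line, fields=MRCONSO_FIELDS):
    """Split one RRF row into a dict keyed by column name (hypothetical helper)."""
    parts = line.rstrip("\r\n").split("|")
    # RRF rows end with a trailing '|', which yields one extra empty piece.
    if parts and parts[-1] == "":
        parts = parts[:-1]
    if len(parts) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(parts)}")
    return dict(zip(fields, parts))


# Placeholder row: every identifier here is invented for illustration only.
sample = "C0000000|ENG|P|L0000000|PF|S0000000|Y|A0000000||||EXAMPLE|PT|X1|example term|0|N||"
row = parse_rrf_line(sample)

# The concept (CUI), term (LUI), string (SUI), and atom (AUI) identifiers
# are the unique identifiers that label and track each name.
print(row["CUI"], row["LUI"], row["SUI"], row["AUI"], row["STR"])
```

Empty fields, such as the source-asserted identifiers in the placeholder row,
come back as empty strings, so callers can tell "absent" from "present".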
Between 2003AA and 2004AA, the size of the Metathesaurus increased
significantly: 46% more concept names, 16% more concepts, and 25% more
relationships. The large file sizes necessitated a move to DVDs as the
distribution medium. Even split and compressed, the UMLS was over 2.1 GB
in size (over 15 GB uncompressed), with some files exceeding 2 GB. The
MetamorphoSys tool
was significantly enhanced to be portable across multiple platforms, and now
functions as an installation tool for the UMLS in addition to its traditional
role as an aid to Metathesaurus customization.
The UMLS is also available as a download from the Knowledge Sources
Server (KSS) for users with sufficient capacity.
The UMLS Metathesaurus is now a multilingual resource; the 2004AB release
contains data in 17 different languages.
To represent this information accurately, the Metathesaurus is now encoded
in Unicode, using the UTF-8 transformation format. This entails converting
data from a variety of native encodings into UTF-8 during insertion into the
Metathesaurus.
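
The conversion step described above can be sketched in a few lines. This is
only a minimal illustration, not the actual Metathesaurus build process: the
function name to_utf8 is hypothetical, and the choice of Latin-1 as the
source encoding is an assumption made for the example.

```python
def to_utf8(raw, source_encoding):
    """Transcode bytes from a known native encoding into UTF-8 (hypothetical helper).

    Decoding is strict, so bytes that are invalid in the declared source
    encoding raise UnicodeDecodeError instead of being silently altered.
    """
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# Example: a Spanish word stored as Latin-1 bytes (assumed source encoding).
latin1_bytes = "Señal".encode("latin-1")   # 'ñ' is the single byte 0xF1
utf8_bytes = to_utf8(latin1_bytes, "latin-1")

print(latin1_bytes)  # b'Se\xf1al'
print(utf8_bytes)    # b'Se\xc3\xb1al'
```

Strict decoding is a deliberate choice in this sketch: a mislabeled source
encoding then fails loudly rather than producing corrupted names.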
The changes reported here will be presented at
Medinfo 2004 in a paper and tutorial.