United States National Library of Medicine National Institutes of Health

Section 6
MetamorphoSys - The UMLS Installation and Customization Program

6.0 Introduction

MetamorphoSys is the UMLS installation wizard and Metathesaurus customization tool included in each UMLS release. It installs one or more of the UMLS Knowledge Sources. When the Metathesaurus is selected, it enables you to create customized Metathesaurus subsets. Please use only the version of MetamorphoSys distributed with the release.

Users customize their Metathesaurus subsets for two main purposes:

  1. To exclude vocabularies from output that are not required or licensed for use in a local application.

    The Metathesaurus consists of a number of files, some of which are extremely large; excluding sources can significantly reduce the size of the output subset. Given the number and variety of vocabularies reflected in the Metathesaurus, it is unlikely that any user would require all, or even most, of its more than 100 vocabularies. In addition, some sources require separate license agreements for specific uses, which a UMLS user may not wish to obtain. These are clearly indicated in the Appendix to the License Agreement.

  2. To customize a subset using a variety of data output options and filters.

To identify vocabularies that may not be needed in a customized subset, read the License Agreement and its Appendix, and refer to Appendix B.4 in this documentation. Additional information about some source vocabularies may be found on the UMLS Homepage under Metathesaurus Source Vocabularies.

There are no license restrictions on the MetamorphoSys code. We hope that users will acknowledge the NLM source, in the spirit of the GNU General Public License (GPL).

6.1 MetamorphoSys Requirements

MetamorphoSys has been tested on the following operating systems:

It is implemented in Java and requires the run-time JRE version included in the release (except for the Macintosh, which licenses its own JRE).

**Macintosh note: MetamorphoSys requires Java 1.5. Mac OS X 10.3 "Panther" does not support Java 1.5. Mac OS X 10.5 "Leopard", expected to be released later this year, will support Java 1.5.

You may execute MetamorphoSys from the UMLS DVD-ROM (which contains the application and the compressed UMLS Knowledge Sources data files) or use a high-speed Internet connection to download files from the UMLS Knowledge Sources Server (UMLSKS). To ensure proper functionality users should download and extract all UMLS data and zip files to the same directory. Because downloads on a T1 line with 1 megabit per second throughput will require over 5 hours, we expect that MetamorphoSys will usually be run from the DVD.

To use the DVD, you must have a DVD reader and at least 13 GB of free disk space. Multiple runs that create multiple subsets of the Metathesaurus will need even more space. For reasonable performance, we suggest these minimum requirements:

DVD options allow you to (1) install the UMLS Knowledge Sources from the DVD, (2) copy MetamorphoSys .nlm data format files to local storage, and (3) copy the installation program and files to local storage. This may be useful for multiple runs or subsetting an existing subset, and it may improve performance time.

All file sizes are checked at installation. The Validate Distribution option allows users to verify the integrity of .nlm files downloaded from the UMLSKS or copied from the UMLS DVD. It compares MD5 signatures to those in the release .MD5 and .CHK files, and is a useful first step for troubleshooting when problems occur with a UMLS installation.

If the UMLS release is downloaded from the UMLSKS, it must include these files, in the same directory:

The mmsys.zip file is first unzipped to local storage and the MetamorphoSys application started. To ensure proper functionality users must unzip mmsys.zip to the same directory as the other downloaded files.

6.2 Starting MetamorphoSys

Open a terminal window and change to the root directory of the DVD-ROM. Type the appropriate command for your platform:

Press the return key.

A new window will appear. This may take a few minutes since a good deal of software must load before the Welcome screen appears.

On Windows machines with Autorun enabled, the DVD will start automatically. If it does not, go to the root directory of the DVD-ROM and click on the file named windows_mmsys.bat.

6.3 Using MetamorphoSys

MetamorphoSys screens and tabs will lead you through the process of installing all the UMLS Knowledge Sources and customizing the Metathesaurus.

6.3.1 Welcome to MetamorphoSys

Select one of the following:

  1. Install UMLS — to install one or more UMLS Knowledge Sources.
  2. Browse my Subset — to open the RRF subset browser.

Other options located on the File menu include Validate Distribution and Copy to Hard Drive. Two options were recently moved to the Advanced menu: 'Build MRCXT', which opens the MRCXT Builder, and 'Customize My Subset' (see Section 6.5), which further customizes an existing Metathesaurus subset.

Validate Distribution confirms that all UMLS files, whether they have been downloaded or distributed via DVD, have transferred correctly and are complete. The process takes approximately 30 minutes and produces a log file (validation.log) and an alert box that displays a statement regarding the validity of the files. Use "Validate Distribution" as your first step in troubleshooting when experiencing any malfunctions.
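The kind of check that Validate Distribution performs can be approximated outside MetamorphoSys. The following is a minimal sketch, not the MetamorphoSys implementation: the function names and the `expected` dictionary (file name to MD5 hex digest) are assumptions made for illustration, and the layout of the release .MD5 file is not described here, so parsing it is left to the reader.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_files(directory, expected):
    """Compare each listed file's MD5 digest with its expected value.

    `expected` is a hypothetical mapping of file name -> MD5 hex digest.
    Returns a dict of file name -> "ok", "mismatch", or "missing".
    """
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == want.lower():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Reading in chunks keeps memory use flat even for the very large data files in a release. Any file reported as "mismatch" or "missing" is a candidate for being downloaded or copied again.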

In order to create correct subsets, you MUST use the version of MetamorphoSys that matches the version of the Metathesaurus release files being subsetted. Do not use older versions of MetamorphoSys with newer or older release files; use the version of MetamorphoSys included with the release files.

Copy to Hard Drive copies MetamorphoSys and the Knowledge Sources data files, identified by the ".nlm" extension, to local storage. Local storage may improve startup times for MetamorphoSys as contrasted to running off the DVD.

6.3.1.1 Install UMLS

MetamorphoSys creates a top-level destination directory in local storage for the UMLS Knowledge Sources. The directory is named with the release version, e.g., 2006AC. The following directory structure is created beneath the destination directory, shown below for the 2006AC release:

<installation directory>

  2006AC
    NET
    LEX
    META

You may install any one, two, or all three Knowledge Sources as follows:

Selection            Installed To
Semantic Network     NET directory
SPECIALIST Lexicon   LEX directory
MetamorphoSys        MMSYS directory

The META directory is populated with the Metathesaurus subset files created during installation. Depending on your configuration, some of these files may contain zero bytes.

Use the Browse button to locate source and destination directory locations.

Click OK to proceed with installation. A progress monitor tracks each step of the installation process. If the Metathesaurus is selected, installation will begin after all Metathesaurus options are selected.

To cancel installation at any time, click Cancel at the bottom of the Install UMLS progress screen, or at the bottom of the MetamorphoSys progress window.

6.3.1.2 MetamorphoSys Configuration

Select New Configuration to create a new subset configuration. Select Open Configuration to open a previously saved configuration file.

6.3.1.2.1 License Agreement Notice

The Metathesaurus contains source vocabularies produced by many different copyright holders. The majority of the content of the Metathesaurus is available for use under the basic (and quite open) terms described in the Metathesaurus license agreement.

However, some vocabulary producers place ADDITIONAL RESTRICTIONS ON THE USE OF THEIR CONTENT AS DISTRIBUTED WITHIN THE METATHESAURUS.

Three levels of additional restrictions are described in Section 12 of the license agreement. Individual vocabularies and their restriction levels are listed in the Appendix to the UMLS License Agreement. If a user already has a separate license for use of one of the source vocabularies, the existing license also applies to that source as distributed within the Metathesaurus. In some cases, UMLS users may have to request permission or negotiate a separate license with a vocabulary producer in order to use that vocabulary in a production system. There may be a charge associated with these separate permissions or license agreements.

Click Accept or Do Not Accept after reviewing the license agreement.

6.4 Select Default Subset

Two default subsets have been defined for creating useful and manageable output subsets. Others may be added in the future based on user feedback. During initial installation of the Metathesaurus, you must select one of two default subsets as a starting point:

  1. Level 0 — contains vocabulary sources for which no separate, additional license agreements are necessary beyond the UMLS license.
  2. Level 0 + SNOMED CT — contains all Level 0 sources (no additional licenses needed for sources) and SNOMED CT.
Note: Non-U.S. users must have separate license agreements to use SNOMED CT (see Section 12 in the UMLS license agreement).

You will have the opportunity to modify your default subset to include or exclude additional sources using the Source List tab (see Section 6.5.3 below).

Please note: The default RxNorm subset, which contained RxNorm concepts in Level 0 sources, has been removed. Users can produce the same subset by applying the RxNorm Filter. Access this filter using the File Menu, Enable/Disable Filters option. An additional MetamorphoSys tab for the RxNorm filter will appear when the filter has been selected. For more information regarding the RxNorm Filter see Section 6.6.1.5.

6.5 Option Tabs

Five basic Options Tabs — Input Options, Output Options, Source List, Precedence, and Suppressibility — provide a variety of customization options. In addition to the five basic option tabs, there are additional filters that can be used to customize the Metathesaurus. See 6.6.1, Enable/Disable Filter.

You may select and complete Option tabs in any order. Note that the selections that you make in one option may affect the data displayed, and the choices available, on other Options tabs.

You may return to the default settings for any option. Select 'Reset' on the menu bar, then select the appropriate Reset command.

When you have completed configuring your Metathesaurus subset, go to the menu bar, select 'Done', and then 'Begin Subset'.

You will be prompted to save your configuration. Name your configuration file, which will be stored in the destination META directory. This file documents your configuration choices, and can be used as the starting point for a later customization using the Customize My Subset option on the Welcome screen.

6.5.1. Input Options

This tab allows users to indicate the location of required directories, the configuration file, and the input and output directories.

For the initial installation, NLM Data File Format must be selected.

If you are customizing an existing subset, use Browse to select its current format of either Original Release Format or Rich Release Format.

6.5.2. Output Options

6.5.2.1 Select Output Format

Select either Original Release Format or Rich Release Format. Rich Release Format is the default selection for the initial installation and for customizing an existing subset in the Rich Release Format. Original Release Format is the default for customizing an existing subset in the Original Release Format.

Note: You cannot generate a correct Rich Release Format subset from Original Release Format.

6.5.2.2 Subset Folder

Indicate where the new subset files should be placed.

6.5.2.3 Write Database Load Scripts

Outputs a load script in either Oracle or MySQL format, which you may further optimize or customize. Click here for more information on UMLS load scripts.

6.5.2.4 Source Abbreviation Format

Source vocabulary information in the Metathesaurus content can be identified by a versionless, or Root Source Abbreviation (RSAB), or by the longer and more descriptive Versioned Source Abbreviation (VSAB). The default is the RSAB, but you may choose to include the VSABs. For example,

MSH Root Source Abbreviation (RSAB)
MSH_2003_12_12 Versioned Source Abbreviation (VSAB)

In either case, your subset will include the MRSAB file which links the RSABs to the corresponding VSABs for all source vocabularies in your subset.
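Because MRSAB links the two abbreviation forms, a lookup table from VSAB to RSAB can be built from it. The sketch below is illustrative only: it assumes a pipe-delimited file with VSAB in the third field and RSAB in the fourth, and both positions are parameters so they can be changed if your release differs. The two identifiers at the start of the sample row are made-up placeholders.

```python
def load_vsab_to_rsab(lines, vsab_index=2, rsab_index=3):
    """Build a VSAB -> RSAB dictionary from pipe-delimited MRSAB rows.

    The default field positions are an assumption about the file layout.
    """
    mapping = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("|")
        if len(fields) <= max(vsab_index, rsab_index):
            continue  # skip rows that are too short to hold both fields
        mapping[fields[vsab_index]] = fields[rsab_index]
    return mapping


# Placeholder row: only the two abbreviations come from the text above.
sample_rows = ["C0000001|C0000002|MSH_2003_12_12|MSH|"]
vsab_to_rsab = load_vsab_to_rsab(sample_rows)
```

With such a table, data written with versioned abbreviations can be matched against data written with root abbreviations, whichever output format was chosen.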

6.5.2.5 Maximum Field Length

Restrict fields in your output to the maximum field length allowed in your application or database software. Beginning with the 2007AA Release the default value for this field is 3990 characters.  

6.5.2.6 Eliminate Extended Unicode Characters

This option allows you to select output encoding in either 7-bit ASCII or UTF-8. 7-bit ASCII is the default output from MetamorphoSys. Select this box to output data in Unicode UTF-8 format. When this box is selected, extended UTF-8 characters are removed from the following files (in both RRF and ORF):

See 6.5.2.6.1, UMLS Character Sets, in the next section for more information.

6.5.2.6.1 UMLS Character Sets

The UMLS Knowledge Sources are distributed in Unicode (specifically, in the UTF-8 encoding of the Unicode 4.0 standard [1]) to avoid complexity and information loss.

Unicode is a single unified and interoperable global standard, which includes the characters needed to write in any language (see www.unicode.org). Unicode also includes diacritical marks, ideographs, and scientific and other symbols. Most modern systems already use Unicode; we strongly encourage users to upgrade to Unicode compliant systems and software.

The 7-bit basic ASCII character set is the 'least common denominator' character set of 96 characters and symbols from the oldest ASCII standard. UTF-8 is identical to the ASCII encoding for characters in the 7-bit ASCII range, so that 7-bit ASCII files are automatically a correct subset of UTF-8. This means that sources originally in 7-bit ASCII are unchanged. In the UMLS, the term 'extended characters' refers to all Unicode characters beyond this 7-bit ASCII subset. All other character sets are converted to, and distributed in, UTF-8.
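The relationship between 7-bit ASCII and UTF-8 can be checked directly. This short sketch uses a hypothetical helper, `has_extended_characters`, and two sample strings chosen for illustration (an eponym with and without its diacritic); it is not part of MetamorphoSys.

```python
def has_extended_characters(text):
    """Return True if any character lies outside the 7-bit ASCII range."""
    return any(ord(ch) > 127 for ch in text)


plain = "Sjogren syndrome"
accented = "Sj\u00f6gren syndrome"  # the same eponym written with o-umlaut

# Pure ASCII text produces identical bytes under both encodings.
same_bytes = plain.encode("ascii") == plain.encode("utf-8")

# The extended character needs two bytes in UTF-8, so the byte length
# exceeds the character count by one.
extra_bytes = len(accented.encode("utf-8")) - len(accented)
```

Here `same_bytes` is `True` and `extra_bytes` is `1`, which is why a file containing only 7-bit ASCII can be read as UTF-8 without any conversion.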

Note that the UMLS LAT - "Language of Term(s)" - is the language the source declares. Since the world does not speak or write in 7-bit ASCII, sources often include extended characters for symbols or from other languages, for example in eponyms.

The MetamorphoSys default is to output all records and data in standard UTF-8. Checking the option to Remove Records Containing Extended UTF-8 Characters will exclude all terms and other data that contains extended characters from your subset. This will create gaps in the hierarchy and may cause loss of vocabulary which matters to your application.

For most English or Spanish sources, i.e., LAT = ENG or SPA, an equivalent 7-bit ASCII string is created for the UMLS to help users of older systems. If you wish to use them, these forms must not be excluded from your subset. These forms are created by the Lexical Variant Generation (LVG) program. This program may be of interest to those who wish to do further conversions; it converts extended characters to an escaped form of the official Unicode character name to ensure that no information is lost. These names may not be "reader friendly" but are useful for some purposes such as indexing.

The initial byte order mark (BOM) character is not present in the UTF-8 encoded Metathesaurus files unless the option Add UTF-8 BOM Characters to Output Files is selected on the Output options tab in MetamorphoSys.

Files will be in byte sort order (for example, with data in UTF-8, standard UNIX sort works as expected). Note that the UMLS data are intended to be manipulated with software tools such as database systems, so the sort order of the files should not matter.

6.5.2.6.2 LVG Flow

LVG is a set of tools and data that are distributed with the UMLS as part of the SPECIALIST system. The current version of LVG includes flows to convert UTF-8 strings into a canonical 7-bit representation that includes the removal of diacritics, expansion of ligatures and the substitution of official Unicode character names with appropriate escape character sequences for the remaining Unicode characters [2].

6.5.2.6.3 MetamorphoSys Support

MetamorphoSys is Unicode compliant. By default, it will eliminate rows that contain extended characters (those not in the 7-bit ASCII range). Note that some English language sources may contain Unicode characters in names and attributes.

6.5.2.6.4 OS and Database Support

Most modern Operating Systems are Unicode (and UTF-8)-aware. For example, Solaris 2.9, Windows XP, and most Linux systems can store, process, and display information that is encoded in UTF-8, though the task of migration may not necessarily be painless.

Database vendors are also starting to migrate to UTF-8, but understandably often lag the OS vendors. In our experience, Oracle and MySQL (versions 4.1 and up [3]) seem to work correctly.

Third party software may not always work correctly with Unicode data. Check with your vendor or software provider.

6.5.2.6.5 References

  1. The Unicode Standard 4.0, Unicode Consortium, Addison-Wesley, http://unicode.org
  2. Lexical Variant Generation http://umlslex.nlm.nih.gov
  3. MySQL Documentation, Chapter 8, National Character Sets and Unicode, http://www.mysql.com/doc/en/index.html

6.5.2.7 (DELETED) Exclude MRCXT (MRCXT.RRF)

Prior to the 2006AA release, users could exclude the very large MRCXT or MRCXT.RRF file from their output, reducing MetamorphoSys processing time, and significantly reducing the size of the resulting subset. As of 2006AA, the file MRCXT is not created by default. Users can create this file using the MRCXT Builder. This program is accessible from the Advanced menu on the MetamorphoSys Welcome screen.

See Section 2.7.1.3.11 for information about how MRHIER.RRF can be used to compute hierarchies.

6.5.2.8 Remove MTH Only Concepts

Select this option to retain MTH atoms ONLY when they overlap with atoms from other sources in your subset.

6.5.2.9 Calculate MD5 Values for Output Files

When this box is checked, the MD5 algorithm is used to generate a "mmsys.md5" file in the Metathesaurus subset directory. The information in this file can be used to verify the data integrity of the Metathesaurus files (RRF or ORF), and can be useful when troubleshooting problems. The MD5 values appear in the META/mmsys.md5 file. Please note that these MD5 values are intended for comparison of different runs and are calculated in a platform-independent way, i.e., they ignore differing line terminations. For this reason, values produced by native MD5 calculation programs may differ from those in the mmsys.md5 file.
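To see why a native MD5 program can disagree with a line-ending-independent digest, consider the sketch below. It assumes that normalization means treating CRLF and bare CR as LF before hashing; the exact procedure MetamorphoSys uses is not specified here, so digests from this function should not be expected to match the values in mmsys.md5.

```python
import hashlib


def md5_ignoring_line_endings(data: bytes) -> str:
    """Hash the data after converting CRLF and bare CR line endings to LF."""
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return hashlib.md5(normalized).hexdigest()


# The same two rows as they might be stored on different platforms.
unix_copy = b"row one|\nrow two|\n"
windows_copy = b"row one|\r\nrow two|\r\n"

native_digests_match = (
    hashlib.md5(unix_copy).hexdigest() == hashlib.md5(windows_copy).hexdigest()
)
normalized_digests_match = (
    md5_ignoring_line_endings(unix_copy) == md5_ignoring_line_endings(windows_copy)
)
```

The native digests differ because the bytes differ, while the normalized digests agree. This is the property that makes such values suitable for comparing runs across platforms.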

6.5.2.10 Add UTF-8 BOM Characters to Output Files

When this box is checked, all output data files are prepended with a byte order mark. This beginning-of-file marker (3 bytes) indicates that the file is encoded as UTF-8.
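The three-byte marker can be detected, added, or skipped with standard library tools. The helper names in this sketch are illustrative and are not part of MetamorphoSys; `codecs.BOM_UTF8` and the `utf-8-sig` codec are standard Python.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the 3-byte UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 byte order mark unless it is already present."""
    return data if has_utf8_bom(data) else codecs.BOM_UTF8 + data


# Decoding with "utf-8-sig" drops a leading BOM if there is one, so the
# same reader works for files written with or without the marker.
text = add_utf8_bom(b"first field|second field|").decode("utf-8-sig")
```

Software that is not BOM-aware may treat the marker as part of the first field of the first row, which is a reason to leave this option off unless a downstream tool requires it.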

6.5.3 Source List

The Source List tab displays all source vocabularies in the current version of the Metathesaurus. Sources are sorted alphabetically by Source Abbreviation in the default display. At the top of the Source List tab there are two radio buttons:

The highlighted sources reflect the default subset selected earlier in the installation process. You may select or deselect additional sources to include or exclude from your subset. Leave the button set to 'Select Sources to EXCLUDE from subset' in order to highlight sources that will be removed from your customized Metathesaurus subset.

Or you may choose 'Select sources to INCLUDE in subset'. When selected, only the highlighted sources will be included in your local subset.

Note: The highlighted sources do NOT change when you switch between these two options. If a source is highlighted for EXCLUSION from a subset, and you choose "Select sources to INCLUDE in subset," that source will now be highlighted for INCLUSION in your subset.

To select or deselect additional rows, hold down the <CTRL> key while making your selection.

You may sort the Source List by Full Source Name, Source Abbreviation, Source Family, Language, or Level (UMLS License Restriction Level). Click on the column header to re-sort the list by that data.

The complete Metathesaurus contains over 100 source vocabularies and in its entirety is an extremely large and unwieldy set of data files. Carefully consider what sources will contribute useful data to your application, and then exclude other sources, to reduce the size of output subsets and to improve application performance.

Consider also that the data from some sources may be incompatible with your intended application. They may contain terms that are recognizable only within the context of a specific source; or they may contain abbreviations that are confusing, or not particularly useful to your application.

Additional information on a few specific sources is available under Metathesaurus Source Vocabularies. You may also contact the source providers included in the Appendix to the License Agreement for additional documentation or information.

You may select individual sources to remove based on the Full Source Name or Source Abbreviation. You may take advantage of groups of related vocabularies, called Source Families, to assist in the removal of related sources when one source is selected.

Note, for example, that CPT (the AMA's Physicians' Current Procedural Terminology, CPT4) is also a part of HCPT (the Health Care Financing Administration Common Procedure Coding System, HCPCS). Both vocabularies must be removed to exclude all sources of CPT information.

You may also exclude sources by language, or by license restriction level. To reset source selections and return to the default list, select Reset Sources to Exclude Defaults under Reset on the menu bar.

6.5.4 Precedence

The Precedence tab displays the default order of precedence of Metathesaurus source and term type combinations as determined by NLM. One string from one English term is designated and labeled as the default preferred name of each concept in the Metathesaurus. Selection of the default preferred name for any Metathesaurus concept is based on an order of precedence of all the types of English strings in all the Metathesaurus source vocabularies. Different types of strings, e.g., preferred terms, cross references, and abbreviations from each vocabulary will have different positions in this order.

The default order of precedence determined by NLM will not be suitable for all applications of the Metathesaurus. MetamorphoSys can be used to change the selection of preferred names to feature terminology from the source vocabularies most appropriate to particular user populations.

You may reorder the ranking of source and term type combinations by cutting and pasting, or dragging and dropping, the rows in the Precedence List. Term types from sources that have been excluded on the Source List tab will not be displayed.

Shift rows by cutting and pasting the rows. Multiple rows can be cut by holding the <CTRL> key down while making selections. To paste the rows, select the location where the rows will be pasted and press <CTRL-V>.

The ranking of sources and term types will affect the output subset. In particular, the name of a concept will be determined by the highest ranking term type in that concept.
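The effect of the ranking on a concept's name can be sketched as follows. This is a simplified illustration, not the MetamorphoSys algorithm: it considers only the position of each source and term type pair in the list and ignores any other tie-breaking rules. The source abbreviations, term types, and strings in the example are illustrative.

```python
def preferred_name(atoms, precedence):
    """Pick the string whose (source, term type) pair ranks highest.

    atoms: list of (source, term_type, string) tuples for one concept.
    precedence: list of (source, term_type) pairs, highest rank first.
    Returns None if no atom's pair appears in the precedence list.
    """
    rank = {pair: position for position, pair in enumerate(precedence)}
    ranked = [atom for atom in atoms if (atom[0], atom[1]) in rank]
    if not ranked:
        return None
    best = min(ranked, key=lambda atom: rank[(atom[0], atom[1])])
    return best[2]


# Illustrative atoms for a single concept.
atoms = [
    ("SNOMEDCT", "PT", "Heart attack"),
    ("MSH", "MH", "Myocardial Infarction"),
]
default_name = preferred_name(atoms, [("MSH", "MH"), ("SNOMEDCT", "PT")])
reordered_name = preferred_name(atoms, [("SNOMEDCT", "PT"), ("MSH", "MH")])
```

Moving a row up in the Precedence List corresponds to moving its pair earlier in `precedence`, and here that alone changes which string is returned for the concept.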

6.5.5 Suppressibility

The Suppressibility tab displays source/term type combinations to be marked as suppressible in the output subset. Term types from sources that have been excluded on the Source List will not display. For a new subset, the initial display highlights default source/term types made suppressible by NLM. You may select or deselect source/term types to be marked as suppressible in your output subsets. When customizing an existing subset, the initial display highlights your suppressibility settings for that subset. For advanced suppressibility options, see Section 6.8.3.

6.6 File Menu

6.6.1 Enable/Disable Filter

This option allows you to enable any or all of six additional filters: Attribute Type List, Content View Filter, Languages to Exclude, Relationship Type List, Semantic Types List, and Source Term Type List.

When a filter is enabled, its corresponding tab appears on the UMLS Metathesaurus Configuration screen. When a filter is disabled, its tab disappears.

6.6.1.1 Attributes Type List

6.6.1.2 Languages to Exclude

6.6.1.3 Relationship Type List

6.6.1.4 Semantic Types List

6.6.1.5 - Removed

6.6.1.6 Content View Filter

A content view is any definable subset of the Metathesaurus that is useful for some specific purpose. The actual definition of a content view can take a variety of different forms:

  1. An actual list of Metathesaurus Unique Identifiers (UIs), maintained over time
  2. A list of sources that participate in the view
  3. A complex query that identifies particular sets of data

The first Content View Flag (CVF) to be made available identifies sets of terms that are useful for Natural Language Processing. This CVF represents the strict model of data used by MetaMap. These terms carry the value "256" in the CVF field.

6.6.1.7 Source Term Types

6.6.2 Import Filter

This command allows the user to import filters developed according to the Filter API. Filters cannot be exported or removed from the application, but they can be disabled. A window will pop up with all filters available for import. These filters are found in the METAMSYS/ext directory. See Section 6.11 for more information.

Two simple import filters are provided as examples of custom filtering:

NosNec (for Testing): To exclude "NOS" or "NEC" strings from the output subset
OddEven (for Testing): To exclude odd or even numbered CUIs from the output subset

When an import filter is selected, its option tab appears on the Metathesaurus configuration screen.

6.6.3 New Configuration

Use this command to create a new subset configuration. The License Agreement Notice is displayed (see Section 6.3.1.2.1) and the configuration process continues as described in Section 6.3 and the following sections.

6.6.4 Open Configuration

Use this command to open a previously saved configuration, which can be run (go to Done, and then Begin Subset) or modified. MetamorphoSys displays the config directory in the MMSYS folder as a starting point from which to locate and select a previously saved configuration.

6.6.5 Save Configuration

Use this command to save the current configuration. MetamorphoSys prompts the user to assign a file name and displays the top level UMLS directory as a starting point for storing the saved configuration file. This allows a user to save a configuration and run it to produce the Metathesaurus subset at a later time. The saved configuration can also be further modified to create new subset configurations.

6.6.6 Exit

Use this command to exit MetamorphoSys. A prompt provides an opportunity for the user to save the configuration before exiting.

6.7 Edit Menu

Two commands, Increase Font and Decrease Font, allow the user to change the text size displayed in MetamorphoSys screens. An additional command, Undo Enable Filter, is available if any filters have been enabled from the File menu.

6.8 Options (for Advanced Users)

Advanced options include MetamorphoSys Options, Advanced Source List Options, and Advanced Suppressibility Options.

6.8.1 MetamorphoSys Options

Opens a configuration window which contains the following user capability:

Auto Select Related Items - If this check-box is selected, there is no prompt when the selected row shares a Source Family or has a Dependent Source. The system automatically selects the Dependent Source rows or the rows with the same Source Family. The default for this flag is false.

6.8.2 Advanced Source List Options

Opens a configuration window which contains the following user capabilities:

6.8.2.1 Enforce Family Selection

If Enforce Family Selection is selected, you will be prompted to select other sources in the same Source Family.

6.8.2.2 Enforce Dependent Source Selection

If Enforce Dependent Source Selection is selected, and you select a source in the Dependent Source Associations table, you may select any dependent sources listed. As with Enforce Family Selection, this function exists for deselection of sources as well. The default for this flag is true.

This selection also provides the following capabilities:

6.8.3 Advanced Suppressibility Options (Remove Suppressible Data)

Users may now specify which of three types of suppressible data to exclude from their customized subsets:

  1. Source Term Type: groups of terms are marked suppressible by Source/Term Type.
  2. Editor Assigned: specific terms are marked suppressible by Metathesaurus editors.
  3. Obsolete: terms identified as Obsolete in their source vocabularies.

Rich Release Format (RRF): If Remove Source Term Type suppressible data is selected, data with a SUPPRESS flag set to Y will be removed. If Remove Editor Assigned suppressible data is selected, data with a SUPPRESS flag set to E will be removed. If Remove Obsolete data is selected, data with a SUPPRESS flag set to O will be removed.

Original Release Format (ORF): In ORF, all three types of suppressibility are represented by ts='s' or 'p'. Thus, selecting only one or two of the options above will result in a subset that still contains some terms where ts='s' or 'p'.
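For RRF data, the same removal can be approximated on an existing file by filtering on the SUPPRESS flag. The sketch below is not MetamorphoSys code. The position of the SUPPRESS field is passed in as `suppress_index`; for MRCONSO.RRF it is assumed here to be the 17th pipe-delimited field (index 16), which should be verified against the file documentation for your release.

```python
def remove_suppressible(rows, remove_flags, suppress_index):
    """Drop pipe-delimited rows whose SUPPRESS field is in remove_flags.

    remove_flags: any combination of "Y" (Source Term Type),
    "E" (Editor Assigned), and "O" (Obsolete).
    suppress_index: zero-based position of the SUPPRESS field (an assumption
    the caller supplies; rows too short to contain it are kept unchanged).
    """
    kept = []
    for row in rows:
        fields = row.rstrip("\r\n").split("|")
        if len(fields) > suppress_index and fields[suppress_index] in remove_flags:
            continue
        kept.append(row)
    return kept


# Tiny made-up rows with the flag in the second field, for illustration.
rows = ["term A|N|", "term B|Y|", "term C|E|", "term D|O|"]
only_obsolete_removed = remove_suppressible(rows, {"O"}, suppress_index=1)
all_removed = remove_suppressible(rows, {"Y", "E", "O"}, suppress_index=1)
```

Passing a subset of the three flags mirrors selecting only some of the options above: rows carrying the other flags remain in the output.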

See also Suppressibility, Section 6.5.5.

6.8.4 Advanced Semantic Types To Exclude

These options are available when the Semantic Types to Exclude filter has been enabled from the File menu and allow you to set the predicate for concept removal. There are two choices:

  1. Remove CUIs containing at least one selected Semantic Type - If this option is selected, a concept will be removed if any of its Semantic Types appear on the exclude list.
  2. Remove CUIs containing only selected Semantic Types - If this option is selected, a concept will be removed only if all of its Semantic Types are on the exclude list.
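The difference between the two choices is the difference between an "any" test and an "all" test on a concept's Semantic Types. A minimal sketch, with a hypothetical function name and illustrative type names:

```python
def should_remove(concept_types, excluded_types, require_all=False):
    """Decide whether a concept is removed under the chosen predicate.

    require_all=False: remove if at least one Semantic Type is excluded.
    require_all=True: remove only if every Semantic Type is excluded.
    A concept with no Semantic Types is never removed by this sketch.
    """
    concept_types = set(concept_types)
    excluded_types = set(excluded_types)
    if not concept_types:
        return False
    if require_all:
        return concept_types <= excluded_types
    return bool(concept_types & excluded_types)


excluded = {"Finding"}
mixed_concept = {"Finding", "Disease or Syndrome"}

removed_by_first_choice = should_remove(mixed_concept, excluded)
removed_by_second_choice = should_remove(mixed_concept, excluded, require_all=True)
```

The mixed concept is removed under the first choice but kept under the second, so the second choice is the more conservative of the two.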

6.9 Reset Menu

The Reset menu allows you to return to Metathesaurus default selections for all of the filter tabs (Input Options, Output Options, Source List, Precedence and Suppressibility). The choice of version, Original Release Format or Rich Release Format, will not be reset on the Output Options tab and the Input options tab. The default selections are those listed in the mmsys.prop.default file in the config folder. The mmsys.prop.sav file contains the properties used in the last run of MetamorphoSys.

6.10 Done; Begin Subset

When all options have been explored and you have completed configuring your Metathesaurus subset, select Done from the menu bar, and then Begin Subset. If you would prefer to save your configuration in order to subset at a later time, select Save Configuration from the File menu.

The Install UMLS Metathesaurus progress monitor charts the process through the following steps: Initializing the CUI list, Subsetting Content, Subsetting Indexes, and Final Processes. To stop processing and exit MetamorphoSys at any time, press Cancel at the bottom of the progress monitor. The interrupted process cannot be resumed. The configuration must be recalled (if saved), or recreated (if not saved), and subsetting must be started again.

MetamorphoSys produces an install.log file in your release directory, containing the log of the installation process up to the start of Metathesaurus subsetting. It records which operations were selected, and reports the results of file validations against both CHK and MD5 files. If the downloaded files pass validation, processing continues and subsetting begins. If files fail validation, the install.log is displayed.

When subsetting is complete, progress and error messages and the configuration settings are displayed on the screen and also written to a log file called mmsys.log in the directory containing the subsetted files. The subsetted Metathesaurus files are located in the chosen destination directory (see Section 6.3.1.1).

6.11 RRF Browser

The RRF Browser allows you to quickly find a term within your customized Metathesaurus subset or any vocabulary in the RRF format. You can also review the entire subset or vocabulary.

There are three ways to access the RRF Browser:

If you do not use the third option above, you must select a Metathesaurus subset (previously created using MetamorphoSys) to begin the browsing session. All options will be grayed out until a Metathesaurus subset is selected. Use the File tab - Open Subset command to open a subset.

Help is available on:

File tab
Edit tab
Options tab
  Word search options
  Report view options
  Tree Browser options
  Raw View Options
  Restrict Searches and Views
Main buttons
View window
CUI search
Code search
Word search
Tree Browser view
Error messages

6.11.1 File Tab

the file tab

Open Subset opens the Select and Open the Subset Folder window. Use this command to select a Metathesaurus subset (previously created using MetamorphoSys) at the beginning of the browsing session.

Print prints the Report or Raw view, depending on the tab selected.

Print Preview allows you to check the length of a report prior to printing. (Some concepts, reports, or searches may be long.)

Exit closes the program.

The first three options can also be accessed using the main buttons.

6.11.2 Edit Tab

edit tab image

Find allows you to locate a string in the concept report currently displayed in the Report or Raw View. Users can also access this feature by pressing Shift + F.

find menu

Users can search for a string using the Text to Find box. The matching string is highlighted in the Report or Raw View. Note that only text that is currently displayed is searched; content in unexpanded nodes is not searched.

6.11.3 Options Tab

options tab

The Options tab contains five items: Tree Browser Options, Word Search Options, Report View Options, Raw View Options, and Restrict Searches and Views.

6.11.3.1 Tree Browser Options

tree browser options

Maximum Number of Children sets the maximum number of terms that will be displayed under each vocabulary when the '+' sign next to the vocabulary name is clicked in the tree browser view. If more child terms exist than the current setting allows, a warning is displayed when the '+' is clicked.

6.11.3.2 Word Search Options

word search options

You can change the granularity of matching by moving the slider bar toward "More" for a more generalized search or toward "Fewer" for a more specific search. For example, searching with the term "Heart" and the slider closest to "More" produces three results: Heart, Congestive Heart Failure, and Cardio-vascular Findings: Heart. Searching for the same term with the slider closest to "Fewer" produces one result, Heart.

Select a Language restricts the string search and the results displayed to a specific language. For example, if Spanish is the language selected and the string "quie" is entered, only Spanish terms with this string will be returned. The English term "quiet" would not be returned.

6.11.3.3 Report View Options

Report View options include the following:

report view options

Select from the list of sources to display information for that source only. By default, all source information included in a subset is displayed. To select more than one source, hold down the Control key while clicking on each source name.

Select from the list of languages to display strings for that language only. To select more than one language, hold down the Control key while clicking on each language.

Select from the list of atom sorting algorithms to change the sort order of atoms.

6.11.3.4 Raw View Options

This option allows users to set the number of records returned for each RRF file in the Raw View. This feature is best used for concepts with a large number of relationships or co-occurrences, such as C0000039. For this CUI there are 632 rows in MRCOC.RRF. When the page size is set to 50 and MRCOC is expanded, users see the options 'all', 'prev', and 'next', along with an indication of which records are currently displayed, for example 'records 1 to 50'.
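The paging arithmetic behind this display can be sketched as follows. This is an illustration only, not the RRF Browser's own code; the function names are hypothetical, and pages are assumed to be numbered from 1.

```python
def page_count(total_rows, page_size):
    """Number of pages needed to show total_rows at page_size rows per page."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return -(-total_rows // page_size)  # ceiling division


def page_bounds(total_rows, page_size, page_number):
    """Return the 1-based (first, last) record numbers shown on a 1-based page."""
    if page_number < 1 or page_number > page_count(total_rows, page_size):
        raise ValueError("page_number is out of range")
    first = (page_number - 1) * page_size + 1
    # The final page may be shorter than page_size.
    last = min(first + page_size - 1, total_rows)
    return first, last


# The MRCOC.RRF example from the text: 632 rows, 50 records per page.
pages = page_count(632, 50)
for number in (1, 2, pages):
    first, last = page_bounds(632, 50, number)
    print(f"page {number}: records {first} to {last}")
```

With 632 rows and a page size of 50, the first page covers records 1 to 50 and the last of the 13 pages covers records 601 to 632.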

6.11.3.5 Restrict Searches and Views

Users can now filter the results of their searches by Semantic Type, content view, or source by selecting 'Restrict Searches and Views' from the Options drop-down menu. To restrict by content view, check the box of the content view you wish to use as a filter, then select 'Hide' (to remove those terms) or 'Highlight' (to highlight those terms) from the Options menu. The Hide and Highlight functions work the same way for the Semantic Type and source options; however, to select a Semantic Type or source, click on its name. A demonstration of this new feature is provided at http://www.nlm.nih.gov/research/umls/quicktours/restrictor tutorial.htm.

6.11.4 Main Buttons

main buttons

The RRF Browser contains five main buttons. Buttons are inactive when their functions are not available.

open button
The Open Subset button opens the Select and Open the Subset Folder window. Use this button to select a subset at the beginning of the browsing session.
print button
The Print button prints the Report or Raw view, depending on the tab selected.
print preview button
The Print Preview button opens a print preview of the currently selected view. You may use the print preview feature to check the length of a report prior to printing.
back button
The Back button changes the information in the view window to the last CUI viewed.
forward button
The Forward button advances forward through previously displayed reports in order, up to the current search.

6.11.5 The View Window

view tabs

The RRF Browser provides two views of the subset data. Report View and Raw View tabs appear on the top of the viewing window. Select a tab to indicate the desired view.

Report View is more user-friendly and displays clearly labeled data, including atoms and relationships. Within the Report View, click on the plus sign to expand a node and the minus sign to collapse it.

Note: The information after each Atom in the report view is AUI|SAB|TTY|CODE.

report view

Raw View displays the data organized by the Metathesaurus relational files and in ASCII (or UTF-8) character encoding. The data columns in Metathesaurus file rows are separated by the pipe symbol ("|"). Metathesaurus file content and organization are detailed in Section 2.
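If you process the same pipe-delimited rows outside the browser, a row can be split into named fields as in the sketch below. It is illustrative only: the function name is hypothetical, each row is assumed to end with a trailing pipe, and the four-column layout and sample row are made up for the example from values mentioned in this section. Take the real column lists for each file from Section 2.

```python
def parse_rrf_line(line, columns):
    """Split one pipe-delimited row into a dict keyed by the given column names."""
    text = line.rstrip("\r\n")
    # Assumption: every field, including the last one, is followed by a pipe.
    if text.endswith("|"):
        text = text[:-1]
    values = text.split("|")
    if len(values) != len(columns):
        raise ValueError(
            f"expected {len(columns)} fields but found {len(values)}"
        )
    return dict(zip(columns, values))


# Hypothetical four-column layout, used only to demonstrate the splitting.
columns = ["CUI", "SAB", "CODE", "STR"]
row = "C0011860|MSH|D003924|Diabetes Mellitus, Non-Insulin-Dependent|\n"
record = parse_rrf_line(row, columns)
print(record["CUI"], record["STR"])
```

Only one trailing pipe is removed, so an empty final field is preserved as an empty string rather than dropped.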

Within the Raw view, Metathesaurus file data rows can be displayed or hidden by clicking on the plus or minus sign to the left of the file name.

Three sections of data are displayed in this view: 1) the data for the concept, 2) the historical data from MRCUI and MRAUI for terms that now map to this concept, and 3) full listings from metadata files that are valid for all concepts in the Metathesaurus, e.g., MRDOC and MRFILES.

raw view

6.11.6 CUI Search

CUI search

Use the CUI Search tab to search the subset for a specific CUI. It is not necessary to enter the initial C or any leading zeros. For example, entering 733 and entering C0000733 will produce the same results.
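The equivalence between the short and full forms can be pictured as a normalization step that restores the initial C and pads the digits to seven places. The sketch below is an assumption about how such a step could look, not the browser's actual code; the function name is hypothetical.

```python
import re


def normalize_cui(query):
    """Expand a short CUI query such as '733' to the full form 'C0000733'."""
    match = re.fullmatch(r"C?(\d{1,7})", query.strip())
    if match is None:
        raise ValueError(f"not a CUI or CUI number: {query!r}")
    # A CUI is the letter C followed by seven digits, so pad with leading zeros.
    return "C" + match.group(1).zfill(7)


print(normalize_cui("733"))
print(normalize_cui("C0000733"))
```

Both calls produce the same identifier, which is why either form of the query finds the same concept.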

6.11.6.1 Code Search

CUI search

Use the Code Search tab to search the subset for a specific code. The code can be from any vocabulary, terminology, or code set in your subset. For example, entering 'D003924' in a subset containing MeSH terms will return C0011860, Diabetes Mellitus, Non-Insulin-Dependent. Search by code is case sensitive: using the same example, searching for 'd003924' returns the error "No concepts matched the search code."

6.11.7 Word Search

word search screen

Enter a word into the search box to find it in the subset. Search terms can be as small as one letter or as large as a multi-word phrase.

Use the * (asterisk) as a truncation operator to find all words that begin with the query term. For example, searching for heart* will return heart, heart attack, heart urchin, and fetal heart. The asterisk performs right truncation only; it is not a general wildcard. You cannot, for example, search for *tion or colo*r.

The multi-word search uses a Boolean AND operator, thus only concepts that contain strings with all of the search terms will be displayed. For example, using the multi-word search phrase "heart fetal" will return concepts that have both the terms "fetal" and "heart". It will not return concepts that have "heart" but not "fetal" or "fetal" but not "heart".
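The combination of right truncation and Boolean AND described above can be sketched in a few lines. This is a simplified model, not the RRF Browser's search code: the function names are hypothetical, matching is assumed to be case-insensitive and word-based, and the effect of the More/Fewer slider is ignored.

```python
def tokenize(text):
    """Split text into lower-cased words (a simplifying assumption)."""
    return text.lower().split()


def term_matches(term, words):
    """Match one query term against the words of a candidate string."""
    if term.endswith("*"):
        # Right truncation: any word that begins with the text before the asterisk.
        prefix = term[:-1]
        return any(word.startswith(prefix) for word in words)
    return term in words


def string_matches(query, candidate):
    """Boolean AND: every query term must match some word in the candidate."""
    terms = tokenize(query)
    if not terms:
        return False
    words = tokenize(candidate)
    return all(term_matches(term, words) for term in terms)


names = ["Heart", "Heart Attack", "Fetal Heart", "Heartburn"]
print([name for name in names if string_matches("heart*", name)])
print([name for name in names if string_matches("heart fetal", name)])
```

Because every term must match, adding words to a query can only narrow the result list, while adding an asterisk to a term can only widen it.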

Note that Maximum Number of Results is set to 50 as the default. Increasing this amount increases search time and the chance of finding a particular term.

After entering a search term, click the search button. The results of the search will be displayed in the window beneath the search term box. If a search does not produce any concepts, the results from the previous query will remain in the window beneath the search term box. Double click on the name of the concept to display its information in the View window. The Word Search feature searches atoms; other Metathesaurus content (e.g., attributes) is not searched.

word search with asterisk

6.11.8 Tree Browser View

tree browser image

The Tree Browser View allows you to view the vocabularies included in your subset, arranged in hierarchical order.

Click on the plus sign to expand a tree view.

Click on the minus sign to collapse the tree view.

Click on the name of the concept to display its information in the View window.

6.11.9 Error Messages

Invalid source path
The file path or source path is the folder that you selected on the Select and Open the Subset Folder screen. You will receive this error if you have not selected a directory that contains an MRFILES.RRF file. This browser does not support Original Release Format (ORF); you must select an RRF subset folder to use this browser.

6.11.9.1 Issues That Do Not Produce an Error Message

RRF Browser slows or locks when opening a large concept
When you open a very large concept, the Browser may take a considerable period of time to gather the data. To stop this process, use the Windows Task Manager to close the RRF Browser. Before reopening the browser, delete the dir_saved.txt file that is written to the MMSYS/config directory. If you do not delete this file and you answer "Yes" to the initial prompt "Do you want to load the most recently viewed concepts", the RRF Browser will again attempt to open the large concept when it is restarted.

Japanese characters do not display correctly in the report and raw views
In the Linux JRE 1.4 environment, East Asian fonts may not appear correctly. This issue will be resolved when the JRE is changed to version 1.5.

Windows users should ensure that the ARIALUNI.TTF font file exists in the Windows/Fonts directory. If it does not, install the font (it comes with MS Office). The installation instructions below were taken from the Microsoft Web site.

To install the Arial Unicode MS font, follow these steps:

  1. Click Start, point to Settings, and then click Control Panel.
    Note: In Microsoft Windows XP, click Start and then click Control Panel.
  2. In Control Panel, click Add/Remove Programs.
  3. Do one of the following:

    * In Microsoft Windows 98, Microsoft Windows Millennium Edition (Me), or Microsoft Windows NT 4.0
    On the Install/Uninstall tab, click Microsoft Office XP (or Microsoft Word 2002), and then click Add/Remove.

    * In Microsoft Windows 2000 or Microsoft Windows XP
    Click Change or Remove Programs, click Microsoft Office XP (or Microsoft Word 2002), and then click Change.
    In the Features to install window, click Next.
    Click to expand Office Shared Features.
    Click to expand International Support.
    Click the icon next to Universal Font, and then click Run all from My Computer on the shortcut menu (a CD may be needed for this step).
    Click Update to complete the installation of the Universal Font (Arial Unicode MS) to the computer.

6.12 MRCXT Builder

MRHIER.RRF has replaced the file MRCXT, which is no longer created by default. You are strongly encouraged to replace uses of MRCXT with MRHIER. MRHIER.RRF provides complete hierarchical information in a computable form that is much more compact and customizable for your purposes. MRCXT information is incomplete yet can be massive, increasing the size of the full Metathesaurus by over 66 percent.

If you decide to use the MRCXT Builder, carefully consider your selections. For example, some sources have very large numbers of siblings with little relatedness, or very many children. Creating an MRCXT file with a large number of siblings or children can take over 15 hours.

MRCXT Builder enables you to generate and customize MRCXT.RRF for your subsets from the information found in the MRHIER.RRF and MRCONSO.RRF files.

You must select a Metathesaurus RRF subset (previously created using MetamorphoSys) to use the MRCXT Builder.

Help is available on:

Opening MRCXT Builder
File menu
Locating a subset
MRCXT file options
Selecting sources
Error messages

6.12.1 Opening MRCXT Builder

To open MRCXT Builder:

6.12.2 File Menu

file menu image

The File menu has one option, Exit. Use this option to close the program.

6.12.3 Locating a Subset

MRCXT Builder can only be used with an RRF subset. Use the Browse button to locate the directory where the RRF subset was saved.

locate a subset

6.12.4 MRCXT File Options

MRCXT builder file options

Build Siblings: Check this option to show sibling relationships in the MRCXT file.

Build Children: Check this option to show child relationships in the MRCXT file.

Use Versioned Source Abbreviations: Check this option to produce a file with versioned source abbreviations, for example, MSH2004, instead of versionless source abbreviations, for example, MSH.

Compute XC Flags: Check this option to compute the values for the XC field. Checking this option results in a "+" in the raw data row in the XC column for the MRCXT file when a concept has one or more child values in this context. For more information see Section 2.7.1.3.12 on Contexts. Selecting this option will increase the amount of time required to build the MRCXT file.

Prepend UTF-8 BOM characters to MRCXT.RRF: Check this option to produce a file with UTF-8 BOM characters.
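For reference, the UTF-8 byte order mark is the three-byte sequence EF BB BF placed at the very start of a file. The sketch below shows what prepending it amounts to at the byte level. It is not the MRCXT Builder's code; the function name is hypothetical, and skipping data that already begins with a BOM is a choice made for this example.

```python
import codecs


def prepend_utf8_bom(data):
    """Return data with the UTF-8 BOM at the front, leaving existing BOMs alone."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


# A short byte string stands in for the contents of MRCXT.RRF.
marked = prepend_utf8_bom("C0000039|example|".encode("utf-8"))
print(marked[:3])
# Decoding with "utf-8-sig" strips the BOM again when the file is read back.
print(marked.decode("utf-8-sig"))
```

Some tools use the BOM to detect the encoding, while others treat it as stray characters at the start of the first field, so check what your downstream loader expects before selecting this option.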

Limit Contexts to: Check this option and indicate a number in the text entry box to limit the contexts a single atom can have in the MRCXT file. Limiting and/or reducing the number of contexts creates a smaller file and reduces the time it takes to build the file.

6.12.5 Selecting Sources

select sources menu

Click a source name to highlight each source for which contexts should be built. Hold down the Control key and click to select more than one source. Selecting fewer sources will decrease the amount of time it takes to build the MRCXT file. Contexts can only be built for sources that are already present in the subset.

6.12.6 Error Messages

At this point no error messages have been reported for the MRCXT builder.

6.13 API Documentation

The MetamorphoSys API documentation (generated with javadoc) helps you develop custom filters and can be found in the directory MMSYS/doc. This directory is present on the UMLS DVD or is created when the file mmsys.zip is extracted.

6.14 Getting Help

Check the information available at http://umlsinfo.nlm.nih.gov. We are developing additional Web resources based on user input.

NLM maintains a listserv (electronic mailing list service) called UMLS-USERS where users can share their experiences with, or seek advice from the UMLS community on using UMLS resources. NLM also uses this forum to seek advice from UMLS users and to distribute official announcements about UMLS products and services.

To subscribe, send an email to listserv@list.nih.gov containing the following message: SUBSCRIBE UMLSUSERS-L <your full name>.

To unsubscribe, send an email to listserv@list.nih.gov containing the following message: SIGNOFF UMLSUSERS-L <your full name>.

To post a message to the list AFTER subscribing, send email to UMLSUSERS-L@list.nih.gov.

To access subscription information and list archives, go to UMLSUSERS-L Listserv Webpage.

An alternative list, UMLS-ANNOUNCES-L, exists for users who wish to receive only official announcements about UMLS products and services, including new releases, new features, and problem/fix messages.

To subscribe, send an email to listserv@list.nih.gov containing the following message: SUBSCRIBE UMLS-ANNOUNCES-L <your full name>.

To unsubscribe, send an email to listserv@list.nih.gov containing the following message: SIGNOFF UMLS-ANNOUNCES-L <your full name>.

To access subscription information and list archives, go to UMLS-ANNOUNCES-L Listserv Webpage.

6.15 Acknowledgments

Solaris and Windows Java Runtime Environment — http://javasoft.com.

Linux Java Runtime Environment — http://www.blackdown.org.



 

Last reviewed: 28 July 2008
Last updated: 28 July 2008
First published: 20 July 2004
Metadata| Permanence level: Permanent: Stable Content