Unified Medical Language System | |
This document is a guide to loading an RRF subset of the UMLS Metathesaurus into an Oracle database by using the scripts and files created by MetamorphoSys. To generate, the necessary files and scripts, select the "Write Oracle Load Script" checkbox on the "Output Options" tab.
Recommended Software Version
Oracle 9i Release 2 (Standard Edition, Enterprise Edition) or
Oracle 10g Release 2 (Standard
Edition, Enterprise Edition)
To load your data into an Oracle database, you will
have to either use an existing database or create a new one. The
full
complexity of creating an Oracle database is outside the scope of this
document. However, we have included some discussion regarding
this
process below and we do have a couple of specific
recommendations. In
particular, recommend that you specify UTF8 as
the
CHARACTER SET and
NATIONAL CHARACTER SET and configure the database
to use "character" instead of "byte" semantics by default when
interpreting character field lengths. These settings are
configured through three Oracle parameters:
% setenv ORACLE_SID UMLSDBNote: the SYS password is typically configured when Oracle is installed. Now, you are ready to issue the CREATE DATABASE command. Here is a sample create database command that addresses the character set issues described above.
% sqlplus /nolog
SQL> CONNECT SYS/<sys pwd> AS SYSDBA
SQL> -- create the server parameter file
SQL> CREATE SPFILE='/d1/oracle/dbs/spfileUMLSDB.ora' FROM
SQL> PFILE='/d1/oracle/dbs/init.ora';
SQL> SHUTDOWN
SQL> STARTUP NOMOUNT
SQL> ALTER SYSTEM SET nls_length_semantics='CHAR' scope=BOTH;
CREATE DATABASE umlsdb
USER SYS IDENTIFIED BY
sys_p4ssw0rd
USER SYSTEM IDENTIFIED BY
system_p4ssw0rd
CHARACTER SET UTF8
NATIONAL CHARACTER SET UTF8
LOGFILE GROUP 1 ('/d1/..../redo01.log)
SIZE 500M,
GROUP 2 ('/d1/..../redo02.log) SIZE 500M,
GROUP 3 ('/d1/..../redo02.log) SIZE 500M
MAXLOGFILES 5
MAXLOGMEMBERS 5
MAXLOGHISTORY 1
MAXDATAFILES 100
MAXINSTANCES 1
ARCHIVELOG
EXTENT MANAGEMENT LOCAL
DATAFILE '/d2/....' SIZE 500M
REUSE
SYSAUX DATAFILE '/d2/....' SIZE
500M REUSE
DEFAULT TABLESPACE META
DATAFILE
'/d1/..../meta01.dbf' SIZE 15000M REUSE
EXTENT MANAGEMENT LOCAL
UNIFORM SIZE 10M
DEFAULT TEMPORARY TABLESPACE TEMP
TEMPFILE '/d2/..../temp01.dbf'
SIZE 5000M REUSE
EXTENT MANAGEMENT LOCAL
UNIFORM SIZE 10M
UNDO TABLESPACE UNDO_TBS
DATAFILE
'/d3/..../undotbs01.dbf' SIZE 2000M
REUSE AUTOEXTEND ON MAXSIZE
UNLIMITED
SET TIME_ZONE = 'PST';
One consideration in the create table statement is the size of the main
data tablespace datafile. In the example above, we create a META
tablespace with an initial data file called "meta01.dbf" whose starting
size is 15 GB. We recommend that you either create a very large
data file like this one or configure the tablespace to use
AUTOEXTEND. Ensure your system has enough disk space to
accommodate the full volume of data.
Once the database is created, you should run the admin scripts to
create the data dictionary tables and the structures needed for PL/SQL
procedures. For example,
% setenv ORACLE_SID UMLSDBWith this done, you should now be ready to create a user.
% sqlplus /nolog
SQL> CONNECT SYS/<sys pwd> AS SYSDBA
SQL> @$ORACLE_HOME/rdbms/admin/catalog.sql
SQL> @$ORACLE_HOME/rdbms/admin/catproc.sql
Creating a User
Whether you create a database manually or with the configuration
assistant, the next step is to create your own user. It is very
important that you not load data as the SYS or
SYSTEM user. We recommend that you set the default tablespace for
this
user to the META tablespace you created and set the user quota to
UNLIMITED. After creating a user, the next step is to grant this user
the appropriate
appropriate privileges. For example,
% setenv ORACLE_SID UMLSDB
% sqlplus /nolog
SQL> CONNECT SYS/<sys pwd> AS SYSDBA
SQL> CREATE USER umls
SQL> IDENTIFIED BY umls_p4ssw0rd
SQL> DEFAULT TABLESPACE META
SQL> QUOTA UNLIMITED ON META;
SQL> GRANT CONNECT TO umls;
SQL> GRANT RESOURCE TO umls;
Note: this process can also be performed through a GUI interface
with a tool like Oracle Enterprise Manager.
Configuring Parameters for
Performance
Following are a number of initialization parameter recommendations
designed for an Oracle 10g system to handle the performance load of
your database. This guide is not intended to be exhaustive and is not
guaranteed to increase performance of your specific system.
Please take the time to understand the individual Oracle parameters to
maximize the individual performance of your system.
Windows
If you are running MetamorphoSys in a Windows environment, in your
output directory (after running MetamorphoSys with the "Write Oracle
Load Script" box checked) you should see a batch file script called
"populate_oracle_db.bat". This script can be used to load your
subset
into an Oracle database. The directory will also contain an
SQL script for creating the tables,
an SQL script for creating the indexes, and an SQL*Loader control file
(.ctl) for each of the data files to be loaded.
SQL*Loader is the Oracle tool for loading text files into a database
and has a variety of parameters and configuration options (this too is
outside the scope of this document). The control files used by
the script to load each file into an Oracle database table are designed
to be relatively simple and relatively efficient. If you are
interested in more information, you are encouraged to look up the SQL*Loader Control File Reference
in the Oracle manual.
Edit the script and configure the four parameters at the top of the
script. These are:
set ORACLE_HOME=<path to
ORACLE_HOME>
set user=<username>
set password=<password>
set tns_name=<db_name>
ORACLE_HOME should be set to the installation directory of your
local
Oracle client or server. This may be something like
"D:\oracle\product\10.2.0\db_1". After
configuring these four parameters, this section of your script may look
something like the following.
set
ORACLE_HOME=D:\oracle\product\10.2.0\db_1
set user=myuser
set password=p4ssw0rd
set tns_name=UMLSDB
You are now ready to run the script. To do so, navigate to
your subset directory using the Windows explorer and double-click on
the "populate_oracle_db.bat" file you just finished editing.
Here are some things to keep in mind when using this script:
Linux, Macintosh, or Solaris
If you are running MetamorphoSys in a Linux, Macintosh, or Solaris environment, in your output directory (after running MetamorphoSys with the "Write Oracle Load Script" box checked) you should see a .sh shell script called "populate_oracle_db.sh". This script can be used to load your subset into a Oracle database. The directory will also contain an SQL script for creating the tables, an SQL script for creating the indexes, and an SQL*Loader control file (.ctl) for each of the data files to be loaded.
SQL*Loader is the Oracle tool for loading text files into a database and has a variety of parameters and configuration options (this too is outside the scope of this document). The control files used by the script to load each file into an Oracle database table are designed to be relatively simple and relatively efficient. If you are interested in more information, you are encouraged to look up the SQL*Loader Control File Reference in the Oracle manual.export ORACLE_HOME=<path to
ORACLE_HOME>
user=<username>
password=<password>
tns_name=<tns_name>
ORACLE_HOME should be set to the installation directory of your
local Oracle client or server. This may be something like
"/u01/app/oracle/product/10.2.0". After
configuring these four
parameters, this section of your script may look something like the
following.
export
ORACLE_HOME=/u01/app/oracle/product/10.2.0
user=myuser
password=p4ssw0rd
tns_name=umls
Note: passwords containing $ or # characters may cause problems as
these have special meaning within the shell environment. In this
case,
you may want to enclose the password in quotes in the script itself
(e.g. password="pass#word"). You are now ready to run the script.
To
do so, navigate to
your
subset directory and invoke the script from the command line. It
may be necessary to change the file permissions to make the script
executable before it can be started. For example:
% cd <subset directory>
% chmod 775 populate_oracle_db.sh
% populate_oracle_db.sh &
Here are some things to keep in mind when using this script:
Sample Queries
We have provided a number of sample queries to demonstrate how to access certain kinds of data.
Get the English preferred name for a CUI.
SELECT str FROM MRCONSO
WHERE cui = ? AND stt = 'PF'
AND ts = 'P' AND ispref = 'Y' AND lat = 'ENG';
Get MRCONSO rows (atoms) for a SAB and CODE, in order of precedence
SELECT * FROM MRCONSO
WHERE sab = ? AND code = ?
ORDER BY cui, lat, lower(ts), lui, stt, sui, ispref desc;
Get all MRCONSO rows (atoms) for a CUI, in order of precedence.
SELECT * FROM MRCONSO
WHERE cui = ?
ORDER BY cui, lat, lower(ts), lui, stt, sui, ispref desc;
Get all MRCONSO rows (atoms), with STYs, for a CUI, in order of precedence.
SELECT a.*, sty FROM MRCONSO a, MRSTY b
WHERE a.cui = ? AND a.cui = b.cui
ORDER BY cui, lat, lower(ts), lui, stt, sui, ispref desc;
Get all MRCONSO rows (atoms) for a particular STY, in order of precedence by CUI.
SELECT a.* FROM MRCONSO a, MRSTY b
WHERE sty = ? AND a.cui = b.cui
ORDER BY cui, lat, lower(ts), lui, stt, sui, ispref desc;
Perform a norm string lookup in MRXNS_ENG to get MRCONSO rows.
SELECT a.* FROM MRCONSO a, MRXNS_ENG b
WHERE a.cui = b.cui and nstr = ? and a.sui = b.sui;
Find all Semantic Types for a particular MRCONSO string.
SELECT sty FROM MRCONSO a, MRSTY b
WHERE a.cui = b.cui AND str = ?;
Find all contexts for an atom.
SELECT * FROM MRHIER
WHERE cui = ? AND aui = ?
ORDER BY cui, aui, sab, cxn;
Find all definitions for all concepts containing a particular string.
SELECT def FROM MRCONSO a, MRDEF b
WHERE a.cui = b.cui AND str = ?;
Find all definitions for a particular string
SELECT def FROM MRCONSO a, MRDEF b
WHERE a.aui = b.aui AND str = ?;
Find preferred MUI of a MeSH D#.
SELECT scui FROM MRCONSO
WHERE sab = 'MSH' AND tty = 'MH'
AND sdui = ?;
Last reviewed: 05 July 2007
Last updated: 05 July 2007
First published: 05 July 2007
Metadata| Permanence level: Permanent: Dynamic Content