Last Modified: March 30, 2007
MMTx was designed to give the general public access to the MetaMap algorithms and capabilities. The original MetaMap program was written in Prolog, which limited the platforms on which it could run. With this in mind, we decided to build MMTx in Java so that it could run on as many platforms as possible.
We have worked with the author of the MetaMap program to ensure a faithful reproduction of the original algorithms and list of available options. In most cases you should receive identical output from MetaMap and MMTx. There are known differences because MetaMap uses the Xerox PARC POS Tagger, while MMTx uses (by default) built-in algorithms for tagging text. This difference in tagging may on occasion produce different results in the two systems.
MMTx is a Java-based application and theoretically is platform
independent. Having said that, we are officially supporting the
following platforms:
The Data File Builder workspace will require disk space about five times the size of your knowledge sources or at least 8.6GB.
We will accept trouble/bug reports emailed to the mmtx@nlm.nih.gov address, but we would REALLY, REALLY prefer that you use our "Trouble Reporter" system, which is accessible via the link in the left sidebar or from this link: Trouble Reporter
Once we receive a bug/trouble report it is entered into our tracking
system which is available via the "Review Status of Trouble Reports"
link in the left sidebar or from the following link:
Review Status of Trouble Reports
Reports submitted via the
"Trouble
Reporter" are automatically entered into the tracking system
after they have been reviewed by our Trouble Report moderator.
Bug/Trouble reports submitted via email are entered into the
tracking system by the Trouble Report moderator as time permits.
Once your Trouble Report is submitted and reviewed, it is assigned
to one of our MMTx Team Members to work. You will be contacted
via email directly when further clarification is required, or
when the status of your Trouble Report has changed.
Trouble Reports or TRs are only closed once we have
received feedback from the submitter that the problem has in fact
been resolved. One caveat is that if we have asked for feedback
on a fix and have not received any response for a week, we will
consider the TR closed.
Currently, we are bundling an initial dataset for your use in the
MMTx. This dataset is based on information contained in the
Unified Medical
Language System (UMLS)
and must comply with all of the
UMLS copyright restrictions.
So that we can honor these restrictions, you must have satisfied the
following criteria prior to receiving access to the MMTx
Download page:
There is currently no interface to the Brill POS Tagger from
MMTx. We have been looking into implementing this, but no
decision has been made on whether to support the tagger.
The main hurdle here is supporting the TreeBank tags that are
produced via the Brill POS Tagger versus the current suite of tags
we are using internally with the Xerox PARC Tagger.
Please see Notes on Tagger Integration for more information on how you can integrate the Brill
Tagger into the MMTx.
We are currently in the process of completing this and should have a web page available soon.
There is currently no defined way of doing this within MMTx. If you know SQL and MySQL, you will be able to modify the tables very easily using the MySQL provided tools.
If you need to install the Moderate and/or Relaxed Data Model databases, either because you downloaded them later or because of an error in the install script, please reference the following link for details: Loading Optional Models.
Currently there is no formal uninstall program available for MMTx. We are looking to incorporate one in an upcoming release. For now, the best steps to follow are:
use mysql;
delete from user where user='mmtxUser';
delete from user where user='lvg';
flush privileges;
You can reference additional information on MetaMap and SKR from the following link: MetaMap and SKR Research Information
The following UMLS Sources are excluded from MMTx distribution:
CDT5 Current Dental Terminology 2005 (CDT-5), 5
CPT01SP Physicians' Current Procedural Terminology, Spanish Translation, 2001
CPT2005 Physicians' Current Procedural Terminology, 2005
HCDT5 HCPCS Version of Current Dental Terminology 2005 (CDT-5), 5
HCPCS05 Healthcare Common Procedure Coding System, 2005
HCPT05 HCPCS Version of Current Procedural Terminology (CPT), 2005
MTHCH05 Metathesaurus CPT Hierarchical Terms, 2005
MTHHH05 Metathesaurus HCPCS Hierarchical Terms, 2005
All other sources are included.
The following information comes from one of our MMTx users (Leon F.) who
was experiencing problems downloading the MMTx jar file because of its
size. I've also been able to replicate their success in using the
software, so I can highly recommend it. The software is GNU's Wget
program, which is accessible via the following link:
http://www.gnu.org/software/wget/
The following is from the GNU Wget page:
GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive commandline tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc.
GNU Wget has many features to make retrieving large files or mirroring entire web or FTP sites easy, including:
* Can resume aborted downloads, using REST and RANGE
* Can use filename wild cards and recursively mirror directories
* NLS-based message files for many different languages
* Optionally converts absolute links in downloaded documents to
relative, so that downloaded documents may link to each other locally
* Runs on most UNIX-like operating systems as well as Microsoft Windows
* Supports HTTP and SOCKS proxies
* Supports HTTP cookies
* Supports persistent HTTP connections
* Unattended / background operation
* Uses local file timestamps to determine whether documents need to be re-downloaded when mirroring
* GNU Wget is distributed under the GNU General Public License.
So, once you have wget downloaded and installed, you can download the MMTx
software by using the following command:
wget --http-user=USERNAME --http-passwd=PASSWORD http://mmtx.nlm.nih.gov/Download/mmtx_V2.4.B_data.jar
It should be noted that this does involve having your username and password
visible on the command line during the download process, as well as in the
process information, so care should be taken.
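If keeping the password off the command line matters, the same resumable download can be sketched in Python. This is only an illustration under stated assumptions, not something shipped with MMTx: it assumes the server accepts HTTP Basic authentication and honors Range requests, and the function names build_request and download are hypothetical.

```python
import base64
import getpass
import os
import shutil
import urllib.request


def build_request(url, user, password, offset=0):
    """Build a GET request with Basic credentials and an optional resume offset."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Basic {token}")  # assumes Basic auth
    if offset > 0:
        # Ask only for the bytes that are still missing locally.
        req.add_header("Range", f"bytes={offset}-")
    return req


def download(url, user, dest):
    """Download url to dest, resuming a partial file when the server allows it."""
    # Prompt for the password so it never appears in the process list.
    password = getpass.getpass("Password: ")
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = build_request(url, user, password, offset)
    with urllib.request.urlopen(req) as resp:
        # 206 means the server honored the Range header; otherwise start over.
        mode = "ab" if resp.status == 206 else "wb"
        with open(dest, mode) as out:
            shutil.copyfileobj(resp, out)


# Example call (performs network access, so it is left commented out):
# download("http://mmtx.nlm.nih.gov/Download/mmtx_V2.4.B_data.jar",
#          "USERNAME", "mmtx_V2.4.B_data.jar")
```

The sketch does not handle the case where the local file is already complete; a server would normally answer such a request with an error, which urllib raises as an exception.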
The machine output format (-q) is the only currently supported
machine readable/parsable output format we offer in MMTx. See
the FAQ section entitled
"Can you explain what the
machine output format is and what information is included?"
for more details on machine output.
Creating piped output is easy except for the issue of repeating
information (e.g., phrases, candidates, mappings). A modified version
of machine output might work. And like machine output, piped output
of an utterance will be extended over several lines with each line
beginning with the utterance id and output type (and maybe other
identifying information).
This is being reviewed for inclusion in an upcoming release.
MMTx currently does not support XML output of any kind. It should be
fairly easy to modify the output routines to include XML output if you
download and modify the sources.
XML output is not currently scheduled to be included in any future
release of MMTx.
MMTx has several options that appear to work backwards when specified
on the command line. This is a feature designed into the MMTx system
to provide conformance with the MetaMap program (or in other words -
for historical reasons). The following options are considered toggle
options because when you specify them on the command line, they
actually turn the option OFF instead of ON like you would expect.
The machine output format (-q) is the only currently supported
machine readable/parsable output format we offer in MMTx.
There is a lot of information contained in the machine output
and the following documents outline the contents in great depth:
Well ..., from Version 2.0.C on, you will be able to tell simply
by running mmtx --version from the command line.
With previous versions, you will need to look at the date of
the mmtx/classes/programs/MMTx.class file.
This error message occurs when you are using the new version of
Java - 1.4. We have a work-around that allows you to run MMTx
in Java 1.4 without getting this error. The work-around is
contained in V2.0.C of MMTx.
Currently in MMTx, if you are making changes to the source code
and trying to recompile in Java 1.4 - you will receive errors and
be unable to complete the compile process. We are working on
bringing MMTx into conformance with Java 1.4 and should have
something out soon. Currently, MMTx ONLY supports
compilation under Java 1.3.
This error message occurs when you are using large input files with MMTx. You simply need to add memory sizing to the MMTx script in the mmtx/bin directory.
Add in the following options to the java call:
-ms100m <default is 4MB, this changes it to 100MB>
-mx100m <default is 16MB, this changes it to 100MB>
So, the beginning of the java line in the script should look like the following:
java -ms100m -mx100m -cp
You can play around with the 100s to see what works for you. The only caveat is that -ms must be less than or equal to -mx, it can't be greater than -mx.
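The one constraint on these options (-ms may not exceed -mx) is easy to check before editing the script. The following is a minimal Python sketch, not part of MMTx; the helper name heap_flags is hypothetical, and it does nothing more than build the two option strings described above.

```python
def heap_flags(initial_mb, maximum_mb):
    """Return the -ms/-mx options for the java call, sizes given in megabytes."""
    if initial_mb <= 0 or maximum_mb <= 0:
        raise ValueError("heap sizes must be positive")
    if initial_mb > maximum_mb:
        # -ms (initial heap) can't be greater than -mx (maximum heap).
        raise ValueError("-ms must be less than or equal to -mx")
    return [f"-ms{initial_mb}m", f"-mx{maximum_mb}m"]


# Options to paste into the java line of the script in mmtx/bin.
print(" ".join(heap_flags(100, 100)))
```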
For example:
(1) "Fig." inside this text should not be marked as "Fig [Food]":
This condition is called a succenturiate lobe (Fig. 6-5 ) and may be problematic if that lobe of placenta is inadvertently left within the uterus at the time of delivery.
(2) Likewise, "al" shouldn't be "Aluminum [Element, Ion, or Isotope]":
These studies provide strong evidence of the interconnectiveness of maternal and fetal fluid spaces across the membranes and placenta (Kilpatrick et al, 1991).
The quick answer to your question about filtering is that currently there is no way to do so. You can specify the -u (--unique_acros_abbrs_only) or -a (--no_acros_abbrs) options, but these options only prevent the generation of some variants before accessing the Metathesaurus. If the abbreviation is in the Metathesaurus itself (as is the case with "al"), then the options don't help. MMTx doesn't get your first example right either because it doesn't realize that "Fig." *is* an abbreviation. (BTW, I normally use both -D (--an_derivational_variants) and -a (--no_acros_abbrs) in my processing; the -D option allows derivational variants only between adjectives and nouns, and the -a option filters out all abbreviatory variants.)
C....... occurs in some data for optimization purposes. As an
example, the line
abdomen|S0003328|C.......
in a word index file means that the word "abdomen" occurs in the
string S0003328 ('Palpation of abdomen'). The presence of C.......
signals that the given string is also the concept name. No further
searching is necessary to obtain the concept name. On the other hand,
the entry
abdomen|S0288461|C0000735
means that "abdomen" occurs in S0288461 ('abdomen neoplasm'), a
string for concept C0000735 with preferred name 'Abdominal Neoplasms'.
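The two cases above can be told apart mechanically. Here is a minimal Python sketch of reading one word index line, assuming only the three pipe-separated fields shown in the examples; the names WordIndexEntry and parse_word_index_line are hypothetical and are not part of MMTx.

```python
from typing import NamedTuple, Optional

# Placeholder described above: the string is itself the concept name.
PLACEHOLDER = "C......."


class WordIndexEntry(NamedTuple):
    word: str
    string_id: str
    concept_id: Optional[str]  # None when the placeholder was present

    @property
    def string_is_concept_name(self) -> bool:
        return self.concept_id is None


def parse_word_index_line(line: str) -> WordIndexEntry:
    """Split a 'word|string id|concept field' line into its three parts."""
    word, string_id, concept_field = line.rstrip("\n").split("|")
    concept_id = None if concept_field == PLACEHOLDER else concept_field
    return WordIndexEntry(word, string_id, concept_id)


for raw in ("abdomen|S0003328|C.......", "abdomen|S0288461|C0000735"):
    entry = parse_word_index_line(raw)
    if entry.string_is_concept_name:
        print(entry.string_id, "is the concept name; no further lookup needed")
    else:
        print(entry.string_id, "belongs to concept", entry.concept_id)
```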
We keep the first occurrence of a string in the mrconso file (a join of MRCON and MRSO) that is indistinguishable from other strings within a concept. Since preferred forms occur in this file before non-preferred forms, this has the effect of keeping the preferred forms.
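The keep-the-first-occurrence rule can be sketched in Python as follows. This is an illustration only: the exact test for "indistinguishable" is not spelled out here, so a case-insensitive comparison stands in for it as an assumption, and the function name keep_first_per_concept is hypothetical.

```python
def keep_first_per_concept(rows, key=str.lower):
    """Keep the first of any indistinguishable strings within each concept.

    rows: iterable of (concept id, string) pairs in file order.
    key:  stand-in for the real comparison (an assumption, not MMTx's rule).
    """
    seen = set()
    kept = []
    for concept_id, text in rows:
        marker = (concept_id, key(text))
        if marker in seen:
            continue  # a later, indistinguishable string in the same concept
        seen.add(marker)
        kept.append((concept_id, text))
    return kept
```

Because the input is scanned in file order and preferred forms come first in the file, the string that survives for each group is the preferred form.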
Yes, we find the mnemonics much easier to understand when we manually review output.
NLSStrings is a descendant of a corresponding Prolog module used by MetaMap and similar programs developed in the Natural Language Systems program. It predates lvg, is tailored to our specific needs, and is more efficient than lvg (the last time I checked). I'm also fairly certain that lvg doesn't have the full functionality required for the specific kind of normalization we're doing; in particular, it doesn't respect word order, and the kinds of normalization involving NOS and NEC are different (MetaMap continues to filter out NOS but no longer does so with NEC).
For information on what filtering is done in the MMTx and MetaMap
programs, please review the following document which describes
what filtering was done for 2001.
Filtering the UMLS Metathesaurus for MetaMap, 2001
You can also find more information
on Filtering the UMLS Metathesaurus and Ambiguity in the UMLS
Metathesaurus from the
SKR Reference
Information page.
The data files themselves did not change, so you do not have to regenerate them. If you want your processing based on the 2002 UMLS, you will want to create new source files (sourceData/<your custom>/umls) from the 2002 UMLS data files and run the Data File Builder. To migrate your data to V.2.2, do the following: