Usage: metamap09 [Options] [Input File] [Output File]
MetaMap maps (matches) text (from documents, queries) to concepts from
the UMLS Metathesaurus. Text passes through a series of modules that
break it down into sentences, phrases, lexical elements and tokens.
Variants are generated from the resulting phrases, and candidate
concepts from the UMLS Metathesaurus are retrieved and evaluated
against their phrases. The resulting concepts are then organized so as
to best cover the text; this arrangement is known as a final mapping.
MetaMap Options:
MetaMap is highly configurable, and its behavior is controlled by option
flags, each of which has a short name (e.g., -I) and a long name
(e.g., --show_cuis).
File Options: When MetaMap is run on the command line,
the default input and output are standard input and output.
MetaMap allows specifying input and output files on the command line,
but the order in which they are specified is important:
% metamap09 [Options] InputFile OutputFile
The InputFile and OutputFile arguments, if specified,
must be the last two arguments. It is not necessary to specify OutputFile,
because the output file will default to <InputFile>.out.
Note that if the output file (whether specified on the command line or not)
is an existing file,
the existing file will be overwritten and its original contents lost.
Note for MMTX users: Unlike MMTX, MetaMap has no option flags for
specifying the input and output file names. The file names are given
directly on the command line, without option flags, and if no file name
is specified, MetaMap uses standard input and output.
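The file-argument rules above can be sketched in Python as a small helper for scripts that launch MetaMap. This is only an illustration of the stated rules (options first, file names last, output defaulting to <InputFile>.out); the function names build_command, default_output_path and will_overwrite are hypothetical and are not part of MetaMap.

```python
import os
from typing import List, Optional


def default_output_path(input_file: str) -> str:
    """Return the documented default output name, <InputFile>.out."""
    return input_file + ".out"


def build_command(options: List[str],
                  input_file: Optional[str] = None,
                  output_file: Optional[str] = None) -> List[str]:
    """Assemble an argv list with the file names as the last arguments."""
    if output_file is not None and input_file is None:
        raise ValueError("an output file can only follow an input file")
    argv = ["metamap09", *options]
    if input_file is not None:
        argv.append(input_file)
    if output_file is not None:
        argv.append(output_file)
    return argv


def will_overwrite(input_file: str, output_file: Optional[str] = None) -> bool:
    """Report whether the run would replace an existing output file."""
    target = output_file if output_file is not None else default_output_path(input_file)
    return os.path.exists(target)
```

Checking will_overwrite before launching a run is one way to guard against the silent overwrite described above.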
Data Options: Data options determine the underlying vocabularies
and data model used by MetaMap.
-A (--strict_model) [default]
-C (--relaxed_model)
- determines which data model is used. If more than one model
is specified, the strictest one is used; if none is specified, then
the strict model is used.
See the report Filtering the UMLS Metathesaurus for MetaMap on the
SKR website (under "Technical Documents") for a description of the
models.
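The model-selection rule above (strictest requested model wins; strict if none is requested) can be written out as a short Python sketch. The function name effective_model is hypothetical, and the sketch covers only the two model flags listed in this section.

```python
from typing import Iterable

# Model flags listed in this section, mapped to the model they select.
MODEL_FLAGS = {
    "-A": "strict", "--strict_model": "strict",
    "-C": "relaxed", "--relaxed_model": "relaxed",
}

# Models ordered from strictest to least strict.
STRICTNESS_ORDER = ["strict", "relaxed"]


def effective_model(args: Iterable[str]) -> str:
    """Return the data model that results from the given option flags."""
    requested = {MODEL_FLAGS[arg] for arg in args if arg in MODEL_FLAGS}
    for model in STRICTNESS_ORDER:
        if model in requested:
            return model
    # No model flag given: the strict model is the default.
    return "strict"
```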
-V (--mm_data_version) <data version>
- specifies which version of MetaMap's data files will be used for
processing. For example, 2004 specifies one of the UMLS 2004 models,
and 2004_level0 specifies one of the level 0 UMLS 2004 models.
NOTE: because "normal" processing is the default, this option should
very rarely be used.
The default data version is:
normal: All vocabularies in a given AA release of the Metathesaurus
with the exception of the AMA vocabularies, CPT (Current Procedural
Terminology) and CDT (Current Dental Terminology). Also excluded from
the normal data version are CPT and CDT derivative vocabularies such
as HCPCS (Healthcare Common Procedure Coding System) and MTHHH
(Metathesaurus HCPCS Hierarchical Terms).
Other data versions that are sometimes available are:
level0: UMLS vocabularies with the least restrictive source
restriction level, namely level 0. Even level 0 vocabularies have some
copyright restrictions, but they are less restrictive than
those with restriction levels 1 through 3; and
level0and4: Level 0 and level 4 vocabularies. Currently SNOMEDCT and
its derivatives are the only level 4 vocabularies in the
Metathesaurus. (Note that, despite the numbering, level 4 is not as
restrictive as levels 1 through 3, especially for USA users.)
Processing Options: Processing options control MetaMap's internal
behavior.
-@ (--WSD OPTION)
- specifies the name of the host running the WSD Server
to be used for word-sense disambiguation.
-+ (--bracketed_output)
- surrounds the Phrase, Candidates, and Mappings sections of
output with ">>>>>" and "<<<<<" brackets. E.g.,
>>>>> Phrase
heart attack
<<<<< Phrase
and similarly for Candidates and Mappings.
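Bracketed output is convenient for programs that post-process MetaMap results. The following Python sketch splits such output into named sections. It assumes, based only on the example above, that each bracket starts its line and is followed by a single space and the section name; the function name iter_sections is hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

OPEN, CLOSE = ">>>>> ", "<<<<< "


def iter_sections(lines: List[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (section name, body lines) for each bracketed section."""
    name: Optional[str] = None
    body: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if name is None:
            # Outside any section: only an opening bracket is of interest.
            if line.startswith(OPEN):
                name = line[len(OPEN):].strip()
                body = []
        elif line.startswith(CLOSE) and line[len(CLOSE):].strip() == name:
            # Matching closing bracket: emit the collected section.
            yield name, body
            name = None
        else:
            body.append(line)
```

Lines outside any bracketed section are skipped, so the same function can be run over complete output files.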
-8 (--dynamic_variant_generation)
- forces MetaMap to generate variants dynamically rather than by
looking up variants in a table. This option is normally used only for
debugging purposes.
-a (--all_acros_abbrs)
- allows the use of any acronym/abbreviation variants, which are
the least reliable form of variation, because normally at most one
of the expansions for an abbreviated form is correct.
-d (--no_derivational_variants)
- prevents the use of any derivational variation in the computation
of word variants. This option exists because derivational variants, as
opposed to all other forms of variation, always involve a significant
change in meaning.
-D (--all_derivational_variants)
- forces the use of all derivational variation,
instead of only those between adjectives and nouns.
Adjective/noun derivational variants are generally the best
derivational variants.
-g (--allow_concept_gaps)
- causes MetaMap to retrieve Metathesaurus candidates with gaps (such
as "Unspecified childhood psychosis" for "unspecified psychosis"). This
option does not appreciably affect MetaMap's performance. It
is appropriate for browsing purposes.
-i (--ignore_word_order)
- allows MetaMap to ignore the order of words in the phrases it
processes. MetaMap was originally developed to process full text and
consequently depended very strongly on normal English word order.
This option avoids the use of specialized word indexes used
for efficient candidate retrieval, it ignores word order when matching
phrase text to candidate words, and it replaces the normal coverage
metric with an involvement metric for evaluating how well a candidate
covers the words of a phrase.
-K (--ignore_stop_phrases)
- simply prevents MetaMap from aborting its processing for commonly
occurring phrases that are known to produce no mappings. This option
is useful only for generating a new table of stop phrases after a
change in UMLS data.
-l (--allow_large_n)
- enables retrieval of Metathesaurus candidates for two-character
words occurring in more than 2,000 Metathesaurus strings
and one-character words occurring in more than 1,000 Metathesaurus strings.
This option also allows retrieval for words that can be a
preposition, conjunction or determiner.
-L (--longest_lexicon_match)
- causes lexical lookup to prefer matching as much text as possible
to lexicon entries. This used to be the only form of lexical lookup,
but it has been superseded by a shortest-match algorithm because the
SPECIALIST lexicon is a syntactic lexicon: multi-word items contain no
more information than their constituents, which have their own lexicon
entries.
-o (--allow_overmatches)
- causes MetaMap to retrieve Metathesaurus candidates which have words
on one or both ends that do not match the text. For example,
overmatches of "medicine" include 'Alternative Medicine', 'Medical
Records' and 'Nuclear medicine procedure, NOS'. The use
of --allow_overmatches greatly increases the number of candidates
retrieved and is consequently much slower than MetaMap without
overmatches. It is appropriate for browsing purposes.
-P (--composite_phrases)
- causes MetaMap to construct longer, composite phrases from the
simple phrases produced by the parser. A composite phrase is a simple
phrase followed by any prepositional phrase, optionally followed by one
or more "of" prepositional phrases. An example is "pain on the left side
of the chest", which will map to 'Left sided chest pain' rather than
to separate concepts, as it would without the option. Note that
--composite_phrases is experimental; it is currently both
inefficient and not completely correct.
-Q (--quick_composite_phrases)
- is a version of --composite_phrases designed to
overcome its inefficiency. It is both experimental and temporary.
-S (--tagger OPTION)
- specifies the name of the host running the Tagger
Server to be used for tagging.
-t (--no_tagging)
- causes the tagger to not be called.
By default, the SPECIALIST parser uses the results of a
tagger to assist in parsing. We previously used the Xerox PARC
part-of-speech tagger but now use the MedPost/SKR tagger. The MedPost
tagger was developed at NCBI specifically for tagging biomedical text;
we modified it to use our part-of-speech tags.
-u (--unique_acros_abbrs_only)
- restricts the generation of acronym/abbreviation variants to those
forms with unique expansions. This option produces better results
than allowing all forms of acronym/abbreviation variants, but it is
still better to prevent all such variants.
-U (--allow_duplicate_concept_names)
- requires that two Concepts' CUIs match (in addition
to the Metathesaurus Concept itself and its position in
the current phrase) in order for an evaluation to be
considered redundant.
-y (--word_sense_disambiguation)
- causes MetaMap to attempt to disambiguate among concepts scoring
equally well in matching input text. The initial implementation of
MetaMap Word Sense Disambiguation uses a single method that chooses a
concept (or concepts) having the most likely semantic type for the
context in which the ambiguity arises.
-Y (--prefer_multiple_concepts)
- causes MetaMap to score mappings with more concepts higher than those
with fewer concepts. (It does so simply by inverting the normal
cohesiveness value.) As a simplified example, with this option in
effect, the input text "lung cancer" will score the mapping to the two
concepts 'Lung' and 'Cancer' higher than the mapping to the single
concept 'Lung Cancer'. This option is useful for discovering
higher-order relationships among concepts found in text (e.g., that
'Lung' is the location of 'Cancer' in the example).
-z (--term_processing)
- tells MetaMap to process terms rather than full text. When invoked,
MetaMap treats each input as a single phrase (although the parser is
still used to determine the head of that phrase). It also causes MetaMap
to use the involvement metric rather than coverage for evaluating
Metathesaurus candidates. When used in
conjunction with the --allow_overmatches and --allow_concept_gaps
options, it constitutes MetaMap's browse mode for thorough searching of
the Metathesaurus. In this case it is wise to also specify -m
(--hide_mappings) to turn mapping construction off; otherwise, MetaMap
spends too much time trying to combine the many candidates into final
mappings.
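The browse-mode combination described above can be assembled as follows. This Python sketch uses only flags named in this document; the function name browse_mode_command and the file names in the usage note are hypothetical.

```python
import shlex
from typing import List

# Flags that together make up browse mode, as described above.
BROWSE_MODE_OPTIONS = [
    "--term_processing",     # treat each input as a single phrase
    "--allow_overmatches",   # retrieve candidates with extra words at the ends
    "--allow_concept_gaps",  # retrieve candidates with gaps
    "-m",                    # skip mappings, which are costly with many candidates
]


def browse_mode_command(input_file: str, output_file: str) -> List[str]:
    """Return an argv list for a browse-mode run; file names go last."""
    return ["metamap09", *BROWSE_MODE_OPTIONS, input_file, output_file]


def as_shell_line(argv: List[str]) -> str:
    """Render an argv list as a single, safely quoted command line."""
    return shlex.join(argv)
```

For example, as_shell_line(browse_mode_command("terms.txt", "terms.out")) produces a command line that can be pasted into a shell, and the argv list itself can be passed to subprocess.run.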
Output Options: Output options control how MetaMap displays
results.
-b (--compute_all_mappings)
- forces MetaMap to compute and display all mappings,
rather than only the top scoring ones.
Note: It is almost never useful to display all mappings because of their large number.
-c (--hide_candidates)
- disables the display of the list of Metathesaurus candidates.
By default, candidates are displayed best to worst,
according to the MetaMap evaluation metric.
Note that (assuming this option is not selected)
if a candidate is not the preferred name for a concept,
the preferred name is displayed in parentheses
immediately following the candidate. Displaying both the matching
string and the preferred concept name when they differ is intended to
avoid any confusion about why a concept appears on the candidate list.
It is generally useful to display both the candidate list and the final mappings.
-e (--exclude_sources) <list>
- excludes those sources in the comma-separated <list>;
spaces are not allowed in the list.
-E (--indicate_citation_end)
- causes an end-of-transmission term to be written when
processing of each unit of input is complete. It is only useful for
processing using the Scheduler and only then with validated generic
processing.
-G (--sources)
- displays the Metathesaurus sources for each candidate and mapping
in the output.
-I (--show_cuis)
- shows the UMLS CUI for each concept displayed.
-j (--dump_aas)
- displays the Acronyms and Abbreviations discovered by MetaMap
in the following form:
AA|PMID|Acronym|Expansion|#Acronym Tokens|#Acronym Chars|#Expansion Tokens|#Expansion Chars
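A line in this form can be split into named fields with a few lines of Python. The sketch below assumes that the first field is the literal tag AA, that the four count fields are integers, and that no field contains a "|" character; none of this is confirmed beyond the format shown above. The names AARecord and parse_aa_line are hypothetical.

```python
from typing import NamedTuple


class AARecord(NamedTuple):
    pmid: str
    acronym: str
    expansion: str
    acronym_tokens: int
    acronym_chars: int
    expansion_tokens: int
    expansion_chars: int


def parse_aa_line(line: str) -> AARecord:
    """Parse one pipe-delimited acronym/abbreviation line."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 8 or fields[0] != "AA":
        raise ValueError("not an 8-field AA line: %r" % line)
    _, pmid, acronym, expansion, *counts = fields
    # Assumption: the four trailing fields are integer counts.
    a_tok, a_chr, e_tok, e_chr = (int(count) for count in counts)
    return AARecord(pmid, acronym, expansion, a_tok, a_chr, e_tok, e_chr)
```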
-J (--restrict_to_sts) <list>
- restricts output to those concepts with semantic types in the
comma-separated <list>; spaces are not allowed in the list.
-k (--exclude_sts) <list>
- excludes concepts having a semantic type in the
comma-separated <list>; spaces are not allowed in the list.
-m (--hide_mappings)
- disables the display of mappings.
As noted above,
it is generally useful to display both the candidate list and the final mappings.
-M (--mmi_output)
- displays, in a separate section, the concepts from the
highest-scoring mappings and their Semantic Types.
-n (--number_the_candidates)
- simply numbers the candidates in a displayed candidate list.
-N (--fielded_mmi_output)
- displays, in a separate section, a ranked list of all the
mappings assigned to the text. Additional data, such as the PMID
of the citation, CUIs, and abbreviated Semantic Types, are also
included.
-O (--show_preferred_names_only)
- causes MetaMap to display only the preferred name of each concept,
rather than both the matching string and the preferred name.
-p (--hide_plain_syntax)
- disables the display of the words forming each phrase,
as determined by the SPECIALIST parser.
-q (--machine_output)
- causes output to take the form of Prolog
terms rather than human-readable form. The --machine_output option
affects all other output options. For further information on machine
output, including visually enhanced examples, see the SKR Help page.
-r (--threshold) <integer>
- restricts output to candidates whose evaluation score equals or
exceeds the specified threshold. Judicious use of this option can prevent
MetaMap from making errors in situations where some input text has no
close matches in the Metathesaurus. An appropriate threshold can
usually be determined simply by examining MetaMap output for typical
text in a given application.
-R (--restrict_to_sources) <list>
- restricts output to those sources in the comma-separated
<list>; spaces are not allowed in the list.
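Because the <list> arguments of -e, -J, -k and -R may not contain spaces, a script that builds MetaMap command lines may want to validate its lists first. The following Python sketch does so; the function name comma_list is hypothetical, and rejecting embedded commas is an extra precaution assumed here rather than a documented rule.

```python
from typing import Iterable


def comma_list(items: Iterable[str]) -> str:
    """Join source or semantic type names into a <list> argument."""
    cleaned = [item.strip() for item in items]
    if not cleaned:
        raise ValueError("the list must contain at least one name")
    for item in cleaned:
        # Spaces are not allowed inside the list; a comma would split a name.
        if not item or "," in item or any(ch.isspace() for ch in item):
            raise ValueError("invalid list item: %r" % item)
    return ",".join(cleaned)
```

The result is passed as the single argument that follows the option flag, for example ["-R", comma_list(names)] in an argv list.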
-s (--hide_semantic_types)
- disables the display of the semantic types of Metathesaurus concepts.
By default, the semantic types of Metathesaurus concepts are
displayed in square brackets for each concept in the candidate list
and the mappings.
-T (--tagger_output)
- displays the output of the MedPost/SKR tagger lining up input
words on one line with their tags on a line below.
-v (--variants)
- displays the variants generated for each input word.
-W (--preferred_name_sources)
- lists all sources for the preferred names of displayed
concepts. Note that this is just one of many possible choices for
showing sources; showing all sources for any synonym in a concept, for
example, would often produce very cluttered output.
-x (--syntax)
- controls the output form of the results of the SPECIALIST parser. It outputs
a Prolog term showing details of the syntactic processing.
-X (--truncate_candidates_mappings)
- first truncates the list of candidates to the 100 top-scoring
ones before computing mappings and then truncates the list of mappings
to the 8 top-scoring ones. This option can sometimes prevent a
combinatorial explosion caused by computing a large number of mappings
from a large number of candidates as is often encountered when
using --allow_overmatches.
-% (--XML) <option>
- generates XML output. The options are format, noformat, format1,
and noformat1. The options format and format1 provide
formatted, pretty-printed XML output, while noformat and noformat1
provide concise, unformatted XML output. See "MetaMap 2009
Release Notes"
(http://metamap.nlm.nih.gov/MM09__Release__Notes.shtml) for
more information.
--negex
- outputs a list of negated UMLS concepts occurring in the input and
the associated strings that caused the negation.
--no_header_info
- suppresses printing of informational messages at the beginning of
a MetaMap session.
--phrases_only
- (for debugging purposes only)
--warnings
- (for debugging purposes only)
Miscellaneous Options:
--help
- displays MetaMap usage, i.e., the form of the command and a list
of all options. This option has no short form, and must therefore
be invoked as --help.