SKR Help Information


Quick Guide:

Web Page Info
File Formats and Samples
Notes
Explanation of MetaMap Machine Output Format (prior to MetaMap 2008) (HTML: 7 kb)
Explanation of 2008 MetaMap Machine Output Format (HTML: 11 kb)
Explanation of 2008 MetaMap XML Output Format (HTML: 44 kb)
Explanation of MetaMap Fielded Output Format (prior to MetaMap 2008) (7.8 kb)

From the SKR/MetaMap homepage you can do the following:

Interactive-Mode: Brings up another web page with the various interactive-mode related options for you. When you select this button from the homepage, you will be prompted to enter your username and password combination.

*** PLEASE NOTE: THE INTERACTIVE MODE IS DESIGNED TO ONLY HANDLE SMALL AMOUNTS OF TEST DATA WHILE YOU EXPLORE THE VARIOUS OPTIONS! IF YOU HAVE A LARGE FILE TO PROCESS - PLEASE USE THE BATCH MODE DESCRIBED BELOW.
- Interactive MetaMap
- Interactive Medical Text Indexer (MTI)
- Interactive SemRep
  
  All bring up a form allowing you to run the specified program interactively. This will allow you to specify all of the various program specific options and specify text to be processed (either via cut-n-paste or file upload). The results will then be sent back to the web page after processing.
Batch-Mode: Brings up another web page with the various batch-mode related options for you. When you select this button from the homepage, you will be prompted to enter your username and password combination.
- Submit MetaMap Batch Job
- Submit Medical Text Indexer (MTI) Batch Job
- Submit SemRep Batch Job
- Submit Generic Batch Job
- Submit Generic Batch Job with Validation
  
  All bring up a form allowing you to run the specified program in batch-mode. Here you only have the option of specifying a file to be uploaded for processing (no cut-n-paste). You are allowed to specify any of the program specific options and whether you would like a compressed downloadable file created with all of your results. Once the job has finished you will be notified via email that the job has completed, how many errors (if any) were encountered, how many items were processed, etc. The email will also contain information on how you can access the results of your run. You can also monitor the progress of your job via the Batch Job Status Window described below.
- Batch Job Status Window: Opens up the Batch-Mode scheduler status window showing all of the jobs currently running and their run status. The window will be placed in the top left corner of your monitor and should automatically update itself every 5 seconds.
- Suspend Submitted Job: Allows you to stop execution of any job currently running that you personally own. The current state of the job is saved for you upon suspension of execution.
- Resume Submitted Job: This option allows you to restart execution of a Suspended Job exactly where it was when it was suspended.
- Rerun Submitted Job: This option allows you to rerun a job that was either Suspended or completed normally. This option will start over all of the processing for a specific job.
Batch Job Status Window: Opens up the Scheduler's Batch-Mode Status Window to show all of the currently running jobs and their run status. The window will be placed in the top left corner of your monitor and should automatically update itself every 5 seconds.
Research Information: Research papers and presentations describing MetaMap, SKR, Edgar, Arbiter, Semantic Representation, etc. Almost all of the papers are in PDF form and the presentations are HTML viewable. This is a very good place to start familiarizing yourself with the system.
Schedule Your Workstation: Specific to users here who are allowing us to use their workstations in the scheduling pool.

SUPPORTED FILE FORMATS:

The SKR/MetaMap system requires an ASCII input file formatted in one of the formats listed below. For the best results, we recommend the first format, "MEDLINE". The MEDLINE format is what the SKR/MetaMap program was initially built around and is still the best supported of all the formats. It is also always better to combine more items into a single file, submit that to the Scheduler, and let it do the distribution for you. If you instead submit a large number of small files with few entries each, the Scheduler is forced to swap more, which slows things down.

Note: The Scheduler does not support non-ASCII characters. If your file contains Unicode characters outside the ASCII range (for example, UTF-8 encoded text), it will likely cause an error.
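As an illustration of the ASCII requirement, the following Python sketch locates non-ASCII characters in a text and produces a best-effort ASCII version. It is not part of SKR; the function names `find_non_ascii` and `to_ascii` are hypothetical, and the conversion strategy (Unicode decomposition followed by dropping whatever cannot be represented) is only one possible choice.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line_number, column, character) for every non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits


def to_ascii(text):
    """Best-effort conversion (an assumption, not an SKR rule):
    decompose accented letters, then drop anything still outside ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


sample = "Sj\u00f6gren syndrome\nna\u00efve T cells"
print(find_non_ascii(sample))
print(to_ascii(sample))
```

Note that this approach silently deletes characters with no ASCII decomposition (Greek letters, for example), so it is worth reviewing the output of `find_non_ascii` before submitting a converted file.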

Note: If you are going to send free format text, please break your text into smaller chunks to run through the Scheduler. Large chunks of text take too long to process via the Scheduler. As a rule of thumb, we typically break free form text into chunks of around 2,000 - 3,000 characters.
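The chunking rule of thumb above could be sketched in Python as follows. This is only an illustration: `chunk_text` and `free_format_file` are hypothetical helper names, the 3,000 character limit is taken from the upper end of the range mentioned above, and breaking at sentence ends is an assumption about what makes a sensible chunk boundary.

```python
def chunk_text(text, max_chars=3000):
    """Split text into chunks of at most max_chars characters,
    breaking at a sentence end where possible, else at a space."""
    chunks = []
    # Collapse all whitespace so no chunk contains an internal blank line.
    remaining = " ".join(text.split())
    while len(remaining) > max_chars:
        window = remaining[:max_chars]
        cut = window.rfind(". ")
        if cut != -1:
            cut += 1  # keep the period with the preceding chunk
        else:
            cut = window.rfind(" ")
        if cut <= 0:
            cut = max_chars  # no boundary found: hard split
        chunks.append(remaining[:cut].strip())
        remaining = remaining[cut:].strip()
    if remaining:
        chunks.append(remaining)
    return chunks


def free_format_file(chunks):
    """Join chunks as free-format items separated by a blank line."""
    return "\n\n".join(chunks) + "\n"


long_text = "This is one sentence of a long document. " * 200
items = chunk_text(long_text)
print(len(items), max(len(item) for item in items))
```

The result of `free_format_file(items)` can then be saved as a single input file, which follows the earlier advice to submit one larger file rather than many small ones.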

MEDLINE format with a blank line separating each item to be processed.
Use of either "PMID-" or "UI -" as an identifier tag is supported by all applications.


Format

Sample

Columns:  12345678901234567890
          UI  - #########
          TI  - Some Title
                Title line 2 & subsequent lines (if necessary).
          AB  - Abstract of item
                Abstract line 2 & subsequent lines (if necessary).

          Alternatively,

          UI  - #########
          TI  - Some Title all one string.
          AB  - Abstract of item all one string extending over multiple lines
when necessary and as long as you need it to be.  This is sometimes easier
because you don't have to reformat your input as much.

MEDLINE Sample File
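A minimal Python sketch of writing records in the first MEDLINE layout shown above (tag padded to four columns, then "- ", with continuation lines indented six spaces). The helper names and the 80-column wrap width are assumptions made for illustration; they are not prescribed by SKR.

```python
import textwrap


def medline_field(tag, value, width=80):
    """Format one field: tag padded to 4 columns, then '- ',
    with continuation lines indented 6 spaces."""
    wrapped = textwrap.wrap(value, width=width - 6) or [""]
    lines = [f"{tag:<4}- {wrapped[0]}"]
    lines += [" " * 6 + part for part in wrapped[1:]]
    return "\n".join(lines)


def medline_record(uid, title, abstract):
    """Build one item with UI, TI and AB fields."""
    return "\n".join([
        medline_field("UI", uid),
        medline_field("TI", title),
        medline_field("AB", abstract),
    ])


def medline_file(records):
    """Join items with a blank line between them."""
    return "\n\n".join(medline_record(*r) for r in records) + "\n"


print(medline_file([
    ("00000001", "Some Title", "Abstract of item one."),
    ("00000002", "Another Title", "Abstract of item two."),
]))
```

Passing "PMID" instead of "UI" as the tag yields the "PMID-" form of the identifier line, since the tag is padded to four columns before the hyphen.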

Free format with a blank line separating each item to be processed.

Format

Sample

item 1 text to be processed free text
item 1 line 2 of free text to be processed

item 2 first line to be processed.

Free Text Sample File

Single Line Delimited Input
NOTE: You MUST select "Single Line Delimited Input" from the list of "Scheduler Specific Options" on the various submission pages for this to work.

Format

Sample

item 1 text to be processed free text
item 2 text to be processed.

Single Line Delimited Input Sample File

Single Line Delimited Input w/ ID
NOTE: You MUST select "Single Line Delimited Input w/ ID" from the list of "Scheduler Specific Options" on the various submission pages for this to work. This option assumes a two field input line: "ID|text to be processed". The ID can be a combination of any alpha-numeric characters and the underscore character ("_"). For example, "001_title" or "00001".

Format

Sample

0000001|item 1 text to be processed free text with ID
0000002|item 2 text to be processed.

Single Line Delimited w/ ID Input Sample File
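The two-field "ID|text to be processed" layout described above can be produced and checked with a short Python sketch. The function name `delimited_line` is hypothetical; the ID check follows the stated rule (alpha-numeric characters and the underscore), and collapsing line breaks inside the text is an assumption made so that each item stays on a single line.

```python
import re

# IDs may contain only alpha-numeric characters and the underscore.
ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def delimited_line(item_id, text):
    """Return one 'ID|text' input line, rejecting invalid IDs."""
    if not ID_PATTERN.fullmatch(item_id):
        raise ValueError(f"invalid ID: {item_id!r}")
    # Each item must fit on one line, so collapse internal line breaks.
    flat = " ".join(text.split())
    return f"{item_id}|{flat}"


items = [
    ("0000001", "item 1 text to be processed\nfree text with ID"),
    ("001_title", "item 2 text to be processed."),
]
for item_id, text in items:
    print(delimited_line(item_id, text))
```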

SGML tagged using the following tags ( Not supported by the MTI program ):


Format

Sample

<DOC>
  <DOCID>#########
  </DOCID>
  <TITLE>Some Title
  </TITLE>
  <ABSTRACT>The abstract of a citation
  </ABSTRACT>
</DOC>
<DOC>
  <DOCID>#########
  </DOCID>
  <TITLE>Some Title
  </TITLE>
  <ABSTRACT>The abstract of a citation
  </ABSTRACT>
</DOC>

SGML Sample File

NOTE: The SGML tags <ABSTRACT> and <TEXT> are synonymous inside of SKR.
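For completeness, a small Python sketch that writes items in the SGML layout shown above. The helper names are hypothetical, and the sketch does no escaping of special characters in the field values, since the escaping rules (if any) are not stated here.

```python
def sgml_doc(doc_id, title, abstract):
    """Build one <DOC> element mirroring the sample layout above."""
    return "\n".join([
        "<DOC>",
        f"  <DOCID>{doc_id}",
        "  </DOCID>",
        f"  <TITLE>{title}",
        "  </TITLE>",
        f"  <ABSTRACT>{abstract}",
        "  </ABSTRACT>",
        "</DOC>",
    ])


def sgml_file(records):
    """Concatenate one <DOC> element per item."""
    return "\n".join(sgml_doc(*r) for r in records) + "\n"


print(sgml_file([
    ("00000001", "Some Title", "The abstract of a citation"),
    ("00000002", "Some Title", "The abstract of a citation"),
]))
```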

NOTES:

You are only allowed to submit batch jobs as "Normal" priority.
We currently support the 2006, 2007, and 0708 UMLS Knowledge Sources. You can select the year in both interactive and batch mode via the "Knowledge Source Options" pull-down menu.
If you see one of your jobs developing a large number of errors, please suspend the job and try to figure out what went wrong offline. This will free up the scheduling queue for other jobs to be run.
The tagger/parser only supports ASCII files with blank lines separating the phrases to be parsed.
None of the current programs available within the Scheduler support UTF-8! The input files must be converted to ASCII before submission.

Last Modified: October 02, 2008
