SKR Help Info


Quick Guide:

From the SKR/MetaMap homepage you can do the following:


SUPPORTED FILE FORMATS:

The SKR/MetaMap system requires an ASCII input file in one of the formats listed below. For best results, we recommend the first format, "MEDLINE". The SKR/MetaMap program was initially built around the MEDLINE format, and it is still the best supported of all the formats. It is also better to combine many items into a single file and submit that to the Scheduler, which handles the distribution for you. Submitting a large number of small files with few entries each forces the Scheduler to swap more and slows things down.

Note: The Scheduler does not support non-ASCII characters. If your file contains Unicode characters outside the ASCII range (for example, UTF-8 encoded text), it will likely cause an error.
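
One way to catch this before submission is to scan the text for characters outside the ASCII range. The following is a minimal Python sketch, not part of the SKR tools; the function name find_non_ascii and the sample text are made up for illustration.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for each non-ASCII character.

    Line and column numbers are 1-based. This helper is hypothetical and
    is not part of SKR/MetaMap.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # ASCII covers code points 0 through 127
                hits.append((line_no, col, ch))
    return hits


# Illustrative sample: the e-acute in the title is outside ASCII.
sample = "TI  - Caf\u00e9 study\nAB  - Plain ASCII abstract"
for line_no, col, ch in find_non_ascii(sample):
    print(f"line {line_no}, column {col}: {ch!r}")
```

An empty result means the text should be safe to submit as far as character set is concerned.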

Note: If you are going to send free format text, please break it into smaller chunks before running it through the Scheduler. Large chunks of text take too long to process. As a rule of thumb, we typically break free format text into chunks of around 2,000 - 3,000 characters.
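
The chunking rule of thumb above could be applied with a small script. This is a rough Python sketch under stated assumptions: the name chunk_text and the default of 3,000 characters are illustrative, and it breaks at whitespace rather than at sentence boundaries, which may or may not suit your text.

```python
def chunk_text(text, max_chars=3000):
    """Split text into chunks of at most max_chars, breaking at whitespace.

    Line breaks and runs of whitespace are collapsed to single spaces.
    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks = []
    current = []
    current_len = 0
    for word in text.split():
        # One extra character for the joining space if the chunk is non-empty.
        extra = len(word) + (1 if current else 0)
        if current and current_len + extra > max_chars:
            chunks.append(" ".join(current))
            current = [word]
            current_len = len(word)
        else:
            current.append(word)
            current_len += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def to_free_format(chunks):
    """Join chunks with a blank line between items, as in the free format."""
    return "\n\n".join(chunks) + "\n"
```

Each chunk then becomes one item in a free format file, with a blank line separating it from the next.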

  1. MEDLINE format with a blank line separating each item to be processed.
    Use of either "PMID-" or "UI -" as an identifier tag is supported by all applications.

    Format Sample
    Columns:  12345678901234567890
              UI  - #########
              TI  - Some Title
                    Title line 2 & subsequent lines (if necessary).
              AB  - Abstract of item
                    Abstract line 2 & subsequent lines (if necessary).
    
              Alternatively,
    
              UI  - #########
              TI  - Some Title all one string.
              AB  - Abstract of item all one string extending over multiple lines 
    when necessary and as long as you need it to be.  This is sometimes easier 
    because you don't have to reformat your input as much.
    MEDLINE Sample File


  2. Free format with a blank line separating each item to be processed.

    Format Sample
    item 1 text to be processed free text
    item 1 line 2 of free text to be processed
    
    item 2 first line to be processed.
    Free Text Sample File


  3. Single Line Delimited Input
    NOTE: You MUST select "Single Line Delimited Input" from the list of "Scheduler Specific Options" on the various submission pages for this to work.

    Format Sample
    item 1 text to be processed free text
    item 2 text to be processed.
    Single Line Delimited Input Sample File


  4. Single Line Delimited Input w/ ID
    NOTE: You MUST select "Single Line Delimited Input w/ ID" from the list of "Scheduler Specific Options" on the various submission pages for this to work. This option assumes a two field input line: "ID|text to be processed". The ID can be a combination of any alpha-numeric characters and the underscore character ("_"). For example, "001_title" or "00001".

    Format Sample
    0000001|item 1 text to be processed free text with ID
    0000002|item 2 text to be processed.
    Single Line Delimited w/ ID Input Sample File


  5. SGML tagged using the following tags (not supported by the MTI program):

    Format Sample
    <DOC>
      <DOCID>#########
      </DOCID>
      <TITLE>Some Title
      </TITLE>
      <ABSTRACT>The abstract of a citation
      </ABSTRACT>
    </DOC>
    <DOC>
      <DOCID>#########
      </DOCID>
      <TITLE>Some Title
      </TITLE>
      <ABSTRACT>The abstract of a citation
      </ABSTRACT>
    </DOC>
    SGML Sample File

    NOTE: The SGML tags <ABSTRACT> and <TEXT> are synonymous inside of SKR.
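
As an illustration of format 4 above ("ID|text to be processed"), the following Python sketch builds such lines from (ID, text) pairs and rejects IDs that contain anything other than alphanumeric characters and underscores. The function name to_id_delimited is hypothetical, and restricting IDs to ASCII letters and digits is an assumption based on the ASCII-only requirement.

```python
import re

# Alphanumeric characters and underscore only; limited to ASCII by assumption.
ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def to_id_delimited(items):
    """Build 'ID|text' lines from an iterable of (item_id, text) pairs.

    Raises ValueError if an ID contains characters other than letters,
    digits, and underscores.
    """
    lines = []
    for item_id, text in items:
        if not ID_PATTERN.fullmatch(item_id):
            raise ValueError(f"invalid ID: {item_id!r}")
        # Each item must fit on one line, so collapse line breaks and
        # runs of whitespace into single spaces.
        single_line = " ".join(text.split())
        lines.append(f"{item_id}|{single_line}")
    return "".join(line + "\n" for line in lines)


print(to_id_delimited([
    ("0000001", "item 1 text to be processed\nfree text with ID"),
    ("001_title", "item 2 text to be processed."),
]), end="")
```

Remember that the matching "Scheduler Specific Options" entry still has to be selected on the submission page for this format to be recognized.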


NOTES:
  1. You are only allowed to submit batch jobs as "Normal" priority.

  2. We currently support the 2006, 2007, and 0708 UMLS Knowledge Sources. Any of these can be selected in both interactive and batch mode via the "Knowledge Source Options" pull-down menu.

  3. If you see one of your jobs developing a large number of errors, please suspend the job and try to figure out what went wrong offline. This will free up the scheduling queue for other jobs to be run.

  4. The tagger/parser only supports ASCII files with blank lines separating the phrases to be parsed.

  5. None of the current programs available within the Scheduler support UTF-8! The input files must be converted to ASCII before submission.
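
For note 5, one possible way to convert UTF-8 text to ASCII is sketched below in Python. This is a lossy, best-effort approach and not an NLM-provided conversion: the function name to_ascii and the "?" replacement character are assumptions chosen for illustration. Accented letters are reduced to their base letters, and characters with no ASCII equivalent are replaced.

```python
import unicodedata


def to_ascii(text, replacement="?"):
    """Best-effort, lossy conversion of text to ASCII.

    Accented letters lose their accents; any other non-ASCII character
    is replaced with `replacement`.
    """
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if ord(ch) < 128:
            out.append(ch)
        elif unicodedata.combining(ch):
            continue  # drop the combining accent, keep the base letter
        else:
            out.append(replacement)
    return "".join(out)


# Read a UTF-8 file and write an ASCII copy (file names are placeholders):
# with open("input_utf8.txt", encoding="utf-8") as src:
#     converted = to_ascii(src.read())
# with open("input_ascii.txt", "w", encoding="ascii") as dst:
#     dst.write(converted)
```

Because symbols such as Greek letters are replaced rather than spelled out, review the converted text if those characters carry meaning in your input.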

Last Modified: October 02, 2008