MetaMapped Results Information

MetaMap is a highly configurable program that maps biomedical text to concepts in the UMLS® Metathesaurus®. MetaMap also forms the core of the Semantic Knowledge Representation (SKR) Project suite of programs, and is the basis for one of the indexing methods in the NLM Indexing Initiative's Medical Text Indexer (MTI). For more information on MetaMap and the SKR Project, please see the SKR Homepage and SKR Research Information web pages. For additional information on MTI and the NLM Indexing Initiative, please see the Indexing Initiative Homepage.

Between August 15, 2005 and January 17, 2006, we ran all 14,792,864 citations from the 2005 Medline®/PubMed® Baseline through the MetaMap program, generating both Machine Readable and Human Readable results for each citation. We were able to process all but six (6) of the citations. These "bad" citations are being reviewed to determine why they could not be processed. The results for the remaining 14,792,858 citations are now available via the MBR Query Tool in either of the two formats.

Please note that the large size of these results requires care when choosing an output format in the MBR Query Tool. The Human Readable results have a compressed size of 25GB, and the Machine Readable results are almost twice that size, at 44GB compressed.
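As a rough illustration of the care needed, the sketch below checks which of the two result sets would fit in the free space of a local directory before downloading. It uses only the compressed sizes quoted above; the function name `formats_that_fit`, the format labels, and the choice of 1GB = 10^9 bytes are assumptions made for this example and are not part of the MBR Query Tool.

```python
import shutil

# Compressed sizes quoted on this page, in gigabytes.
# The keys "human" and "machine" are labels chosen for this sketch.
COMPRESSED_SIZE_GB = {"human": 25, "machine": 44}


def formats_that_fit(free_bytes, sizes_gb=COMPRESSED_SIZE_GB):
    """Return the names of the formats whose compressed results fit in free_bytes."""
    # Assumption: 1GB is treated as 10**9 bytes; the page does not specify the unit.
    return sorted(name for name, gb in sizes_gb.items() if gb * 10**9 <= free_bytes)


if __name__ == "__main__":
    # Free space on the filesystem holding the current directory.
    free = shutil.disk_usage(".").free
    print("Formats that fit:", formats_that_fit(free))
```

Note that these are compressed sizes only, so the space required after decompression will be larger.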

Details:

Data Used: 2005 Medline/PubMed Baseline

Data Characteristics:
* Created between November 18 & 19, 2005
* Consists of 500 files of various counts
* Total of 14,792,864 citations, with all but 6 having results

Commands Used:
* Machine Readable: skr05 -ApcsmqtlbDa -E
* Human Readable: mm_print -I

Last Modified: May 09, 2007