LHNCBC: Document Abstract
Year: 2002
LHNCBC-2002-027
Unsupervised, Corpus-Based Method for Extending a Biomedical Terminology
Bodenreider O, Rindflesch TC, Burgun A
Proc. of the ACL Workshop; Natural Language Processing in the Biomedical Domain. 2002;:53-60.
Objectives: To automatically extend an existing biomedical terminology downwards, using a corpus together with lexical and terminological knowledge. Methods: Adjectival modifiers are removed from terms extracted from the corpus (3 million noun phrases extracted from MEDLINE), and the resulting demodified terms are searched for in the terminology (the UMLS Metathesaurus, restricted to disorders and procedures). A phrase from MEDLINE becomes a candidate term for the Metathesaurus if two requirements are met: 1) a demodified term created from the phrase is found in the terminology, and 2) the modifiers removed to create the demodified term also modify existing terms from the terminology, within a given semantic category. A sample of candidate terms was reviewed manually. Results: Of the 3 million simple phrases randomly extracted from MEDLINE, 125,000 were identified as new terms for inclusion in the UMLS. Of the 1,000 terms reviewed manually, 83% were associated with a relevant UMLS concept. Discussion: The limitations of this approach are discussed, as well as adaptation and generalization issues.
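The two requirements in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy terminology, the adjective list (a stand-in for the lexical knowledge), and the function names are all hypothetical. It also assumes that only leading adjectives are stripped, and it reads "for a given semantic category" as the category of the demodified term.

```python
from collections import defaultdict

# Hypothetical toy terminology: term -> semantic category (stand-in for the Metathesaurus).
TERMINOLOGY = {
    "pneumonia": "disorder",
    "bacterial pneumonia": "disorder",
    "hepatitis": "disorder",
    "chronic hepatitis": "disorder",
    "biopsy": "procedure",
}

# Hypothetical adjective list (stand-in for lexical knowledge about modifiers).
ADJECTIVES = {"bacterial", "chronic", "severe", "recurrent"}


def demodify(phrase):
    """Yield (demodified_term, removed_modifiers), stripping leading adjectives one by one."""
    words = phrase.lower().split()
    removed = []
    i = 0
    # Always keep at least the final word (assumed to be the head noun).
    while i < len(words) - 1 and words[i] in ADJECTIVES:
        removed.append(words[i])
        i += 1
        yield " ".join(words[i:]), tuple(removed)


def known_modifiers(terminology):
    """Collect, per semantic category, the adjectives that already modify existing terms."""
    mods = defaultdict(set)
    for term, category in terminology.items():
        for word in term.split()[:-1]:
            if word in ADJECTIVES:
                mods[category].add(word)
    return mods


def candidate_terms(phrases, terminology):
    """Map each accepted corpus phrase to the existing term it would be placed under."""
    mods = known_modifiers(terminology)
    candidates = {}
    for phrase in phrases:
        p = phrase.lower()
        if p in terminology:
            continue  # already a term, nothing to add
        for base, removed in demodify(p):
            category = terminology.get(base)  # requirement 1: demodified term exists
            # Requirement 2: every removed modifier is attested in that category.
            if category and all(m in mods[category] for m in removed):
                candidates[p] = base
                break
    return candidates


corpus = ["chronic pneumonia", "severe pneumonia", "bacterial pneumonia",
          "chronic biopsy", "bacterial hepatitis"]
print(candidate_terms(corpus, TERMINOLOGY))
```

In this toy run, "chronic pneumonia" and "bacterial hepatitis" are accepted. "severe pneumonia" and "chronic biopsy" are rejected because the removed modifier is not attested in the relevant category, and "bacterial pneumonia" is skipped because it is already a term.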