Version 2.5.2.0 CRISP Logo CRISP Homepage Help for CRISP Email Us

Abstract

Grant Number: 5R01LM006274-05
Project Title: UNLOCKING DATA FROM MEDICAL RECORDS WITH TEXT PROCESSING
PI Information:
Name: WHITEHEAD, JENNIFER
Email: carol@boole.cs.qc.edu

Abstract: DESCRIPTION (adapted from the Abstract): The long-term aim of this project is to use natural language processing (NLP) to help realize the full potential of the Electronic Medical Record (EMR). Our research involves advanced NLP techniques to: 1) extract and encode information in textual reports; 2) map terms to an authoritative vocabulary; 3) obtain comprehensive domain coverage based on the processing of domain corpora; and 4) facilitate vocabulary development by providing visualization tools using the Extensible Markup Language (XML). It has already been demonstrated that MedLEE, the NLP system we developed, accurately extracts and codifies information in the EMR. The current project builds upon our experience with MedLEE and uses it to accomplish the latter three goals concerning vocabulary development and standardization. More specifically, MedLEE will be used to map source terms to UMLS concepts. MedLEE will process and structure the source terms and candidate UMLS concepts. Suitable matches will be found based on structural similarity between components of the source term and candidate concepts. This should enhance current methods because knowledge of the types of modifiers that match should improve the quality of the matches. We will also use MedLEE to process a large corpus and generate structured output in XML format. Statistics based on the structured output will be computed, and then clinically relevant composite terms will be detected based on frequencies of the structures containing the more elementary terms. Our method differs from other discovery methods because we use NLP techniques that identify semantic modifiers and complex relations even if the terms are distant from each other, whereas other methods use statistical co-occurrence data based on adjacency. The individual XML structures and statistics will be combined and mapped into a single XML tree.
It will be possible to visualize the tree and frequencies using an XML tree viewer, to navigate the tree, to manipulate the tree, and to reorganize the tree according to different axes (i.e., procedure, body location, finding). The use of a sophisticated NLP system, such as MedLEE, is ideal as a foundation for our proposed work in vocabulary development and standardization; medical terminology is an integral part of medical language, and a state-of-the-art NLP system is especially equipped to handle the inherent complexities of language.
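
The frequency-based step described in the abstract (count structures in the XML output, flag frequent term/modifier combinations, and merge everything into a single tree) can be illustrated with a minimal sketch. This is not MedLEE's actual output format or the project's method: the element names (`finding`, `bodyloc`, `degree`), the `v` and `freq` attributes, the sample data, and the `min_freq` threshold are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical structured output, one element per extracted statement.
# Element and attribute names are invented for this sketch.
SAMPLE_OUTPUTS = [
    '<finding v="pain"><bodyloc v="chest"/><degree v="severe"/></finding>',
    '<finding v="pain"><bodyloc v="chest"/></finding>',
    '<finding v="mass"><bodyloc v="lung"/></finding>',
]


def count_structures(xml_strings):
    """Count (term, modifier type, modifier value) triples across all outputs."""
    counts = Counter()
    for text in xml_strings:
        root = ET.fromstring(text)
        for child in root:
            counts[(root.get("v"), child.tag, child.get("v"))] += 1
    return counts


def composite_candidates(counts, min_freq=2):
    """Return triples seen at least min_freq times (an assumed cutoff)."""
    return [key for key, freq in counts.items() if freq >= min_freq]


def merge_into_tree(counts):
    """Combine the counted structures into one XML tree annotated with frequencies."""
    tree = ET.Element("corpus")
    nodes = {}
    for (term, mod_type, mod_value), freq in sorted(counts.items()):
        parent = nodes.get(term)
        if parent is None:
            parent = ET.SubElement(tree, "term", v=term)
            nodes[term] = parent
        ET.SubElement(parent, mod_type, v=mod_value, freq=str(freq))
    return tree


counts = count_structures(SAMPLE_OUTPUTS)
tree = merge_into_tree(counts)
print(ET.tostring(tree, encoding="unicode"))
print(composite_candidates(counts))
```

Because the modifier is read from the parsed structure and not from word adjacency, a term and its body location are counted together even when they are far apart in the original sentence, which is the contrast with adjacency-based co-occurrence that the abstract draws.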

Public Health Relevance:
This Public Health Relevance is not available.

Thesaurus Terms:
automated medical record system, computer system design /evaluation
abstracting /text searching, information retrieval, method development, vocabulary development for information system
human data

Institution: QUEENS COLLEGE
65-30 KISSENA BLVD
FLUSHING, NY 11367-1597
Fiscal Year: 2002
Department: COMPUTER SCIENCE
Project Start: 01-JUL-1997
Project End: 31-DEC-2003
ICD: NATIONAL LIBRARY OF MEDICINE
IRG: BLR

