CRISP Version 2.5.2.0

Abstract

Grant Number: 5R01LM006910-03
Project Title: DISCOVERING AND APPLYING KNOWLEDGE IN CLINICAL DATABASES
PI Information:
Name: HRIPCSAK, GEORGE M.
Email: hripcsak@columbia.edu
Title: PROFESSOR

Abstract: A real-time clinical repository contains a wealth of detailed information useful for clinical care, research, and administration. In their raw form, however, the data are difficult to use: there is too much volume, too much detail, missing values, and inaccuracies. Clinicians, researchers, and administrators require higher-level interpretations that address their questions. For example, a clinician may need to know whether a patient is at sufficient risk for having active tuberculosis to warrant respiratory isolation. The answer to that question may be spread across the clinical repository in chest radiographs, laboratory tests, medication histories, vital signs, and physicians' notes. Translating these raw data into the interpretation (at risk or not) is a difficult and laborious task. The hypothesis of this proposal is that data mining techniques can be applied to a real-time clinical repository to discover knowledge and generate accurate clinical interpretations, and that these interpretations can be automated. The project differs from earlier machine learning studies in its emphasis on a real clinical repository and its use of natural language processing to supply coded clinical data. The specific aims are:

(1) Select clinical domains--Several clinical domains with interesting, non-trivial clinical problems will be selected. Problems for which a gold-standard answer can be or has been assembled for a retrospective cohort will be chosen.
(2) Prepare raw clinical data for mining--The raw data from a clinical repository will be transformed into a structure that facilitates data mining. The data will be flattened, pivoted, summarized, and mapped as needed for the domains. Narrative data will be coded using the MedLEE natural language processor. The preparation process will be automated.
(3) Use data mining algorithms to discover knowledge--Several data mining algorithms will be applied to the selected clinical domains. Algorithms will include decision tree generation, rule discovery, neural networks, nearest neighbor, logistic regression, and composite algorithms (for variable reduction). The algorithms will be trained on a training set for each domain, and their predictive accuracy will be measured and compared with one another and with expert-written rules. The performance of human experts writing rules with manual data mining visualization techniques (which do not require an explicit training set) will also be measured.
(4) Study the dependence of data mining on the training set--The performance of data mining algorithms depends on the data used to train them. The sensitivity of the algorithms to noise (inaccurate data), missing data, and training set size will be measured.
(5) Use the discovered knowledge to generate real-time interpretations--The output of the algorithms (decision tree, rules, neural network equation, or logistic regression equation, but not nearest neighbor), along with the necessary data preparation steps, will be encoded in Arden Syntax Medical Logic Modules. The modules will be run against the clinical repository to verify that the interpretation can be automated in real time.
(6) Disseminate the methods and results--The methods and results will be disseminated via publications and a Web site, and tools will be made available.
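Aim (2) describes flattening and pivoting raw repository data into a structure suitable for mining. As a rough illustration only, the Python sketch below assumes a hypothetical entity-attribute-value layout (one row per patient, attribute, timestamp, and value) and pivots it into one row per patient, summarizing each attribute by its most recent value and its mean. The `Observation` and `pivot_observations` names, the schema, the attribute codes, and the choice of summaries are all assumptions made for this example, not the project's actual preparation pipeline; the code-mapping and MedLEE coding steps are not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass(frozen=True)
class Observation:
    """One row of a HYPOTHETICAL entity-attribute-value repository table."""
    patient_id: str
    attribute: str   # assumed: a lab test code or a coded finding from narrative text
    time: datetime
    value: float


def pivot_observations(rows, attributes):
    """Pivot EAV rows into one flat feature dict per patient (illustrative sketch).

    For each requested attribute, emit the most recent value ("_last") and the
    mean of all values ("_mean"). Attributes never observed for a patient are
    left as None, i.e. explicitly missing, rather than imputed.
    """
    # Group rows by patient, then by attribute.
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        grouped[row.patient_id][row.attribute].append(row)

    table = {}
    for patient_id, by_attr in grouped.items():
        features = {}
        for attr in attributes:
            # Sort chronologically so the last element is the most recent value.
            obs = sorted(by_attr.get(attr, []), key=lambda o: o.time)
            features[f"{attr}_last"] = obs[-1].value if obs else None
            features[f"{attr}_mean"] = mean(o.value for o in obs) if obs else None
        table[patient_id] = features
    return table


if __name__ == "__main__":
    # Made-up example rows; timestamps are deliberately out of order.
    rows = [
        Observation("p1", "wbc", datetime(2001, 1, 2, 8), 12.0),
        Observation("p1", "wbc", datetime(2001, 1, 1, 8), 9.0),
        Observation("p1", "temp", datetime(2001, 1, 2, 9), 38.5),
        Observation("p2", "temp", datetime(2001, 1, 3, 9), 37.0),
    ]
    table = pivot_observations(rows, ["wbc", "temp"])
    print(table["p1"])  # {'wbc_last': 12.0, 'wbc_mean': 10.5, 'temp_last': 38.5, 'temp_mean': 38.5}
    print(table["p2"])  # {'wbc_last': None, 'wbc_mean': None, 'temp_last': 37.0, 'temp_mean': 37.0}
```

Leaving unobserved attributes as `None` keeps missingness visible to the downstream algorithms, which matters for the missing-data sensitivity study in aim (4); a real pipeline might instead impute values or add explicit "was measured" indicator features.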

Public Health Relevance:
This Public Health Relevance is not available.

Thesaurus Terms:
artificial intelligence, clinical research, health care facility information system, information system analysis, vocabulary development for information system
Internet, computer assisted medical decision making, computer assisted patient care, computer program/software, computer system design/evaluation, data collection methodology/evaluation, information dissemination
human data

Institution: COLUMBIA UNIVERSITY HEALTH SCIENCES
Columbia University Medical Center
NEW YORK, NY 10032-3702
Fiscal Year: 2002
Department: CTR/MEDICAL INFORMATION SCI
Project Start: 01-APR-2000
Project End: 31-MAR-2003
ICD: NATIONAL LIBRARY OF MEDICINE
IRG: BLR

