Version 2.5.2.0 CRISP Logo CRISP Homepage Help for CRISP Email Us

Abstract

Grant Number: 1R01LM006759-01
Project Title: DATA MINING AND MODEL BUILDING IN MEDICAL INFORMATICS
PI Information:
Name: BUCHANAN, BRUCE G.
Email: buchanan@cs.pitt.edu
Title: PROFESSOR

Abstract: Our long-term goal is to assist biomedical scientists by extracting and codifying new knowledge from large biomedical databases routinely by computer. As large collections of data become more readily accessible, the opportunities for discovering new information increase. We propose here to work toward this goal by extending our prior research on machine learning in two important directions: (1) codification of disparate pieces of knowledge into a coherent model (model building), and (2) discovery of new information in medical databases (data mining). Machine learning programs find classification rules (or decision trees or networks) that separate members of a target class from other individuals. They have emphasized predictive accuracy, with some attention to tradeoffs between accuracy and cost of errors or between accuracy and simplicity. We propose a framework in which these, and other, tradeoffs are explicit and the criteria by which tradeoffs are made are available for modification. We also include semantic considerations among the criteria to control the internal coherence of models. "Data mining" is a recently coined term for using computers to explore large databases, with a goal of discovering new relationships but usually with no specific target defined at the outset. In addition to accuracy, simplicity, coherence, and cost, a program that purports to discover new relationships must be able to assess novelty. We propose to measure the extent to which proposed relationships are novel by comparing them against existing knowledge in the domain of discourse, and to look for unusual rules (and other relations) that would be very interesting if true. The computer program we are primarily building on, RL, is a knowledge-based learning program that learns classification rules from a collection of data.
RL has been demonstrated to be flexible enough to allow guidance from prior knowledge, and powerful enough to learn publishable information for scientists working in several different domains. Both parts of the research will require extending the RL system in new ways detailed in the research plan, which are consistent with the overall design philosophy of the present system. We will primarily work with data already collected on pneumonia patients, with which we have considerable experience. We will test the generality of the criteria used to evaluate models and discoveries with a Bayesian Net learning system, K2.

Public Health Relevance:
This Public Health Relevance is not available.

Thesaurus Terms:
artificial intelligence, informatics, information retrieval
classification, computer assisted instruction, computer simulation, computer system design/evaluation, model design/development, pneumonia
human data

Institution: UNIVERSITY OF PITTSBURGH AT PITTSBURGH
350 THACKERAY HALL
PITTSBURGH, PA 15260
Fiscal Year: 1999
Department: COMPUTER SCIENCE
Project Start: 01-MAY-1999
Project End: 30-APR-2002
ICD: NATIONAL LIBRARY OF MEDICINE
IRG: BLR

