CRISP Version 2.5.2.0

Abstract

Grant Number: 5R01LM007948-03
Project Title: Principled Methods for Very Large-Scale Causal Discovery
PI Information:
Name: ALIFERIS, CONSTANTIN F.
Email: constantin.aliferis@vanderbilt.edu
Title: ASSISTANT PROFESSOR

Abstract: DESCRIPTION (provided by applicant): The long-term goal of the proposed research is to develop, validate, and apply methods for very large-scale, principled causal discovery that scale up to massive datasets such as those found in bioinformatics, electronic patient records, and bibliographic systems. The explosive growth of such datasets (in sample size, number of variables, and quality) creates tremendous opportunities for biomedical discovery, so powerful causal discovery methods have the potential to revolutionize biomedicine. To address the problem of scale, the co-PIs have developed several novel causal discovery algorithms with well-defined properties and guarantees that take a principled local approach: they focus only on the local causal neighborhood (e.g., the direct causes and effects or, alternatively, the Markov blanket, the set of variables that, once known, makes all remaining variables irrelevant for predicting the target) of one or several "target" variables, and they are built on a formal framework for representing and learning causality. Extensive preliminary experiments with simulated and real data suggest that the algorithms are sound and highly scalable. Given their assumptions, the local algorithms are expected to apply across a broad range of domains, including bioinformatics, epidemiology, text analysis, and clinical medicine. The proposed research takes two focused steps within this space. The local algorithms will be applied to (a) gene expression data from patients with lung cancer and (b) data from a large epidemiologic analysis of factors that influence the development of breast cancer in patients with non-invasive breast disease. It is hypothesized that new and potentially significant causal relationships will be discovered; testing this hypothesis has considerable biomedical and methodological significance.
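To make the local approach concrete, the following is a minimal sketch of one generic local strategy: a grow-shrink (IAMB-style) search for the Markov blanket of a single target variable. It is not the co-PIs' implementation. It assumes continuous, linear-Gaussian data, so that conditional independence can be tested with partial correlation and a Fisher z-test, and it assumes faithfulness, under which the Markov blanket consists of the target's parents, its children, and the children's other parents. The function names (`partial_corr_pvalue`, `iamb`, `simulate`) and the toy network are hypothetical.

```python
import math

import numpy as np


def partial_corr_pvalue(data, i, j, cond):
    """Fisher-z p-value for X_i independent of X_j given X_cond.

    Assumes linear-Gaussian data (a simplifying assumption for this sketch).
    """
    n = data.shape[0]
    # Design matrix: intercept column plus the conditioning variables.
    design = np.column_stack([np.ones(n)] + [data[:, k] for k in cond])
    # Residualize both variables on the conditioning set.
    res_i = data[:, i] - design @ np.linalg.lstsq(design, data[:, i], rcond=None)[0]
    res_j = data[:, j] - design @ np.linalg.lstsq(design, data[:, j], rcond=None)[0]
    r = float(np.corrcoef(res_i, res_j)[0, 1])
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def iamb(data, target, alpha=0.05):
    """Grow-shrink Markov blanket search around a single target column."""
    mb = []
    # Grow: repeatedly admit the candidate most strongly associated with the
    # target given the current blanket, while that association is significant.
    while True:
        candidates = [v for v in range(data.shape[1]) if v != target and v not in mb]
        if not candidates:
            break
        pvals = {v: partial_corr_pvalue(data, v, target, mb) for v in candidates}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        mb.append(best)
    # Shrink: drop members that are independent of the target given the rest.
    for v in list(mb):
        rest = [u for u in mb if u != v]
        if partial_corr_pvalue(data, v, target, rest) >= alpha:
            mb.remove(v)
    return sorted(mb)


def simulate(n=5000, seed=0):
    """Toy network (hypothetical): A -> T <- B, T -> C <- D, A -> F, E isolated."""
    rng = np.random.default_rng(seed)
    a, b, d, e = (rng.standard_normal(n) for _ in range(4))
    t = 0.8 * a + 0.8 * b + rng.standard_normal(n)
    c = 0.8 * t + 0.8 * d + rng.standard_normal(n)
    f = 0.8 * a + rng.standard_normal(n)
    # Column order: A, B, T, C, D, E, F
    return np.column_stack([a, b, t, c, d, e, f])


if __name__ == "__main__":
    print(iamb(simulate(), target=2, alpha=0.001))
```

In this toy network the true Markov blanket of T (column 2) is {A, B, C, D}, i.e. columns 0, 1, 3 and 4. D is marginally independent of T and only becomes relevant once the common child C is conditioned on, and F depends on T only through A, so the shrink phase should discard F if it is ever admitted. Each step tests only variables against one target rather than learning a full network, which is the intuition behind why local methods can scale to many variables.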
The specific aims are to (i) validate the novel causal algorithms; (ii) induce novel hypotheses about the immediate causes and effects of a selected group of genes implicated in lung cancer; (iii) induce novel hypotheses about the causes of breast cancer; (iv) compare the performance of the novel local algorithms with state-of-the-art alternatives; and (v) disseminate new and powerful causal discovery tools. The novel causal algorithms and the hypotheses they generate will be evaluated by (a) validation against existing knowledge through structured, evidence-based, blinded literature review by domain experts; (b) selective experimentation in cell lines (lung cancer domain); and (c) statistical performance metrics.
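As one illustration of how statistical performance metrics could be computed when a reference causal neighborhood is known (for example, in simulated data), the sketch below scores a discovered set of variables against a gold-standard set using precision, recall, and F1. The function name `set_metrics` and the convention of returning 1.0 for an empty set are hypothetical choices for this sketch; the abstract does not specify which metrics the project uses.

```python
def set_metrics(predicted, reference):
    """Precision, recall and F1 of a discovered variable set vs. a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Convention (assumption): an empty set counts as perfectly precise/complete.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(reference) if reference else 1.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # One false positive (5) and one missed member (4) out of four.
    print(set_metrics({0, 1, 3, 5}, {0, 1, 3, 4}))
```

Scoring each algorithm's output this way on the same targets gives a simple basis for comparing local algorithms with alternatives.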

Public Health Relevance:
Not available.

Thesaurus Terms:
biomedical automation, computer data analysis, disease/disorder etiology, health science research, information system, method development
artificial intelligence, breast neoplasm, disease/disorder model, gene expression, lung neoplasm, mathematical model, medical record, neoplasm/cancer genetics
cell line, computer program/software, human data, microarray technology, polymerase chain reaction, western blotting

Institution: VANDERBILT UNIVERSITY
Medical Center
NASHVILLE, TN 37203-6869
Fiscal Year: 2005
Department: CANCER BIOLOGY
Project Start: 01-AUG-2003
Project End: 14-JUL-2008
ICD: NATIONAL LIBRARY OF MEDICINE
IRG: BLR

