HSR&D Study
HIR 09-002
Consortium for Healthcare Informatics Research: Clinical Inference and Modeling
Stephen Lee Luther PhD MA
James A. Haley Veterans' Hospital, Tampa, FL
Funding Period: April 2009 - September 2013
BACKGROUND/RATIONALE:
The goal of the CHIR Clinical Inference and Modeling (CIM) Project is to use machine learning (ML) techniques to augment natural language processing (NLP) and the interpretation of outputs from NLP programs. NLP and ML approaches are data intensive, requiring large numbers of cases and carefully labeled, human-annotated data.

OBJECTIVE(S):
The objectives of this project are:
1. Improve extraction of clinically relevant information using machine learning approaches.
2. Use machine learning and epidemiologic methods to infer temporal relationships from unstructured and structured data.
3. Construct classification and predictive models from novel feature sets.

METHODS:
In Aim 1, the potential of active learning to improve the CHIR annotation processes will be explored. Active learning addresses the problem of selecting the most informative instances to give to an expert to label. We are also studying how target variable quality affects statistical text mining (STM) by taking annotated data and intentionally re-introducing miscoding in incremental steps. In the final component of Aim 1, we will design and test open-source programs that will facilitate evaluating the effect of feature set quality on ML approaches.

Aim 2 combines traditional epidemiological methods with ML to better understand temporal relationships.

In Aim 3, a variety of ML algorithms will be employed to construct heterogeneous models by combining existing structured data with information extracted from clinical notes and other sources of unstructured data.

FINDINGS/RESULTS:
Analyses are ongoing.
- Developing heterogeneous datasets will require extracting information from both unstructured and semi-structured data sources, but NLP systems are designed primarily to work with unstructured data. CIM project staff are working with the IEM team to develop ML-based strategies to improve NLP extraction of semi-structured data. The initial algorithms will be available by August 2012.
- Wrote a new module that plugs into GATE and interfaces with TagLine (a tool designed to extract information from semi-structured data); the module will also be able to interface with UIMA. This tool is currently being tested on tables and slot-filler pairs.
- Developing a structured vocabulary for identifying sections in progress notes using ML techniques.

IMPACT:
Results of this project will allow for more effective construction of training sets, which is especially important when large-scale annotation is needed. The project will also enhance our ability to extract and use temporal relations from processed text, which is crucial to clinical decision support and to constructing clinically useful predictive models that enhance patient care.

PUBLICATIONS:
Journal Articles
DRA: Health Systems, Infectious Diseases
DRE: Research Infrastructure, Diagnosis
Keywords: Data Management, Healthcare Algorithms, Information Management, Knowledge Integration, Natural Language Processing, Surveillance
MeSH Terms: none
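
The Aim 1 experiment on target variable quality (re-introducing miscoding into annotated data in incremental steps) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's code: labels are assumed to be binary (0/1), miscoding is assumed to mean flipping a randomly chosen fraction of labels, and the names `inject_miscoding` and `agreement` and the toy `gold` labels are hypothetical.

```python
import random


def inject_miscoding(labels, rate, seed=0):
    """Return a copy of binary (0/1) labels with a fraction `rate` flipped.

    Hypothetical helper: the positions to miscode are drawn at random
    without replacement, so exactly round(rate * n) labels are changed.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)  # seeded so each run is reproducible
    n_flip = round(rate * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


def agreement(a, b):
    """Fraction of positions where two label lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    gold = [0, 1] * 50  # 100 made-up annotations, for illustration only
    # Increase the miscoding rate in incremental steps.
    for rate in (0.0, 0.1, 0.2, 0.3):
        noisy = inject_miscoding(gold, rate, seed=42)
        print(f"miscoding rate {rate:.1f}: "
              f"agreement with gold = {agreement(gold, noisy):.2f}")
```

In an actual study, a text-mining model would presumably be retrained on the `noisy` labels at each step and scored against the unaltered annotations, so that the drop in performance can be related to the miscoding rate. The sketch only shows the label-corruption step.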