United States Department of Veterans Affairs

HSR&D Study



HIR 09-002
 
 
Consortium for Healthcare Informatics Research: Clinical Inference and Modeling
Stephen Lee Luther PhD MA
James A. Haley Veterans' Hospital, Tampa, FL
Funding Period: April 2009 - September 2013

BACKGROUND/RATIONALE:
The goal of the CHIR Clinical Inference and Modeling Project is to use machine learning (ML) techniques to augment natural language processing (NLP) and the interpretation of outputs from NLP programs. NLP and ML approaches are data intensive, requiring large numbers of cases and carefully labeled, human-annotated data.

OBJECTIVE(S):
The objectives of this project are to:

1. Improve extraction of clinically relevant information using machine learning approaches.

2. Use machine learning and epidemiologic methods to infer temporal relationships from unstructured and structured data.

3. Construct classification and predictive models from novel feature sets.

METHODS:
In Aim 1, we will explore the potential of active learning to improve the CHIR annotation process. Active learning is an ML technique that selects the most informative instances for an expert to label. We are also studying how target variable quality affects statistical text mining (STM) by taking annotated data and intentionally re-introducing miscoding in incremental steps. In the final component of Aim 1, we will design and test open source programs that facilitate evaluating how feature set quality affects ML approaches. Aim 2 combines traditional epidemiological methods with ML to better understand temporal relationships. In Aim 3, a variety of ML algorithms will be employed to construct heterogeneous models by combining existing structured data with information extracted from clinical notes and other sources of unstructured data.
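The two Aim 1 ideas described above, selecting the most informative instances for annotation and re-introducing miscoding in incremental steps, can be illustrated with a minimal Python sketch. It is not the project's code: the function names, the example scores, and the choice of least-confident sampling for a binary classifier are assumptions made for illustration only.

```python
import random


def select_most_uncertain(probabilities, k):
    """Return indices of the k pool documents whose predicted positive-class
    probability lies closest to 0.5 (least-confident sampling, binary case)."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]


def flip_labels(labels, rate, seed=0):
    """Return a copy of binary (0/1) labels with round(rate * n) entries
    flipped, simulating a controlled level of miscoding in the target."""
    rng = random.Random(seed)
    n_flip = round(rate * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


# Hypothetical classifier scores for five unlabeled notes.
pool_scores = [0.95, 0.52, 0.10, 0.45, 0.70]
to_annotate = select_most_uncertain(pool_scores, k=2)  # -> [1, 3]

# Re-introduce miscoding in incremental steps (0%, 10%, 20%, 30%).
gold = [0, 1] * 10
noisy_sets = {rate: flip_labels(gold, rate) for rate in (0.0, 0.1, 0.2, 0.3)}
```

In such a setup, a text classifier would be retrained on each noisy label set and its performance compared against the unaltered labels to gauge sensitivity to target quality.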

FINDINGS/RESULTS:
Analyses are ongoing.
-Developing heterogeneous datasets requires extracting information from both unstructured and semi-structured data sources, but NLP systems are designed primarily to work with unstructured data. CIM project staff are working with the IEM team to develop ML-based strategies to improve NLP extraction of semi-structured data. The initial algorithms will be available by August 2012.
-Wrote a new module that plugs into GATE and interfaces with TagLine (a tool designed to extract information from semi-structured data); the module will also be able to interface with UIMA. This tool is currently being tested on tables and slot-filler pairs.
-Developing a structured vocabulary for identifying sections in progress notes using ML techniques.

IMPACT:
Results of this project will allow for more effective construction of training sets, which is especially important when large-scale annotation is needed. The project will also enhance our ability to extract and use temporal relations from processed text, which is crucial both to clinical decision support and to the construction of clinically useful predictive models that enhance patient care.

PUBLICATIONS:

Journal Articles

  1. McCart JA, Berndt DJ, Jarman J, Finch DK, Luther SL. Finding falls in ambulatory care clinical documents using statistical text mining. Journal of the American Medical Informatics Association : JAMIA. 2012 Dec 15.
  2. Luther S, Berndt D, Finch D, Richardson M, Hickling E, Hickam D. Using statistical text mining to supplement the development of an ontology. Journal of Biomedical Informatics. 2011 Dec 1; 44 Suppl 1:S86-93.
  3. Garla V, Lo Re V, Dorey-Stein Z, Kidwai F, Scotch M, Womack J, Justice A, Brandt C. The Yale cTAKES extensions for document classification: architecture and application. Journal of the American Medical Informatics Association : JAMIA. 2011 Sep 1; 18(5):614-20.
  4. Lo Re V, Lim JK, Goetz MB, Tate J, Bathulapalli H, Klein MB, Rimland D, Rodriguez-Barradas MC, Butt AA, Gibert CL, Brown ST, Kidwai F, Brandt C, Dorey-Stein Z, Reddy KR, Justice AC. Validity of diagnostic codes and liver-related laboratory abnormalities to identify hepatic decompensation events in the Veterans Aging Cohort Study. Pharmacoepidemiology and Drug Safety. 2011 Jul 1; 20(7):689-99.
  5. Berndt DJ, McCart JA, Luther SL. Using ontology network structure in text mining. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium. 2010 Nov 13; 2010:41-5.
  6. Konovalov S, Scotch M, Post L, Brandt C. Biomedical informatics techniques for processing and analyzing web blogs of military service members. Journal of Medical Internet Research [Electronic Resource]. 2010 Oct 5; 12(4):e45.

Conference Presentations

  1. Finch D, Luther SL. Extracting Semi-Structured Text Elements in Medical Progress Notes: A Machine Learning Approach. Poster session presented at: American Medical Informatics Association Annual Symposium; 2012 Nov 3; Chicago, IL.
  2. Jarman J, Luther SL, McCart J, Berndt DJ. Combining Natural Language Processing and Statistical Text Mining: Classifying Fall-Related Progress Notes. Presented at: VA HSR&D National Meeting; 2011 Feb 16; Washington, DC.
  3. McCart J, Jarman J, Finch D, Luther SL. An Introductory Look at Statistical Text Mining for Health Services Researchers. Presented at: VA HSR&D National Meeting; 2011 Feb 16; Washington, DC.
  4. Berndt DJ, Finch D, Foulis P, Luther SL. The Impact of Data and Target Quality in Text Mining Clinical Notes. Poster session presented at: American Medical Informatics Association Annual Symposium; 2010 Nov 13; Washington, DC.


DRA: Health Systems, Infectious Diseases
DRE: Research Infrastructure, Diagnosis
Keywords: Data Management, Healthcare Algorithms, Information Management, Knowledge Integration, Natural Language Processing, Surveillance
MeSH Terms: none