
HSR&D Study



SHP 08-179
 
 
New NLP Tools for Extraction of Values from Microbiology Text
Michael E Matheny MD MS MPH
Center for Patient Healthcare Behavior
Nashville, TN
Funding Period: May 2008 - September 2008

BACKGROUND/RATIONALE:
Extracting interpretable data from semi-structured free text records has been an ongoing issue in health services research and medical informatics for many years. Such records are commonly created using templates that allow users to answer standard questions (such as pathology reports) and/or format medical information into particular patterns (such as microbiology reports). These records lack regular sentence structure in at least a portion of the report and can include many values (numbers or categorical text) associated with medical information that is phrased either as a question or as a short phrase.

The informatics solution for this type of data has historically been expression matching, in which simple or complicated combinations of phrases, questions, or even report format patterns (such as 10 equal signs in a row) are used to locate and extract values of interest. The advantage of these systems is that almost any medical information that follows a precise character pattern can be identified and extracted with enough programming.
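As a minimal sketch of expression matching, assuming a hypothetical report layout, the Python below locates a susceptibility section by a row of 10 equal signs and extracts the values listed under it. The sample text, the column order, and the names `SECTION_MARKER`, `ROW_PATTERN`, and `extract_susceptibilities` are illustrative assumptions and are not taken from any actual VA report format.

```python
import re

# Hypothetical report fragment; real report layouts differ between sites.
SAMPLE_REPORT = """\
ORGANISM: ESCHERICHIA COLI
==========
AMPICILLIN      >=32   R
CEFTRIAXONE     <=1    S
GENTAMICIN      2      S
"""

# A row of exactly 10 equal signs marks the start of the susceptibility table.
SECTION_MARKER = re.compile(r"^={10}$", re.MULTILINE)

# Antibiotic name, MIC (optionally prefixed by a comparator), interpretation.
ROW_PATTERN = re.compile(
    r"^(?P<antibiotic>[A-Z][A-Z/ -]*?)\s+"
    r"(?P<mic>(?:<=|>=|<|>)?\d+(?:\.\d+)?)\s+"
    r"(?P<interp>[SIR])$",
    re.MULTILINE,
)


def extract_susceptibilities(report: str) -> list[tuple[str, str, str]]:
    """Return (antibiotic, MIC, interpretation) rows found after the marker."""
    marker = SECTION_MARKER.search(report)
    if marker is None:
        return []  # the expected layout was not found
    body = report[marker.end():]
    return [
        (m.group("antibiotic"), m.group("mic"), m.group("interp"))
        for m in ROW_PATTERN.finditer(body)
    ]


rows = extract_susceptibilities(SAMPLE_REPORT)
```

Every detail of the layout (the marker length, the column order, the set of interpretation letters) is encoded directly in the patterns, so the extractor works only for reports that follow this exact layout.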

However, these systems do not generalize and break easily, because changing as little as one character or word in the report format can cause expression matching to fail. Since significant heterogeneity exists in the same type of report between hospitals, clinics, and health systems, such programs are not transferable. A more common problem is periodic slight changes to a report format over time within a single institution, each of which requires the algorithm to be recoded. In addition, using this type of system on data from multiple concurrent sources requires distinct code to handle each separate format version.
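The brittleness described above can be illustrated with a short sketch. The pattern and both sample lines are invented for illustration; the only difference between the two lines is a single extra space after the colon.

```python
import re

# Strict pattern written for one site's (hypothetical) layout: "NAME: MIC INTERP".
STRICT_ROW = re.compile(
    r"^(?P<antibiotic>[A-Z]+): (?P<mic>[<>]?=?\d+) (?P<interp>[SIR])$"
)

original_line = "CEFTRIAXONE: <=1 S"
# The same result after a one-character format change (an extra space).
changed_line = "CEFTRIAXONE:  <=1 S"

original_match = STRICT_ROW.match(original_line)
changed_match = STRICT_ROW.match(changed_line)

print(original_match is not None)  # True: the original layout is parsed
print(changed_match is not None)   # False: the changed layout yields nothing
```

The failure is silent: the changed line simply produces no match, so values are lost unless the code is updated for the new layout.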

Natural language processing (NLP) is another type of text extraction method. It analyzes sentence structure and syntax to extract meaning from words and phrases, and it can relate terms to other terms in a sentence or paragraph using grammar rules. The power of these systems has improved greatly over the years, and medical NLP systems have begun to map terms to medical ontologies, which encode large bodies of medical knowledge as computer-interpretable relationships. For example, in SNOMED CT, the term amoxicillin is associated with the broader concept of the penicillin class of antibiotics, which in turn is associated with the broader concept of antibacterial drugs. Thus, by simply identifying and mapping amoxicillin to an ontology, a number of additional facts are known regarding that term. However, use of NLP for semi-structured text is difficult because there is no sentence syntax or grammar from which to code relationships, and values and categorical data outside of a sentence structure are often ignored or incorrectly tagged.
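The amoxicillin example can be sketched as a walk up a small "is-a" hierarchy. The dictionary below is a toy stand-in using the labels from the paragraph above; it does not contain real SNOMED CT concept identifiers, and it assumes a single parent per concept, whereas SNOMED CT allows a concept to have several parents.

```python
# Toy "is-a" hierarchy; these labels are illustrative and are not real
# SNOMED CT concept identifiers.
IS_A = {
    "amoxicillin": "penicillin class of antibiotics",
    "penicillin class of antibiotics": "antibacterial drugs",
}


def ancestors(term: str, is_a: dict[str, str]) -> list[str]:
    """Walk parent links upward and return every broader concept, nearest first."""
    found: list[str] = []
    current = term
    while current in is_a:
        current = is_a[current]
        if current in found:  # guard against accidental cycles in the toy data
            break
        found.append(current)
    return found


broader = ancestors("amoxicillin", IS_A)
# broader == ["penicillin class of antibiotics", "antibacterial drugs"]
```

Once a term in a report is mapped to a concept, a query for any broader concept (for example, all antibacterial drugs) can also retrieve that term.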

Microbiology reports are an important example of medical information that is typically reported in a semi-structured format. On the continuum of free text reports, from complete sentence-based narrative to exclusively phrase- and value-based forms, these reports are primarily of the phrase and value type, which explains why most prior extraction methods have relied on expression matching. However, significant variation exists in the format of these reports.
These reports contain critical patient information that informs public health officials about antibiotic resistance and provides definitive diagnoses for sepsis, urinary tract infections, and other infections. Blood cultures, one type of microbiology report, are utilized heavily. Various studies have shown that adjustment of antibiotic treatment should be performed in 75% of cases after positive blood cultures are reported, but performance gaps exist, with a discordance between antimicrobial susceptibility testing and antibiotic therapy in 20-25% of patients. Physician documentation of positive culture results is lacking in 25% of patients, and this gap is more common among surgical services. Studies have reported that the contamination rate is 2-3% of all blood cultures and 30-40% of positive blood cultures.

However, rapid notification systems, conducted either manually or electronically, have both improved physician compliance rates and reduced the time to appropriate antibiotic switching. In addition, antibiotic changes on the basis of blood culture results have been shown to reduce the usage and cost of antibiotic treatments.

We selected microbiology reports as the clinical content for the pilot evaluation of semi-structured free text parsing because of the challenges this type of report will present to an NLP system and the high yield of medical information present in the reports. In addition, filtering out contaminated blood cultures is critical prior to using the results in any automated fashion because of the high baseline false positive rate. We chose to pursue NLP system development because of the robust handling of medical information and relationships that can be maintained by this method. In addition, expression matching limitations are addressed since report format variation is easily accommodated. Once the NLP system is validated for extracting this type of data, other types of free text documentation can be evaluated, and further research efforts can be directed towards appropriate antibiotic use among patients with positive blood cultures.

OBJECTIVE(S):
Objective 1: Develop and validate a natural language processing solution for extracting values from semi-structured microbiology reports.

A concept-based natural language processing (NLP) system will be adapted to parse values from blood culture microbiology reports. After the system parses a random sample of microbiology reports from each of the hospitals in VISN-9, manual review will be conducted to evaluate whether the system correctly paired antibiotic and bacteria information with minimum inhibitory concentrations and sensitivity interpretations.
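One way such a validation could be scored is sketched below. The abstract does not state which metrics will be used, so the choice of precision and recall over exact-match tuples is an assumption, as are the `Susceptibility` record, the `precision_recall` function, and the sample data.

```python
from typing import NamedTuple


class Susceptibility(NamedTuple):
    """One organism/antibiotic pairing with its MIC and interpretation."""
    organism: str
    antibiotic: str
    mic: str
    interpretation: str


def precision_recall(
    extracted: set[Susceptibility], reviewed: set[Susceptibility]
) -> tuple[float, float]:
    """Compare system output with manually reviewed tuples for one report."""
    correct = len(extracted & reviewed)  # tuples that match on every field
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(reviewed) if reviewed else 0.0
    return precision, recall


# Hypothetical data: the system paired the wrong MIC with ceftriaxone.
reviewed = {
    Susceptibility("ESCHERICHIA COLI", "AMPICILLIN", ">=32", "R"),
    Susceptibility("ESCHERICHIA COLI", "CEFTRIAXONE", "<=1", "S"),
}
extracted = {
    Susceptibility("ESCHERICHIA COLI", "AMPICILLIN", ">=32", "R"),
    Susceptibility("ESCHERICHIA COLI", "CEFTRIAXONE", "2", "S"),
}

precision, recall = precision_recall(extracted, reviewed)
# One of two tuples matches exactly, so precision == recall == 0.5.
```

Requiring an exact match on all four fields means that a correctly identified antibiotic paired with the wrong value counts as an error, which is the pairing failure the manual review is intended to detect.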

Objective 2: Develop and validate a rule algorithm in order to determine whether a positive blood culture result should be considered contaminated.

A clinical rule algorithm to detect blood culture contamination will be developed within the NLP system environment in compliance with guidelines and clinical expert opinion. The positive culture data used in Objective 1 will be used in Objective 2. Two independent clinicians will review the microbiology susceptibility data and determine whether each sample was contaminated. The processed data from Objective 1 will then be evaluated with the rule algorithm to determine its accuracy in automatically detecting culture contamination.
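The abstract does not describe the rule set itself. The sketch below therefore assumes one simplified heuristic for illustration only: an organism commonly regarded as a skin contaminant that grows in just one of two or more culture sets is flagged as probable contamination. The organism list, the function names, and the sample data are all hypothetical.

```python
# Illustrative list of organisms commonly regarded as skin contaminants.
# The study's actual rule set is not described in this abstract.
COMMON_CONTAMINANTS = {
    "COAGULASE-NEGATIVE STAPHYLOCOCCUS",
    "CORYNEBACTERIUM SPECIES",
    "MICROCOCCUS SPECIES",
}


def is_probable_contaminant(organism: str, positive_sets: int, total_sets: int) -> bool:
    """Flag a common skin organism that grew in only one of several culture sets."""
    return (
        organism.upper() in COMMON_CONTAMINANTS
        and total_sets >= 2
        and positive_sets == 1
    )


def accuracy(predicted: list[bool], reference: list[bool]) -> float:
    """Fraction of cultures where the rule agrees with the clinician label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("no cultures to compare")
    agreed = sum(p == r for p, r in zip(predicted, reference))
    return agreed / len(reference)


# Hypothetical cultures: (organism, positive sets, total sets drawn).
cultures = [
    ("Coagulase-negative Staphylococcus", 1, 2),
    ("Coagulase-negative Staphylococcus", 2, 2),
    ("Escherichia coli", 1, 2),
]
# Hypothetical consensus labels from clinician review.
clinician_labels = [True, False, False]

predictions = [is_probable_contaminant(*c) for c in cultures]
rule_accuracy = accuracy(predictions, clinician_labels)
```

Keeping the rule and the scoring function separate allows the same clinician-reviewed labels to be reused while the rule is revised.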

METHODS:
None provided at this time.

FINDINGS/RESULTS:
No results at this time.

IMPACT:
The Veterans Health Administration operates one of the most comprehensive electronic medical record systems in the United States, integrating inpatient and outpatient medical care, providing outpatient pharmacy services, and supporting a number of informatics initiatives such as clinical reminders and bar-coded medication administration records. However, large volumes of information cannot be analyzed because they are captured in narrative free text or as values in semi-structured free text. Developing and adapting automated text processing methods so that research and quality improvement initiatives can use these data could have a tremendous impact on clinical care and improve the quality and quantity of research efforts.

With regard to this particular clinical domain, appropriate antibiotic coverage for patients with bacterial infections is important for quality medical care and reduces medical morbidity, antibiotic utilization, and antibiotic costs. Because of the quality of its medication administration records and patient allergy data, the VA is in an ideal position to institute clinical decision support for appropriate antibiotic switching on the basis of microbiology results, as well as to provide automated population-level antibiotic resistance surveillance beyond the subset that is currently manually reviewed and reported.

In addition, the VA has one of the most standardized EHR systems in the country, and one would expect microbiology reports to be identical across the system. However, manual review of a sample of reports from the six VISN-9 institutions revealed five report variations. Some of these were small character variations that would not affect human readability but would compromise expression matching. Some institutions reported the standard two blood cultures drawn at the same time in one report, while others split the results into two separate reports. This emphasizes the need to avoid reliance upon expression matching methods within the VA.
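The one-report versus two-report variation can be handled downstream of parsing by regrouping cultures that were drawn together. The sketch below assumes, hypothetically, that each parsed report carries a patient identifier and a collection time, and that split reports share identical values for both; the record layout and the `group_cultures` function are illustrative.

```python
from collections import defaultdict

# Hypothetical parsed reports: (patient id, collection time, organisms per culture).
# Patient P1's site puts both cultures in one report; patient P2's site
# issues one report per culture.
reports = [
    ("P1", "2008-05-01T08:00", [["ESCHERICHIA COLI"], ["ESCHERICHIA COLI"]]),
    ("P2", "2008-05-02T09:30", [["STAPHYLOCOCCUS AUREUS"]]),
    ("P2", "2008-05-02T09:30", [[]]),  # second culture of the pair, no growth
]


def group_cultures(parsed):
    """Merge cultures drawn together, regardless of how many reports carried them."""
    grouped = defaultdict(list)
    for patient_id, collected_at, cultures in parsed:
        grouped[(patient_id, collected_at)].extend(cultures)
    return dict(grouped)


draws = group_cultures(reports)
# Both patients now have two cultures listed under a single draw.
```

After grouping, both layouts yield the same structure, so later steps do not need to know which reporting convention an institution uses.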

PUBLICATIONS:
None at this time.


DRA: none
DRE: none
Keywords: none
MeSH Terms: none