National Cancer Institute | U.S. National Institutes of Health | www.cancer.gov
HMO Cancer Research Network, funded by the National Cancer Institute

Informatics and Research Tools

The HMOs affiliated with the CRN have an ethical and legal obligation to safeguard the confidentiality of their members' medical information. CRN scientists and their home organizations have therefore long been concerned about the sensitivity of health system data: above all medical information about individuals, but also data on the quality or delivery of care and on prices paid for medical care inputs. HMO leaders have legitimate concerns that, without careful stewardship, such data could be compromised or misrepresented. Because of these concerns, and worries about breaches of patient or provider confidentiality, the CRN Steering Committee rejected the notion of establishing a centralized repository of generic data on the enrollees of each HMO for use in current and future studies. Instead, the CRN proposed developing standardized data resources to increase the quality and efficiency of research using automated data: the Virtual Data Warehouse (VDW), cancer counters, electronic medical records, and natural language processing. The CRN also operates under an NIH Certificate of Confidentiality, as well as statutory provisions of the Agency for Healthcare Research and Quality, that further shield CRN research information containing patient or provider identification from third-party discovery.

The Virtual Data Warehouse (VDW)

The VDW is a distributed data warehouse: a federated database composed of standardized datasets stored behind separate security firewalls at each participating CRN site. The datasets include variables with identical names, formats, and specifications (including definitions, labels, and coding). Person-level data at each CRN site remain under local control at that site. The VDW is supported by a set of informatics tools -- hardware and software -- that facilitate storage, retrieval, processing, and management of VDW datasets; a set of access policies and procedures governing use of VDW resources; and documentation of all elements of the VDW.
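
The federated pattern described above can be illustrated with a minimal Python sketch. It is not the VDW's actual tooling: the site names, the variable names (tumor_site, stage), and the records are invented for illustration, and the sites are simulated inside one process rather than behind separate firewalls. The point it shows is that identical variable names let the same query run unchanged at every site, and that only aggregate counts leave a site.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

def local_count(records: List[Record], predicate: Callable[[Record], bool]) -> int:
    """Run the query against one site's person-level data; return only a count."""
    return sum(1 for record in records if predicate(record))

def federated_count(
    sites: Dict[str, List[Record]], predicate: Callable[[Record], bool]
) -> Dict[str, int]:
    """Apply the same query at every site and collect per-site aggregates.

    Person-level records are never copied out of a site's list here;
    only the integer result of local_count is returned.
    """
    return {name: local_count(records, predicate) for name, records in sites.items()}

# Hypothetical standardized datasets: same variable names and coding at each site.
sites = {
    "site_a": [
        {"tumor_site": "breast", "stage": "I"},
        {"tumor_site": "colon", "stage": "II"},
        {"tumor_site": "breast", "stage": "III"},
    ],
    "site_b": [
        {"tumor_site": "breast", "stage": "II"},
        {"tumor_site": "lung", "stage": "IV"},
    ],
}

def is_breast(record: Record) -> bool:
    return record["tumor_site"] == "breast"

per_site = federated_count(sites, is_breast)
total = sum(per_site.values())
```

In a real deployment each site would execute the shared program locally and return its summary through whatever access policies and procedures govern the warehouse; the dictionary of lists above only stands in for that arrangement.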

Cancer Counters

To facilitate efficient study planning, CRN staff developed virtual data marts or "counters." The Cancer Counter includes summarized de-identified data that can produce counts of patients with cancer aggregated by tumor site, morphology, stage, health plan, vital status, race, gender, and Hispanic ethnicity, and allows users to select one- and two-way frequencies of these variables. The Cancer Counter has proven to be invaluable for estimating study population sizes for new cancer research proposals.
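
As a rough illustration of what one- and two-way frequencies over summarized data look like, the sketch below tallies pre-aggregated cells in Python. The cell layout, the variable names, and all counts are invented for the example and do not come from the Cancer Counter itself.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

Cell = Dict[str, Union[str, int]]

def one_way(cells: List[Cell], var: str) -> Counter:
    """Sum patient counts over every cell, grouped by a single variable."""
    freq: Counter = Counter()
    for cell in cells:
        freq[cell[var]] += cell["n"]
    return freq

def two_way(cells: List[Cell], row_var: str, col_var: str) -> Counter:
    """Sum patient counts grouped by a pair of variables (a cross-tabulation)."""
    freq: Counter = Counter()
    for cell in cells:
        key: Tuple = (cell[row_var], cell[col_var])
        freq[key] += cell["n"]
    return freq

# Made-up, de-identified summary cells: each row is a count, not a person.
cells = [
    {"tumor_site": "breast", "stage": "I", "vital_status": "alive", "n": 120},
    {"tumor_site": "breast", "stage": "II", "vital_status": "alive", "n": 80},
    {"tumor_site": "breast", "stage": "II", "vital_status": "deceased", "n": 15},
    {"tumor_site": "colon", "stage": "II", "vital_status": "alive", "n": 40},
]

by_site = one_way(cells, "tumor_site")
by_site_and_stage = two_way(cells, "tumor_site", "stage")
```

Because the inputs are already aggregated, a planner can estimate how many eligible patients exist for a proposed study without touching person-level records.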

Electronic Medical Records (EMRs) and Natural Language Processing (NLP)

More CRN sites use EpicCare® than any other EMR system. EMRs allow researchers to manipulate and standardize free-text clinical data such as clinical assessment findings, image interpretations, pathology evaluations, hospital discharge summaries, and consultant evaluations. In addition to the standard physician user interface, the EMRs have a patient interface, where patients can view items in their medical record (such as visit summaries and laboratory test results), send secure messages to their physicians, and enter information into a health risk assessment survey or other survey instrument. This provides the CRN with opportunities for innovative interventions.

Natural Language Processing (NLP) helps investigators identify the variety of sentences, clauses, words, symbols, and abbreviations that represent synonyms for a concept of research interest. CRN informaticists developed an NLP tool called MediClass® to collect standardized information about tobacco control counseling as part of "Using Electronic Medical Records to Measure and Improve Adherence to Tobacco Treatment Guidelines in Primary Care."
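
The idea of mapping many surface forms to one research concept can be sketched with a simple synonym lookup in Python. This is a toy illustration only and not how MediClass works: the concept names, the synonym lists, and the sample notes are all assumptions, and the matching ignores negation, misspellings, and context that a real clinical NLP system would need to handle.

```python
import re
from typing import Dict, List, Pattern, Set

# Hypothetical synonym lists: words, phrases, and abbreviations per concept.
SYNONYMS: Dict[str, List[str]] = {
    "tobacco_use": ["smoker", "smokes", "tobacco use", "cigarettes", "ppd"],
    "cessation_advice": [
        "advised to quit",
        "smoking cessation",
        "quit smoking",
        "cessation counseling",
    ],
}

def compile_concepts(synonyms: Dict[str, List[str]]) -> Dict[str, Pattern[str]]:
    """Build one case-insensitive, whole-word pattern per concept."""
    return {
        concept: re.compile(
            r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
            re.IGNORECASE,
        )
        for concept, terms in synonyms.items()
    }

def find_concepts(note: str, patterns: Dict[str, Pattern[str]]) -> Set[str]:
    """Return the set of concepts with at least one synonym present in the note."""
    return {concept for concept, pattern in patterns.items() if pattern.search(note)}

PATTERNS = compile_concepts(SYNONYMS)

note = "Pt smokes 1 ppd. Advised to quit; referred for smoking cessation."
found = find_concepts(note, PATTERNS)
```

Coding each note as a set of standardized concepts is what makes free-text findings countable across clinicians who phrase the same thing differently.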

Emerging Partnerships with caBIG™

The CRN is coordinating its informatics with NCI's Cancer Biomedical Informatics Grid (caBIG™) to facilitate collaboration. One project is identifying strategies for increasing enrollment in NCI clinical trials by linking the CRN to the cancer Text Information Extraction System (caTIES), an open-source NLP system available from the caBIG Web site. caTIES facilitates extracting, coding, and querying attributes referenced in free-text pathology reports. A standard, caBIG-compliant version of caTIES is planned for the CRN's VDW. The aim of the caBIG collaboration is to use caBIG tools to improve the VDW's compatibility and interoperability with national standards. Where reasonable and feasible, the CRN will contribute candidates for consideration in the caBIG Data Standards Repository.

The CRN is also an active contributor to the caBIG Population Sciences Special Interest Group and the crosscutting Data Sharing and Intellectual Capital Workspace. Both groups are working on strategies to facilitate multi-site collaboration, data collection, and stewardship -- topics that are well aligned with the CRN's extensive experience.
