Accessing HCMI Data

The Human Cancer Models Initiative (HCMI) is a community resource of next-generation cancer models derived from parent tumors that span a range of cancer subtypes. The models also include those derived from individuals of diverse ethnic and racial backgrounds as well as from rare adult and pediatric cancers. The models and their associated normal and parent tumor tissues are annotated with clinical, biospecimen, and molecular characterization data.

This user guide explains how to access HCMI data.

About the Data

A model's case-associated data include data from the derived model, the originating tumor tissue, and normal tissue. The clinical, biospecimen, and molecular characterization data from HCMI cancer models, matched normal, and tumor tissues are quality-controlled at each step of the cancer model development pipeline. The quality-controlled and harmonized data are available at NCI’s Genomic Data Commons (GDC).

Open- vs. Controlled-Access

Open- vs. controlled-access is defined by the NIH data sharing policy. The HCMI follows the NIH’s human subjects protection and data access policies to ensure the privacy and confidentiality of the research participants. HCMI data are available to the scientific community in two tiers: open-access and controlled-access. Both types of data can be accessed through the GDC.

Open-access Data
Open-access data present minimal risk that a participant can be identified. HCMI provides the scientific community the maximum amount of open-access data allowable under HIPAA guidelines. Access to these data does not require data use certification.

Examples of open-access data are:

  • De-identified clinical information
  • Biospecimen data including tissue pathology
  • Tumor- and model-associated somatic mutations
  • Gene expression data
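As a rough illustration of how open-access records might be listed through the GDC, the sketch below builds a query URL for the GDC API files endpoint, filtered by project and access tier. The project identifier "HCMI-CMDC" and the chosen return fields are assumptions that do not come from this guide; confirm them against the GDC before relying on them. The helper names (build_filters, build_query_url, fetch_file_listing) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"

# Assumption: HCMI data are filed under this GDC project ID. Verify before use.
ASSUMED_PROJECT_ID = "HCMI-CMDC"


def build_filters(project_id, access):
    """Return a GDC-style filter selecting files by project and access tier."""
    if access not in ("open", "controlled"):
        raise ValueError("access must be 'open' or 'controlled'")
    return {
        "op": "and",
        "content": [
            {
                "op": "in",
                "content": {
                    "field": "cases.project.project_id",
                    "value": [project_id],
                },
            },
            {"op": "in", "content": {"field": "access", "value": [access]}},
        ],
    }


def build_query_url(project_id, access, size=10):
    """Encode the filter and a few illustrative return fields into a GET URL."""
    params = {
        "filters": json.dumps(build_filters(project_id, access)),
        "fields": "file_id,file_name,data_category,access",
        "format": "JSON",
        "size": str(size),
    }
    return GDC_FILES_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_file_listing(url):
    """Send the request (needs network access) and return the parsed JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)
```

Calling fetch_file_listing(build_query_url(ASSUMED_PROJECT_ID, "open")) would send the query. Listing file metadata this way should not require any credentials, since only metadata is returned.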

Controlled-access Data
Controlled-access data are stripped of direct participant identifiers as defined by HIPAA, but they contain genomic information that could identify the patient.

Examples of controlled-access data are:

  • Raw sequencing data for WGS, WXS or RNA-Seq
  • Harmonized datasets which contain germline variants
  • Infinium MethylationEPIC

Access to these data requires user certification, which can be obtained through dbGaP, the database of Genotypes and Phenotypes at the National Center for Biotechnology Information (NCBI). Researchers may apply for dbGaP access by filling out a Data Access Request (DAR) form. Read “How to Access Controlled Data” below for more information.

How to Access Controlled Data

Obtain Data Use Certification (DUC) through dbGaP

Note: NCI intramural investigators must submit a dbGaP account activation request before submitting a DAR. Contact the NCI Office of Data Sharing for instructions.

  • All investigators must have an NIH eRA Commons account or HHS credentials (intramural investigators only).
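Once dbGaP authorization is in place, controlled-access files are typically retrieved from the GDC with an authentication token. The sketch below only prepares such a request and does not send it. It assumes the GDC API data endpoint and its X-Auth-Token header, neither of which is described in this guide. The helper name build_download_request and any file UUID passed to it are hypothetical.

```python
import urllib.request

GDC_DATA_ENDPOINT = "https://api.gdc.cancer.gov/data"


def build_download_request(file_id, token=None):
    """Prepare a GDC download request for one file.

    Open-access files need no token. For controlled-access files, pass the
    token text obtained from the GDC after dbGaP authorization (assumption:
    the token is supplied via the X-Auth-Token header).
    """
    request = urllib.request.Request(f"{GDC_DATA_ENDPOINT}/{file_id}")
    if token:
        # Strip whitespace, since token files often end with a newline.
        request.add_header("X-Auth-Token", token.strip())
    return request
```

Passing the prepared request to urllib.request.urlopen would start the download. Keeping the token as a function argument rather than a constant avoids storing credentials in the script.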

Get Help If You Have Trouble Accessing Data

Last updated: December 07, 2020