caIntegrator: Home

Rembrandt Application

REMBRANDT Home

Welcome to REMBRANDT (Repository of Molecular Brain Neoplasia Data)

This site, a joint initiative of the NCI and NINDS, hosts diverse types of molecular research and clinical trial data related to brain cancers, including gliomas, along with a wide variety of web-based analysis tools that help reveal critical correlations among the different data types.

Our aim is to be the access portal for a national molecular, genetic, and clinical database of several thousand primary brain tumors that is fully open and accessible to all investigators (including intramural and extramural researchers), as well as the public at large. The data can be downloaded as raw files containing all the information gathered through the primary experiments, or can be mined using the informatics support provided. This comprehensive brain tumor data portal will allow for easy ad hoc querying across multiple domains, thus allowing physician-scientists to make the right decisions during patient treatments.

Our main focus is to molecularly characterize a large number of adult and pediatric primary brain tumors and to correlate those data with extensive retrospective and prospective clinical data. Specific data types hosted here are gene expression profiles, real-time PCR assays, CGH and SNP array information, sequencing data, tissue array results and images, proteomic profiles, and patients' responses to various treatments. Clinical trial information and protocols are also accessible.

This portal was designed specifically to facilitate the collaborative efforts of the brain cancer research community and thus, to expedite the development of targeted therapies. Your use of its features will make that goal a reality!

About

Motivation and background

The primary goal of the Glioma Molecular Diagnostic Initiative (GMDI) is to develop a molecular classification schema that is both clinically and biologically meaningful, based on gene expression and genomic data from tumors (gliomas) of patients who will be prospectively followed through the natural history and treatment phases of their illness. A secondary objective of the study is to explore gene expression profiles to determine the responsiveness of the patients and to correlate those profiles with discrete chromosomal abnormalities.

The Rembrandt Initiative is an informatics project to support the GMDI effort led by the NCI’s Center for Cancer Research (CCR) Neuro-Oncology Branch. The NCI Center for Bioinformatics (NCICB) has been developing tools and infrastructure components that facilitate the capture and integration of data associated with the Rembrandt study in support of translational research. The National Institute of Neurological Disorders and Stroke (NINDS) is a leader in the neuroscience community in shaping the future of brain cancer research and is contributing resources and capabilities to this effort. Together, the CCR, NCICB, and NINDS can exploit existing technologies and develop new integrative components supporting ongoing and future cancer research initiatives towards understanding the “molecular signatures” of different brain tumor sub-types.

REMBRANDT Knowledgebase

Protocols

Process

Samples are collected from patients enrolled in the GMDI study. The following illustrations indicate what samples are being collected and how they are processed.

[Illustration: samples collected]
[Illustration: process]

Sample Processing

DNA Analysis

RNA Analysis

Data Processing

Gene expression data was collected using both Affymetrix and cDNA array platforms.

For the Affymetrix experiments, Human Genome U133 Plus 2.0 Arrays were used to hybridize tumor samples collected from patients with brain tumors. The data from the hybridizations were used to compare the levels of expression between normal and brain tumor samples.

For data preprocessing, the probe-level data was consolidated into probeset data using the Affymetrix MAS5 algorithm, with the target scaling value set to 500. We also processed the probe-level data with custom CDFs (Chip Definition Files) that rearrange Affymetrix probes into gene-based probesets. Probes mapped to alternatively spliced exons were grouped into distinct probesets. The most 3' probes were selected for processing. Non-specific probes were masked before processing.
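The target scaling step can be illustrated with a minimal sketch. It assumes the common trimmed-mean form of global scaling, in which each array's signals are multiplied by a single factor so that their trimmed mean equals the target. Only the target of 500 comes from the text above; the function name `scale_to_target` and the 2% trim fraction are assumptions made for illustration, and the sketch does not reproduce the rest of the MAS5 algorithm.

```python
def scale_to_target(signals, target=500.0, trim=0.02):
    """Rescale one array's probeset signals so their trimmed mean equals `target`.

    `trim` is the fraction of values ignored at each tail when computing
    the mean (an assumed value, not taken from the REMBRANDT text).
    """
    if not signals:
        raise ValueError("signals must not be empty")
    ordered = sorted(signals)
    k = int(len(ordered) * trim)  # number of values dropped from each tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    trimmed_mean = sum(kept) / len(kept)
    if trimmed_mean <= 0:
        raise ValueError("trimmed mean must be positive")
    factor = target / trimmed_mean  # one scaling factor per array
    return [s * factor for s in signals]


if __name__ == "__main__":
    print(scale_to_target([100.0, 200.0, 300.0, 400.0]))
```

Because a single factor is applied per array, the ordering of signals and the ratios between them within an array are unchanged; only the overall intensity level is aligned across arrays.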

Two kinds of comparison against the normal pool were performed: single tumor sample versus normal pool, and sample average versus normal pool. For the latter, samples were averaged by tumor subtype into six categories (Glioblastoma Multiforme, Oligodendroglioma, Astrocytoma, Mixed, Unclassified, and Unknown tumors). The group comparisons were performed in R with two-sample t tests. The signal values were first log2-transformed, and the averages of the log2 signals of the tumor and normal groups were computed. The magnitude of the difference between the geometric means of the expression levels of the two groups was then computed for each reporter. The significance of the difference between tumors (or each tumor subtype) and the normal samples was also evaluated for each reporter.
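The group comparison for one reporter can be sketched as follows. This is an illustration in Python rather than the original R code, which is not shown in this document. The function name `compare_groups` is hypothetical, and the unequal-variance (Welch) form of the t test is an assumption: it is the default of R's `t.test`, but the text does not say which variant was used. The sketch returns the t statistic and its degrees of freedom; the p-value would then be read from the t distribution.

```python
import math
from statistics import mean, variance


def compare_groups(tumor_signals, normal_signals):
    """Compare tumor and normal signals for one reporter.

    Returns (log2 difference, fold change, t statistic, degrees of freedom).
    Each group needs at least two positive signal values.
    """
    t_log = [math.log2(x) for x in tumor_signals]
    n_log = [math.log2(x) for x in normal_signals]
    n1, n2 = len(t_log), len(n_log)

    # Difference of mean log2 signals; 2**diff is the ratio of geometric means.
    diff = mean(t_log) - mean(n_log)
    fold = 2.0 ** diff

    # Welch two-sample t statistic (assumed variant, see lead-in).
    a = variance(t_log) / n1
    b = variance(n_log) / n2
    t_stat = diff / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return diff, fold, t_stat, df


if __name__ == "__main__":
    print(compare_groups([400.0, 800.0, 1600.0], [100.0, 200.0, 400.0]))
```

Working on the log2 scale is what makes the difference of averages equivalent to a ratio of geometric means on the original signal scale.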

For each individual tumor sample, the signal and the ratio between the tumor signal and the average of the normal samples (geometric mean, computed as described above) were computed. The Affymetrix data analysis workflow is briefly shown below.

All the processes were performed separately for various data groups (public data and institution-based data).

GMDI data processing for Affymetrix platform
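The single-sample ratio described above can be sketched in a few lines. The function names are hypothetical; the only point illustrated is that the normal reference is a geometric mean, that is, an average taken on the log2 scale and transformed back.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive signals, computed via the mean of log2 values."""
    return 2.0 ** (sum(math.log2(v) for v in values) / len(values))


def tumor_to_normal_ratio(tumor_signal, normal_signals):
    """Ratio of one tumor signal to the geometric mean of the normal signals."""
    return tumor_signal / geometric_mean(normal_signals)


if __name__ == "__main__":
    print(tumor_to_normal_ratio(800.0, [100.0, 400.0]))
```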

A home-grown “Glioma Microarray chip”, developed in conjunction with collaborators in the Cancer Genome Anatomy Project (CGAP) and containing approximately 50,000 IMAGE clones of relevance to tumor development, was used. The arrays were scanned and processed using GenePix image analysis software. The data had previously been normalized (Lowess normalization) and filtered, and therefore contains missing values.

Consolidation of Data from Multiple Spots with Same Clones:

For clones with multiple spots (well ids) on the chip, a correlation-based approach was used to obtain a consolidated expression value for each clone. First, the Pearson correlation was computed between the expression measurements (log2-ratios) across all arrays for one spot (with a unique well id) and those for another spot with the same clone. If the correlation was above a threshold (e.g. 0.7), then, for each array, the average of the expression measurements among these spots was computed to represent the expression value for that clone. If the correlation was under the threshold, the expression measurements from the spots of the same clone were considered inconsistent, an "inconsistent" call was made, and no final expression values were provided for that clone. If there were more than two spots from the same clone on the chip, pair-wise correlations between the expression values (log2-ratios) of the different spots (with different well ids) were computed. Currently, if any of these correlations is under the threshold, an "inconsistent" call is made; otherwise the average of the expression measurements is computed to represent the expression of that clone for each array. The cDNA array data processing workflow to handle replicates is briefly shown below.

Handling of replicates for GMDI data from cDNA array GenePix platform
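The replicate-handling rule described above can be sketched as follows. The function names are hypothetical and the sketch makes three assumptions that the text does not state: the spots passed in have no missing values, a correlation exactly equal to the threshold counts as consistent, and a spot with constant values (for which the correlation is undefined) leads to an "inconsistent" call.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists; NaN if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def consolidate_clone(spots, threshold=0.7):
    """Consolidate replicate spots of one clone.

    `spots` holds one list of log2-ratios per spot (well id), each with one
    value per array. Returns the per-array average across spots, or None
    when an "inconsistent" call is made.
    """
    for a, b in combinations(spots, 2):
        # `not (r >= threshold)` also rejects NaN correlations.
        if not pearson(a, b) >= threshold:
            return None
    n_arrays = len(spots[0])
    return [sum(s[i] for s in spots) / len(spots) for i in range(n_arrays)]


if __name__ == "__main__":
    print(consolidate_clone([[1.0, 2.0, 3.0, 4.0], [1.2, 1.9, 3.1, 4.2]]))
```

With two spots there is a single pair to check; with more than two, every pair must pass, which matches the "any correlation under the threshold" rule in the text.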

Missing data handling:

For single array/tumor values, if a value (log2-ratio) was missing in the input data, then the ratio between the tumor and normal for that clone is also missing.

For the group comparison, for each clone, the available (non-missing) data was used to compute the average and p-value. If for a particular tumor subtype, the data was missing for all of the arrays in that subtype for a particular clone, then the average and p-value will both be missing in the final result.

The p-values were computed with a one-sample t test, with missing values handled as described above.
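The missing-data rules for the group comparison can be sketched as follows. The function name `summarize_clone` is hypothetical, missing values are represented as `None`, and the null hypothesis of a mean log2-ratio of zero (no change relative to normal) is an assumption, since the text does not state it. The sketch returns the t statistic; the p-value would then be read from the t distribution with n - 1 degrees of freedom.

```python
import math
from statistics import mean, stdev


def summarize_clone(log2_ratios):
    """Average and one-sample t statistic for one clone within one subtype.

    Missing values (None) are ignored. Returns (None, None) when every
    value is missing, and (average, None) when a t statistic cannot be
    computed (fewer than two values, or zero spread).
    """
    present = [v for v in log2_ratios if v is not None]
    if not present:
        return None, None  # all arrays missing: both results are missing
    avg = mean(present)
    if len(present) < 2:
        return avg, None
    sd = stdev(present)
    if sd == 0:
        return avg, None
    # Assumed null hypothesis: the mean log2-ratio equals 0.
    t_stat = avg / (sd / math.sqrt(len(present)))
    return avg, t_stat


if __name__ == "__main__":
    print(summarize_clone([1.0, None, 2.0, 3.0]))
```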

The LPG/Unified Gene algorithm was developed by NCI’s Laboratory of Population Genetics. The algorithm provides a gene-based view of the expression data. To obtain the unified gene expression values, the probe-level data is processed with custom CDFs (Chip Definition Files) that rearrange Affymetrix probes into splice-form-based probesets. Probes mapped to alternatively spliced exons are grouped into distinct probesets. The most 3' probes are selected for processing. Non-specific probes are masked before processing.

To obtain copy number information, the tumor samples were hybridized to Affymetrix 100K SNP arrays. The CHP files from the Affymetrix GeneChip Operating System were processed using GDAS 3.0 (GeneChip® DNA Analysis Software) and CNAT 2.1 (Copy Number Analysis Tool). The copy number data was collected for each mapping SNP reporter on the chip, for all the tumor samples.

The clinical data was collected by an electronic data management system hosted by MD Anderson. The reports were created by querying that database and were exported as flat files. The files were loaded into the staging area of the caIntegrator data warehouse. After data quality checks and the required transformations, the data was loaded into the data warehouse tables and made available via the Rembrandt data portal.

Devdocs

Rembrandt v1.5 Release Notes

Refer to the caIntegrator developers section.

News

March, 2007	REMBRANDT version 1.5 has been released. v1.5 Release Notes
February, 2006	REMBRANDT version 1.0 has been released
June, 2005	The REMBRANDT Project has been chosen as a 2005 Service to America finalist!
April, 2005	REMBRANDT version 0.51 has been released
March, 2005	First Public announcement of the REMBRANDT project during caBIG press conference

Collaborators

in collaboration with: