Visualizing the Anatomy of Prostate Cancer:
The CGAP Molecular Profiling Initiative



L. Pusanik (CIT/SAIC), N. Cao (CIT), J. Pfeifer (CIT),
M. Ahram (NCI), C. Best (NCI), J. Gillespie (NCI), J. Swalwell (NCI),
P. Duray (NCI), M. Linehan (NCI), C. Lanczycki (CIT), M. Emmert-Buck (NCI)

Introduction
The National Cancer Institute created the Cancer Genome Anatomy Project (CGAP) to achieve a comprehensive molecular characterization of normal, precancerous, and malignant cells in order to identify the genes responsible for the establishment and growth of cancer (1). CGAP defined the Molecular Profiling Initiative (MPI) to highlight key issues, experimental methods, and research examples of molecular profiling studies of clinical specimens (2). The MPI seeks to measure global mRNA and protein patterns to identify the genes that mediate particular aspects of cellular physiology and pathology, using prostate cancer as an initial model for study. Achieving this aim requires several modern techniques, including high-throughput microarray screening for gene expression, proteomics-based methods to identify protein profiles, and laser capture microdissection (LCM) to isolate homogeneous cell populations. Each of these techniques produces a large volume of data. The basic challenge is to tie together the molecular, image, and clinical data from these methodologies to better understand the large-scale histological structure observed in diseased states.

Project Goals
Our long-term agenda is to create a suite of software modules that permits the navigation of complex, heterogeneous multi-dimensional data in ways that provoke scientific insights.  The creation of these modules will result in a flexible set of research tools for storing, analyzing, and visualizing pathogenetic data.

Figure 1. Data Flow.

Methods - The Software
The flow of data in a molecular profiling study is shown schematically in Figure 1. Order-of-magnitude estimates of the data volume appear in Table 1 and illustrate the significant informatics component of molecular profiling research. To store the minimal set of information on a single patient, the pathologist must collect and manage roughly ninety Mb; in a more typical case the volume is larger and can reach many Gb. Because of the complexity and quantity of data involved in the MPI, the software suite has been divided into four GUI-based client applications.

Table 1. Estimated Data Quantities Per Patient.
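
To give a feel for how the per-patient estimate scales, the short sketch below computes a lower bound on storage for a group of patients. It is an illustration only and not part of the MPI software: it assumes that "Mb" in the text means megabytes, takes the roughly ninety Mb minimum quoted above as the per-patient figure, and uses hypothetical cohort sizes. The function name `cohort_storage_gb` is likewise invented for this example.

```python
# Assumption: "Mb" in the text means megabytes, and 90 is the minimal
# per-patient footprint quoted in the Methods paragraph.
MIN_MB_PER_PATIENT = 90


def cohort_storage_gb(n_patients, mb_per_patient=MIN_MB_PER_PATIENT):
    """Return a lower-bound storage estimate in GB (decimal, 1 GB = 1000 MB)."""
    return n_patients * mb_per_patient / 1000


if __name__ == "__main__":
    # Hypothetical cohort sizes, chosen only to show the scaling.
    for n in (10, 100, 1000):
        print(f"{n:>5} patients: at least {cohort_storage_gb(n):g} GB")
```

Because the typical case can reach many Gb per patient, these numbers should be read as a floor rather than an expected total.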
