Cancer Genome Anatomy Project (CGAP) – Genetic Annotation Initiative
INTRODUCTION
The Genetic Annotation Initiative is an effort aimed at enhancing the genetic utility of NCI’s CGAP sequence data. Within this initiative, a systematic search for DNA-based variation will be conducted in genes important in cancer and its related phenotypes. The NCI is seeking collaborators to help assess efficient and effective technological approaches to finding and characterizing this variation.
OPPORTUNITIES FOR COLLABORATION
The NCI is interested in working with groups to explore promising technologies in variation detection or characterization. It is not yet clear which technical approach is best for routinely identifying DNA variation (discovery). Similarly, it is not yet known which approach is best for high-throughput genotyping of that variation once discovered (characterization). A possible structure for collaboration is outlined below, targeted toward technologies for the discovery and characterization of DNA-based variation.
Discovery Technology Prototyping: Partners with polymorphism discovery technologies will be invited to examine a collection of targets selected from the NCI reference set. The partner will apply their discovery method to screen a subset of CEPH reference pedigree panel samples used in the NCI discovery and validation protocol (see below). However, alternative formats for variation discovery in which such testing is unwarranted will also be considered for prototyping. The partners will be asked to provide information to the consortium that includes, but is not limited to, the detailed protocol used in the testing, the discovery outcomes, and the costs. The discovery outcomes should include the efficiency in recovering the reference variants, the efficacy in identifying new variants not previously captured, the rate of false positives, and the rate of false negatives. Costs should be expressed as absolute costs, incremental costs (ongoing costs, excluding fixed instrumentation), and costs per variant identified (both absolute and incremental). With the mutual agreement of all parties, joint publication of the resulting outcomes is possible.
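The discovery outcomes listed above can be tallied with simple set arithmetic. The following is a minimal sketch in Python, not an NCI protocol: the function name `summarize_discovery`, the `(sts_id, position)` variant keys, and the example values are all hypothetical. It assumes that a called variant absent from the reference set counts as new only if it has been independently confirmed, and it reports false positives as a fraction of all calls; a partner may reasonably define these rates differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoveryOutcome:
    recovery_efficiency: float   # reference variants recovered / reference variants
    false_negative_rate: float   # reference variants missed / reference variants
    new_variants: int            # confirmed calls that are not in the reference set
    false_positive_rate: float   # unconfirmed non-reference calls / all calls


def summarize_discovery(reference, called, confirmed):
    """Summarize one discovery run.

    reference: variants in the reference set for the screened targets
    called:    variants reported by the partner's method
    confirmed: called variants that passed independent validation
               (assumption: supplied by whoever performs the validation)
    """
    reference, called, confirmed = set(reference), set(called), set(confirmed)
    if not reference or not called:
        raise ValueError("reference and called must both be non-empty")

    recovered = called & reference
    missed = reference - called
    novel = called - reference
    false_positives = novel - confirmed

    return DiscoveryOutcome(
        recovery_efficiency=len(recovered) / len(reference),
        false_negative_rate=len(missed) / len(reference),
        new_variants=len(novel & confirmed),
        false_positive_rate=len(false_positives) / len(called),
    )


# Hypothetical example: variants keyed by (STS identifier, position in amplimer).
reference = {("STS1", 101), ("STS1", 250), ("STS2", 77), ("STS3", 412)}
called = {("STS1", 101), ("STS2", 77), ("STS3", 412), ("STS4", 19), ("STS5", 300)}
confirmed = {("STS4", 19)}

outcome = summarize_discovery(reference, called, confirmed)
print(outcome)
```

In this made-up run, three of the four reference variants are recovered, one new variant is confirmed, and one call remains unconfirmed.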
Characterization Technology Prototyping: Partners with high-throughput characterization technologies will be invited to test known variants from the NCI reference collection. The partner will apply their characterization method to the CEPH reference pedigree panel samples used in the NCI discovery and validation protocol (see below). Partners will be asked to genotype the two families used in the validation experiments (a total of 26 samples). Genotypes will be made publicly accessible for the additional families in the eight Genethon/CHLC (PLT) and the seven CHLC (SLT) subsets used for reference map construction. DNA identified by CEPH ID# from the PLT set will be provided to the partner to facilitate calibration (96 samples). DNA blinded to CEPH ID# from the SLT set will be provided for the purpose of assessing genotyping accuracy (96 samples).
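One way to assess genotyping accuracy on the blinded samples is to compare submitted genotypes with the reference genotypes once the sample codes have been unblinded. The sketch below is illustrative only: the function name `genotype_concordance`, the `(sample_id, variant_id)` keys, and the example data are hypothetical, and it assumes unphased genotypes, so allele order is ignored.

```python
def genotype_concordance(reference, submitted):
    """Compare submitted genotypes with reference genotypes.

    Both arguments map (sample_id, variant_id) -> (allele, allele).
    Returns (concordance, call_rate):
      concordance = matching genotypes / genotypes present in both sets
      call_rate   = reference genotypes the partner reported / reference genotypes
    """
    if not reference:
        raise ValueError("reference genotypes are required")

    compared = [key for key in reference if key in submitted]
    if not compared:
        raise ValueError("no overlapping genotypes to compare")

    # Genotypes are unphased, so ("A", "G") and ("G", "A") are the same call.
    matches = sum(
        1 for key in compared
        if sorted(reference[key]) == sorted(submitted[key])
    )
    return matches / len(compared), len(compared) / len(reference)


# Hypothetical example data; sample and variant identifiers are invented.
reference_genotypes = {
    ("S01", "V1"): ("A", "G"),
    ("S02", "V1"): ("A", "A"),
    ("S03", "V1"): ("G", "G"),
    ("S04", "V1"): ("A", "G"),
    ("S05", "V1"): ("A", "A"),
}
submitted_genotypes = {
    ("S01", "V1"): ("G", "A"),  # same genotype, alleles listed in the other order
    ("S02", "V1"): ("A", "A"),
    ("S03", "V1"): ("A", "G"),  # discordant call
    ("S04", "V1"): ("A", "G"),
    # S05 was not reported by the partner
}

concordance, call_rate = genotype_concordance(reference_genotypes, submitted_genotypes)
print(f"concordance={concordance:.2f} call_rate={call_rate:.2f}")
```

Reporting the call rate alongside concordance keeps a method from appearing accurate simply because it declined to call the difficult samples.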
Characterization partners will be asked to provide information to the consortium that includes, but is not limited to, detailed protocols used in the testing, the characterization outcomes, and the costs. The outcomes should include the efficiency in recovering the reference variants in the sample, the rate of false positives, and the rate of false negatives. Costs should be expressed as absolute costs, incremental costs (ongoing costs, excluding fixed instrumentation), and costs per variant identified (both absolute and incremental). With the mutual agreement of all parties, joint publication of the resulting outcomes is possible.
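The cost categories requested above reduce to a few lines of arithmetic. The sketch below shows one reading of them, under the assumption that absolute cost means fixed instrumentation plus ongoing costs and that incremental cost is the absolute cost with fixed instrumentation removed. The function name `summarize_costs` and all figures are hypothetical.

```python
def summarize_costs(fixed_instrumentation, ongoing, variants_identified):
    """Return the requested cost figures for one prototyping run.

    fixed_instrumentation: one-time equipment cost
    ongoing:               reagents, labor, and other recurring costs
    variants_identified:   number of variants identified in the run
    """
    if variants_identified <= 0:
        raise ValueError("variants_identified must be positive")

    absolute = fixed_instrumentation + ongoing
    incremental = absolute - fixed_instrumentation  # ongoing costs only

    return {
        "absolute": absolute,
        "incremental": incremental,
        "absolute_per_variant": absolute / variants_identified,
        "incremental_per_variant": incremental / variants_identified,
    }


# Hypothetical figures, in arbitrary currency units.
costs = summarize_costs(fixed_instrumentation=50000.0, ongoing=12000.0,
                        variants_identified=40)
for name, value in costs.items():
    print(f"{name}: {value:.2f}")
```

Stating both per-variant figures lets a reader separate the cost of acquiring a platform from the cost of running it.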
Models of Participation: Three different levels of interaction are envisioned:
Level 1: small scale, proof of concept experiments, conducted by the partner, using reagents provided by the NCI and aimed at recovering the reference information described above.
Level 2: larger scale efforts, conducted by the partner, using reagents provided by the NCI, in which high-throughput capacity and cost-efficiency are demonstrated.
Level 3: large scale efforts, conducted within NCI laboratories, where the capacity to transport the technology off-site is demonstrated.
Depending on the maturity of the technology, the interaction could take place at different levels, or stop short of completing all levels. Only technologies showing significant potential will be imported into the NCI laboratories (at the discretion of the NCI staff and dependent on the negotiated structure of technology transfer). The NCI is prepared to provide the following reagents: amplimer sequence, primer sequence for the amplimer, the DNA variation site (and its alternative forms), genotypes for the CEPH panel, primer aliquots, and/or CEPH DNA aliquots.
NCI ESTABLISHED REFERENCE SET
Intramural laboratories within the NCI are working to establish a "gold standard" set of DNA-based variation which can be used to calibrate the various discovery and characterization techniques. To establish this set, sequencing technology is being used to examine multiple STSs defined in the 5’ UTRs, 3’ UTRs, and RT-PCR cDNA products (from lymphoblastoid cell lines) of 1000 different cancer-relevant genes. It is projected that the effort will identify approximately 3000 DNA variants.
The reference set of polymorphisms is being generated using four samples (1331-01, 1331-02, 1413-01, 1413-02) from the Centre d’Etude du Polymorphisme Humain (CEPH) reference pedigree panel. PCR-amplified STS products of 600 bp are bi-directionally sequenced using standard fluorescence-based technology. Variants are determined from chromatograms by visual inspection and computer tools. Putative variants are validated by testing for Mendelian transmission in the offspring samples from the two families. Genotyping of the variants is being performed on an additional 13 CEPH families. These are the remaining six Genethon/CHLC (PLT) and the seven CHLC (SLT) CEPH family subsets used for reference map construction. Accuracy is determined by examining variants in the context of multilocus haplotypes. Variants and genotypes identified through this initiative will be placed into the public domain to facilitate the work of other groups.
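The Mendelian transmission test used for validation can be illustrated with a short check. This is a sketch under stated assumptions, not the NCI laboratories' actual software: it assumes autosomal loci, complete unphased genotypes for both parents and every offspring, and hypothetical function names and example genotypes.

```python
def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can be formed from one paternal
    and one maternal allele. Genotypes are (allele, allele) pairs."""
    first, second = child
    return (first in father and second in mother) or \
           (second in father and first in mother)


def passes_transmission_test(father, mother, offspring):
    """A putative variant passes if every offspring genotype is
    consistent with Mendelian transmission from the two parents."""
    return all(is_mendelian_consistent(father, mother, child)
               for child in offspring)


# Hypothetical genotypes at one putative variant site in one family.
father = ("A", "G")
mother = ("A", "A")

consistent_family = [("A", "A"), ("A", "G"), ("G", "A")]
inconsistent_family = [("A", "A"), ("G", "G")]  # G/G needs a maternal G

print(passes_transmission_test(father, mother, consistent_family))
print(passes_transmission_test(father, mother, inconsistent_family))
```

A failed check does not by itself show whether the parental or the offspring genotype is wrong; it only flags the putative variant for closer inspection.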