Genomic Resources
- Genomic Datasets for Cancer Research: Datasets and Access Policy
- Cancer Genome and Somatic Mutation Information
- Genotyping and Sequencing Centers
- Genome Browsers and Map Viewers
- Databases and Catalogues of Genetic Variation
- Toolkits for Harmonizing or Generating Standardized Measures for Phenotypes and Exposures
- Analysis Tools
- Catalogues and Databases of Relationships Between Genotypes and Phenotypes
- Tools for Predicting Impact of Amino Acid Substitutions
- Literature and Knowledge Base Resources
- NCI/NIH Sponsored Networks and Programs
Note: This list provides links to research resources that may be of interest to genetic epidemiologists conducting cancer research, but is not exhaustive. Within each section, the resources are listed in alphabetical order.
If you have suggestions for additional resources to add, please contact Carolyn Hutter, Ph.D., a Program Director in the Epidemiology and Genomics Research Program's Host Susceptibility Factors Branch.
Genomic Datasets for Cancer Research: Datasets and Access Policy
- Data Access Request Process
This page contains instructions for submitting a Data Access Request for dataset(s) under the purview of the NCI's Extramural Data Access Committee.
- Database of Genotypes and Phenotypes (dbGaP)
dbGaP was developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype. Such studies include GWAS, medical sequencing, molecular diagnostic assays, as well as studies of associations between genotype and non-clinical traits. dbGaP provides two levels of access, open and controlled, in order to allow broad release of non-sensitive data, while providing oversight and investigator accountability for sensitive data sets involving personal health information.
- Genomic Datasets for Cancer Research
This page provides information on a variety of datasets from genome-wide association studies (GWAS) of cancer and other genotype-phenotype studies, including sequencing and molecular diagnostic assays. These data are available to approved investigators through the National Cancer Institute (NCI)'s Extramural Data Access Committee (DAC).
- GWAS Policy Home Page
In January 2008, the National Institutes of Health (NIH) implemented a policy for the sharing of data obtained in NIH-supported or conducted GWAS. The purpose of the policy is to foster science for the benefit of the public through the creation of a centralized NIH GWAS data repository. This website supports the GWAS policy's implementation.
- Notice on Development of Data Sharing Policy for Sequence and Related Genomic Data
This notice details NIH plans to: 1) update data sharing policies for NIH-supported research involving sequence and related genomic data; 2) encourage investigators and IRBs to consider the potential for broad sharing of these genomic data in developing informed consent processes and documents for such studies; and 3) communicate the agency's intent to develop a policy pertaining to the deposition of these large datasets into centralized databases.
Cancer Genome and Somatic Mutation Information
- cBio Cancer Genomics Portal
The cBio Cancer Genomics Portal, developed by the Computational Biology Center at Memorial Sloan-Kettering Cancer Center, provides visualization, analysis and download of subsets of large-scale cancer genomics data sets.
- Catalogue of Somatic Mutations in Cancer (COSMIC)
The COSMIC database is designed to store and display somatic mutation information and related details and contains information relating to human cancers.
- The Cancer Genome Atlas (TCGA)
TCGA is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. TCGA data are available to the research community for use in developing better ways of diagnosing, treating, and preventing cancer.
Genotyping and Sequencing Centers
- Cancer Genomics Research Laboratory (CGR)
NCI established the CGR to investigate the contribution of germline genetic variation to cancer susceptibility and outcomes. Working in concert with epidemiologists, biostatisticians and basic research scientists in the intramural research program, the CGR has developed the capacity to conduct genome-wide association studies and next-generation sequencing to identify the heritable determinants of various forms of cancer.
- Center for Inherited Disease Research (CIDR)
CIDR provides high-quality next-generation sequencing and genotyping services to investigators working to discover genes that contribute to common diseases.
- Mendelian Genome Centers
These Centers, funded by the National Human Genome Research Institute (NHGRI), apply next-generation sequencing and computational approaches to discover the genes and variants that underlie Mendelian conditions, including certain forms of cancer.
- National Human Genome Research Institute (NHGRI) Large Scale Sequencing Program
NHGRI funds large-scale genome sequencing capacity at several centers located in the U.S. This program undertakes sequencing projects to provide critical genomic information that can be of significant value to the scientific community in areas of very broad scientific interest.
Genome Browsers and Map Viewers
- Ensembl
Ensembl is a joint project between the European Bioinformatics Institute and the Wellcome Trust Sanger Institute to develop a software system which produces and maintains automatic annotation on selected eukaryotic genomes.
- National Center for Biotechnology Information (NCBI) Human Genome Resources
NCBI's website strives to offer an integrated, one-stop, genomic information resource for data emerging from the Human Genome Project and other sequencing projects worldwide.
- NCBI Map Viewer
The Map Viewer provides graphical displays of features on the human reference genome sequence assembly maintained by the Genome Reference Consortium and the alternate HuRef genome assembly, as well as cytogenetic, genetic, physical, and radiation hybrid maps.
- The University of California, Santa Cruz (UCSC) Genome Browser
The UCSC Genome Browser contains the reference sequence and working draft assemblies for a large collection of genomes. This interactive website offers access to genome sequence data integrated with aligned annotations.
Databases and Catalogues of Genetic Variation
- 1000 Genomes Project
The goal of the 1000 Genomes Project is to provide a comprehensive resource on human genetic variation. The Project is sequencing the genomes of approximately 2,500 samples at 4x coverage, to provide data on genetic variants with frequencies of at least 1% in the populations studied.
- Database of Genomic Structural Variation (dbVar)
dbVar is the NCBI central repository for structural variation. Structural variation is generally defined as any region of DNA involved in inversions and balanced translocations, insertions and deletions, or copy number variation.
- Database of Single Base Nucleotide Substitutions (dbSNP)
dbSNP is the NCBI central repository for single base nucleotide substitutions (SNPs) and short deletion and insertion polymorphisms.
- Encyclopedia of DNA Elements (ENCODE) data
ENCODE provides a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control the cells and circumstances in which a gene is active.
- International HapMap Project
The HapMap is a catalog of common genetic variants that occur in human beings. It describes what these variants are, where they occur in our DNA, and how they are distributed among people within populations and among populations in different parts of the world.
- National Heart, Lung, and Blood Institute (NHLBI) Exome Variant Server (EVS)
The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of exome sequencing across diverse, richly-phenotyped populations and to share these datasets and findings with the scientific community. The current EVS data release represents all variants identified from exome sequencing of 6503 ESP samples.
- SNP500Cancer
SNP500Cancer provides a central resource for sequence verification of SNPs in genetic regions of importance to molecular epidemiology studies in cancer.
Toolkits for Harmonizing or Generating Standardized Measures for Phenotypes and Exposures
- Consensus Measures for Phenotypes and EXposures (PhenX)
PhenX, funded by NHGRI, is intended to integrate genetics and epidemiologic research. The toolkit is a web-based catalog of high-priority measures of phenotypes and exposures for use in GWAS and other research efforts.
- Data Schema and Harmonization Platform for Epidemiological Research (DataSHaPER)
DataSHaPER is both a scientific approach and a suite of practical tools. Its primary aims are to facilitate the prospective harmonization of emerging biobanks, provide a template for retrospective synthesis and support the development of questionnaires and information-collection devices.
Analysis Tools
- Alphabetical List of Genetic Analysis Software
A list, curated at Rockefeller University, of computer software on the following topics: genetic linkage analysis for human pedigree data, QTL analysis for animal/plant breeding data, genetic marker ordering, genetic association analysis, haplotype construction, pedigree drawing, and population genetics.
- Broad Institute Software Tools
Scientists in the Broad community have developed many critical software tools for the analysis of increasingly large genome-related datasets, and they make these tools openly available to the scientific community. Includes GATK and Haploview.
- Genetic Simulation Resources (GSR)
This web tool provides a catalogue of existing computer simulation programs that simulate genetic data of the human genome for studies in population and evolutionary genetics, genetic epidemiology, and other relevant application areas. It contains computer programs that generate samples by simulating evolutionary processes backward (coalescent) or forward in time, resampling empirical data, or using other novel methods. It is intended to help users select the most appropriate genetic simulation tools for specific genetic epidemiology questions.
- Genome Variation Server (GVS)
GVS provides information on allele frequencies, linkage disequilibrium, tagSNP selection and SNP summaries. Fed by a local database, GVS enables rapid access to human genotype data found in dbSNP, and provides tools for analysis of genotype data.
- PLINK
PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
- SEQanswers Software List
A dynamic and comprehensive table of next-generation sequence analysis software compiled on the SEQanswers website. Includes programs that recalibrate the quality scores produced by next-generation sequencing base callers (ShortRead, SHREC, BING, GATK) and algorithms for DNA sequencing (BWA, MAQ, BFAST, SOAP, etc.).
- University of Michigan Software Tools
Scientists at the University of Michigan have developed software tools for statistical genetics analysis, and they make these tools openly available to the scientific community. Includes LocusZoom, MACH and the CaTS Power Calculator.
Catalogues and Databases of Relationships Between Genotypes and Phenotypes
- ClinVar
This is a freely accessible, public archive of reports of the relationships among human variations and phenotypes along with supporting evidence.
- Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources (DECIPHER)
DECIPHER is an interactive web-based database which incorporates a suite of tools designed to aid the interpretation of submicroscopic chromosomal imbalance. This database collects clinical information about chromosomal microdeletions/duplications/insertions, translocations and inversions.
- Human Gene Mutation Database (HGMD)
This database provides a comprehensive core collection of germline mutations in nuclear genes that underlie or are associated with human inherited disease.
- NHGRI Catalog of Published Genome-Wide Association Studies
This resource provides information on SNP-trait associations abstracted from GWAS publications.
- Online Mendelian Inheritance in Man (OMIM)
OMIM is a comprehensive, authoritative, and timely compendium of human genes and genetic phenotypes. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 12,000 genes. OMIM focuses on the relationship between phenotype and genotype. It is updated daily, and the entries contain copious links to other genetics resources.
- Phenotype-Genotype Integrator (PheGenI)
PheGenI merges NHGRI GWAS catalog data with several databases housed at the NCBI, including Gene, dbGaP, OMIM, Genotype-Tissue Expression (GTEx), and the Database of Single Nucleotide Polymorphisms (dbSNP).
- wikiGWA
wikiGWA is a Wikipedia-style platform for researchers to share their genome-wide association (GWA) findings.
Tools for Predicting Impact of Amino Acid Substitutions
- Polymorphism Phenotyping (PolyPhen-2)
This tool predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations.
- PMut
This software, aimed at the annotation and prediction of whether a mutation is pathological, formulates predictions with neural networks, using internal databases, secondary structure prediction and sequence conservation.
- The Sorting Tolerant From Intolerant (SIFT) Algorithm
This tool predicts whether an amino acid substitution affects protein function based on the degree of conservation of amino acid residues in sequence alignments derived from closely related species.
- Variant Effect Predictor
This system (formerly known as the SNP Effect Predictor) categorizes Ensembl genomic variants in known transcripts by their potential effect.
Literature and Knowledge Base Resources
- Cancer Genome-Wide Association and Meta Analyses database (Cancer GAMAdb)
Cancer GAMAdb provides a continually updated database containing key descriptive characteristics of each genetic association extracted from published GWAS and meta-analyses relevant to cancer risk.
- Cancer Genomic Evidence-Based Medicine Knowledge Base (CancerGEM KB)
CancerGEM KB is a resource for researchers, public health professionals, policy makers, and health care providers who are interested in the use of genomic information in cancer care and prevention.
- GeneReviews
GeneReviews are overviews providing expert-authored, peer-reviewed, current disease descriptions that apply genetic testing to the diagnosis, management, and genetic counseling of patients and families with specific inherited conditions.
- HuGE Navigator
The Navigator is an integrated, searchable knowledge base of genetic associations and human genome epidemiology.
- Pharmacogenomic Resources
This page provides links to pharmacogenomics collaborative opportunities, consortia, and networks; databases related to pharmacogenomics research; knowledge synthesis resources; reports; and toolkits.
- SEQanswers
SEQanswers was founded to be an information resource and user-driven community focused on all aspects of next-generation genomics. The site aims to be a central location for next generation sequencing technology discussion and education. The site will always attempt to cater to everyone, regardless of scientific background or knowledge.
NCI/NIH Sponsored Networks and Programs
- Breast and Colon Cancer Family Registries (CFRs)
The CFRs are international research infrastructures for investigators interested in conducting population and clinic-based interdisciplinary studies on the genetic and molecular epidemiology of breast and colon cancer and their behavioral implications.
- Cancer Genetics Markers of Susceptibility (CGEMS)
CGEMS was launched to identify common inherited genetic variations associated with risk for breast and prostate cancer. It involves genome-wide association studies (GWAS) for a number of cancers, and more recently, exposures and survival. The raw genotype data from each of the CGEMS projects will be available for download to accredited investigators, upon approval of a Data Access Request.
- Cancer Genetics Network (CGN)
The CGN is a resource for investigators conducting research on the genetic basis of human cancer susceptibility; integration of this information into medical practice; and behavioral, ethical, and public health issues associated with human genetics.
- Environmental Polymorphism Registry (EPR)
The EPR is a long-term research project to collect and store DNA from up to 20,000 North Carolinians in a biobank. The DNA samples are available to scientists to study variations in genes (known as polymorphisms) that might be linked to common diseases such as diabetes, heart disease, cancer, asthma and others. While many types of genes are studied as part of the EPR, the focus is on a category known as environmental response genes.
- Genetic Associations and Mechanisms in Oncology (GAME-ON)
GAME-ON comprises five NCI sponsored cooperative agreements for transdisciplinary research projects addressing two overall goals: 1) To pursue promising scientific leads from previously generated GWAS of cancer; and 2) To coordinate and accelerate integrative post-GWAS discovery research, which could provide the basis for expediting clinical translation and public health dissemination of the findings.