Genome.gov - National Human Genome Research Institute, National Institutes of Health

A Catalog of Genome-Wide Association Studies

Full Description of Methods

Weekly PubMed searches are run using the terms "genome-wide" OR "genome AND identification" OR "genome AND association", limited to publications from the current year and to studies in humans. Studies focusing on copy number variants (CNV) are included, but coverage of them is known to be incomplete.

The following study-level fields are extracted: author (last name of first author); study date (online publication date, if available); PubMed URL; publication title; disease/trait information; initial sample size (summed across multiple Stage 1 populations, if applicable); replication sample size (summed across multiple populations, if applicable); platform (manufacturer); number of SNPs passing quality control metrics (recorded as "up to [maximum number of SNPs]" if multiple platforms are used without imputation, as the total number of imputed SNPs where imputation is used, or as "pooled" for studies of pooled DNA); and whether the study examined copy number variants (CNV studies were initially excluded; additional studies are to be added).

For each identified SNP, we extract: chromosomal region (from the UCSC Genome Browser); gene (as reported); rs number and risk allele; risk allele frequency in controls (if not available among all controls, the frequency in the control group with the largest sample size); p-value and any relevant text (e.g., subgroups, where applicable); and odds ratio (OR), 95% confidence interval (CI), and any relevant text (e.g., subgroups). If the p-value, OR, and 95% CI are not available for the combined population, we extract the estimates from the population group with the largest sample size.
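The two sample-size rules above (summing across populations, and falling back to the largest population group when combined estimates are unavailable) can be sketched in Python. This is an illustrative sketch only, not the catalog's actual tooling: the `GroupEstimate` record, its field names, and both function names are hypothetical, and the sketch assumes that a combined estimate counts as "not available" when any of its three fields is missing.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class GroupEstimate:
    """Hypothetical record for one population group's reported estimates."""
    label: str
    sample_size: int
    p_value: Optional[float] = None
    odds_ratio: Optional[float] = None
    ci95: Optional[Tuple[float, float]] = None

    def is_complete(self) -> bool:
        # Assumption: all three fields must be present to be usable.
        return (self.p_value is not None
                and self.odds_ratio is not None
                and self.ci95 is not None)


def total_sample_size(groups: Sequence[GroupEstimate]) -> int:
    """Sum sample sizes across multiple populations of one stage."""
    return sum(g.sample_size for g in groups)


def select_estimate(combined: Optional[GroupEstimate],
                    groups: Sequence[GroupEstimate]) -> GroupEstimate:
    """Prefer the combined-population estimate; otherwise fall back to
    the population group with the largest sample size."""
    if combined is not None and combined.is_complete():
        return combined
    return max(groups, key=lambda g: g.sample_size)
```

For example, with one group of 2,000 and one of 500 and no combined estimate, `total_sample_size` gives 2,500 and `select_estimate` returns the 2,000-person group.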

In extracting information, we follow these additional guidelines. Missing or not applicable fields are denoted as follows: "?", allele not reported; "NS", not significant (no associations identified at p < 1.0 x 10^-5); "NR", not reported. Where multiple genetic models are available, effect sizes (ORs or beta-coefficients) are prioritized as follows: 1) genotypic model, per-allele estimate; 2) genotypic model, heterozygote estimate; 3) allelic model, allelic estimate. Because we focus on risk alleles, we invert ORs < 1 and their associated confidence intervals, and report the opposite allele if available. If 95% CIs are not published, we estimate them from standard errors where available.

Associations attributed to a combination of one or more genetic variants are denoted as such in the strongest SNP-risk allele column (e.g., "rs1015362-G + rs4911414-T", "3-SNP haplotype 1"). If available, rs numbers for the SNPs comprising the haplotype are indexed so that they can be found using the SNP search features. Genes attributed to a SNP are extracted verbatim from the published report. "Intergenic" denotes a location that was not attributed to a particular gene (if it appeared that location information was sought), and "NR" (not reported) denotes an absence of reporting on location information. Occasionally the term "pending" is used to denote one or more studies that we identified as an eligible GWAS but for which SNP information has not yet been extracted; studies of CNVs are also noted as pending.
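The effect-size rules above can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not the catalog's own procedure: the function names and model keys are hypothetical, the standard error is assumed to be that of the log odds ratio, the interval is assumed to be a normal-approximation (Wald) interval, and "?" is assumed to be recorded when the opposite allele is not available.

```python
import math
from typing import Dict, Optional, Tuple

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal

# Priority order of genetic models, highest first (keys are hypothetical).
MODEL_PRIORITY = ("genotypic_per_allele", "genotypic_heterozygote", "allelic")


def pick_model(estimates: Dict[str, float]) -> Tuple[str, float]:
    """Return the highest-priority model present and its effect size."""
    for model in MODEL_PRIORITY:
        if model in estimates:
            return model, estimates[model]
    raise ValueError("no recognized genetic model in estimates")


def ci_from_se(odds_ratio: float, se_log_or: float) -> Tuple[float, float]:
    """Estimate a 95% CI, assuming se_log_or is the SE of ln(OR)."""
    log_or = math.log(odds_ratio)
    return (math.exp(log_or - Z_95 * se_log_or),
            math.exp(log_or + Z_95 * se_log_or))


def to_risk_allele(odds_ratio: float,
                   ci95: Tuple[float, float],
                   allele: str,
                   other_allele: Optional[str] = None):
    """Invert an OR below 1 and its CI, and report the opposite allele."""
    if odds_ratio >= 1.0:
        return odds_ratio, ci95, allele
    lower, upper = ci95
    # Taking reciprocals swaps the bounds: 1/upper becomes the new lower.
    flipped = other_allele if other_allele is not None else "?"
    return 1.0 / odds_ratio, (1.0 / upper, 1.0 / lower), flipped
```

For instance, an OR of 0.8 with CI (0.5, 0.9) reported for allele A becomes an OR of 1.25 with CI of about (1.11, 2.0) for the opposite allele.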



Last Reviewed: April 30, 2009


