National Cancer Institute U.S. National Institutes of Health www.cancer.gov
Biostatistics Branch. Developing statistical methods for epidemiology and collaborating on epidemiologic studies.

Power for Genetic Association Analyses (PGA)

Introduction:

PGA is a package of algorithms and graphical user interfaces, developed in Matlab, for power and sample size calculations in case-control genetic association studies. The software covers a wide variety of genetic models and statistical constraints and can therefore support design decisions for case-control association studies of candidate genes, fine-mapping studies, and whole-genome scans.

Download and installation:

  • The PGA software is available at: http://dceg.cancer.gov/bb/tools/pga
  • To install PGA, save the pga.exe file in an appropriate folder on your disk, then run it to extract the folder to a designated location on your hard drive.
  • Users who do not have Matlab should first install the MATLAB Component Runtime (MCR) on their computers. To install the MCR, double-click the ‘MCRInstaller.exe’ file and follow the installation instructions. The MCRInstaller can be downloaded here.
  • Make sure the MCR is installed in C:\Program Files\MATLAB\MATLAB_Component_Runtime\v76 or in the folder you selected during installation. Once the MCR is installed, you can download and run the individual PGA stand-alone GUIs (pga1.exe, pga2.exe and edf.exe).

** The MCR needs to be installed only once.

Software description:

PGA1:

PGA1 calculates and plots the relationship between statistical power and sample size for a variety of genetic and statistical parameters. The user can specify the following parameters:

  • Type of genetic variation – SNP or Haplotype
  • Genetic mode of inheritance – Recessive, Dominant, Co-dominant(1df) or Co-dominant(2df).
  • Relative risk (RR) – the relative risk of the disease predisposing allele. A second relative risk (RR2) applies only to the Co-dominant model with 2 degrees of freedom (2df).
  • Linkage disequilibrium (LD) – the linkage disequilibrium value (as r² or D’) between the causative SNP and the genotyped marker.
  • Disease prevalence.
  • Disease allele frequency - the frequency of the disease predisposing allele.
  • Marker allele frequency – the allele frequency of the genotyped marker.
  • Effective degrees of freedom (EDF) - accounts for multiple testing in the study.
  • Alpha (Type I error).
  • Control to Case ratio.
  • Maximum sample size – the maximum number of cases to be considered in the calculations.

** The parameter values in the graph legend are listed in the same order as the parameters in the GUI.
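PGA1's internal formulas are not described on this page. To illustrate how power depends on the parameters above, here is a minimal Python sketch built on a standard normal-approximation two-proportion test. It is not PGA's method, and its assumptions are our own: Hardy-Weinberg equilibrium; a multiplicative per-allele effect for the co-dominant (1df) model, tested with an allelic test; carrier and homozygote comparisons for the dominant and recessive models; a Bonferroni-style threshold alpha/EDF; and the common approximation that a marker in LD behaves like the causal SNP with the sample size multiplied by r². The marker allele frequency is taken to equal the disease allele frequency, and the Haplotype and Co-dominant(2df) options are not covered. All function names are hypothetical.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def genotype_freqs(p):
    """Hardy-Weinberg frequencies of carrying 0, 1 or 2 copies of the risk allele."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]


def genotype_rrs(model, rr):
    """Genotype relative risks (relative to 0 copies) for a one-parameter model."""
    if model == "dominant":
        return [1.0, rr, rr]
    if model == "recessive":
        return [1.0, 1.0, rr]
    if model == "codominant":  # assumed multiplicative per-allele effect
        return [1.0, rr, rr * rr]
    raise ValueError(f"unknown model: {model}")


def case_control_freqs(p, rr, model, prevalence):
    """Genotype frequencies among cases and controls implied by the model."""
    g = genotype_freqs(p)
    grr = genotype_rrs(model, rr)
    f0 = prevalence / sum(gi * ri for gi, ri in zip(g, grr))  # baseline penetrance
    pen = [f0 * ri for ri in grr]
    if max(pen) > 1:
        raise ValueError("parameters imply a penetrance above 1")
    cases = [gi * fi / prevalence for gi, fi in zip(g, pen)]
    controls = [gi * (1 - fi) / (1 - prevalence) for gi, fi in zip(g, pen)]
    return cases, controls


def exposure(freqs, model):
    """Proportion compared by the test, and observations contributed per subject."""
    if model == "dominant":
        return freqs[1] + freqs[2], 1  # carriers vs. non-carriers
    if model == "recessive":
        return freqs[2], 1  # risk-allele homozygotes vs. the rest
    return freqs[1] / 2 + freqs[2], 2  # allelic test: two alleles per subject


def power(n_cases, rr, p, model="codominant", prevalence=0.01,
          ratio=1.0, alpha=0.05, edf=1.0, r2=1.0):
    """Approximate power of a two-sided 1-df two-proportion test (r2 must be > 0)."""
    cases, controls = case_control_freqs(p, rr, model, prevalence)
    p1, per_subject = exposure(cases, model)
    p0, _ = exposure(controls, model)
    # Approximation for indirect association: effective sample size scaled by r^2
    n1 = per_subject * n_cases * r2
    n0 = per_subject * n_cases * ratio * r2
    pbar = (n1 * p1 + n0 * p0) / (n1 + n0)
    z_crit = _Z.inv_cdf(1 - alpha / edf / 2)  # Bonferroni-style correction by EDF
    se0 = (pbar * (1 - pbar) * (1 / n1 + 1 / n0)) ** 0.5  # SE under the null
    se1 = (p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0) ** 0.5  # SE under the alternative
    return _Z.cdf((abs(p1 - p0) - z_crit * se0) / se1)


if __name__ == "__main__":
    # A power-vs-sample-size table, similar in spirit to a PGA1 curve
    for n in (250, 500, 1000, 2000):
        print(n, round(power(n, rr=1.3, p=0.2, edf=10), 3))
```

With RR = 1 the case and control frequencies coincide, so the returned power collapses to alpha/2 (one tail of the two-sided test), which is a quick sanity check on the setup.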

PGA2:

PGA2 calculates and plots the minimum detectable relative risk (MDRR) for genotyped markers with different allele frequencies. The user can specify the following parameters:

  • Type of genetic variation – SNP or Haplotype.
  • Genetic mode of inheritance – Recessive, Dominant, Co-dominant(1df) or Co-dominant(2df).
  • Relative risk ratio (RR1/RR2) – The ratio between the two relative risks of the disease predisposing genetic alleles. This parameter is applicable only in the Co-dominant(2df) model.
  • Linkage disequilibrium (LD) – the linkage disequilibrium value (as r² or D’) between the causative SNP and the genotyped marker.
  • Effective degrees of freedom (EDF) - accounts for multiple testing in the study.
  • Disease allele frequency - the frequency of the disease predisposing allele.
  • Case number – the number of cases in the study.
  • Control to Case ratio.
  • Disease prevalence.
  • Power (1 - Type II error).
  • Alpha (Type I error).
  • Maximum sample size – the maximum number of cases to be considered in the calculations.

** The parameter values in the graph legend are listed in the same order as the parameters in the GUI.
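The MDRR is the smallest relative risk that reaches the requested power for a given number of cases. PGA2's own procedure is not documented here, so the following is only a minimal Python sketch under stated assumptions: a co-dominant (1df) allelic test with a multiplicative per-allele effect, Hardy-Weinberg equilibrium, a Bonferroni-style threshold alpha/EDF, and a bisection search over RR that assumes power increases with RR on the search interval. The names `allelic_power` and `mdrr` and the default search bound `rr_max` are hypothetical.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def allelic_power(n_cases, rr, p, prevalence=0.01, ratio=1.0, alpha=0.05, edf=1.0):
    """Normal-approximation power of a 1-df allelic test, multiplicative model."""
    q = 1.0 - p
    g = [q * q, 2 * p * q, p * p]              # Hardy-Weinberg genotype frequencies
    grr = [1.0, rr, rr * rr]                   # multiplicative genotype relative risks
    f0 = prevalence / sum(gi * ri for gi, ri in zip(g, grr))  # baseline penetrance
    pen = [f0 * ri for ri in grr]
    if max(pen) > 1:
        raise ValueError("parameters imply a penetrance above 1")
    cases = [gi * fi / prevalence for gi, fi in zip(g, pen)]
    ctrls = [gi * (1 - fi) / (1 - prevalence) for gi, fi in zip(g, pen)]
    p1 = cases[1] / 2 + cases[2]               # risk-allele frequency in cases
    p0 = ctrls[1] / 2 + ctrls[2]               # risk-allele frequency in controls
    n1, n0 = 2 * n_cases, 2 * n_cases * ratio  # allele counts
    pbar = (n1 * p1 + n0 * p0) / (n1 + n0)
    z_crit = _Z.inv_cdf(1 - alpha / edf / 2)
    se0 = (pbar * (1 - pbar) * (1 / n1 + 1 / n0)) ** 0.5
    se1 = (p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0) ** 0.5
    return _Z.cdf((abs(p1 - p0) - z_crit * se0) / se1)


def mdrr(n_cases, p, target_power=0.8, rr_max=10.0, tol=1e-4, **kwargs):
    """Smallest RR > 1 whose power reaches target_power, found by bisection."""
    if allelic_power(n_cases, rr_max, p, **kwargs) < target_power:
        return None  # not detectable within the search range
    lo, hi = 1.0, rr_max  # invariant: power(lo) < target <= power(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if allelic_power(n_cases, mid, p, **kwargs) >= target_power:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # MDRR across marker allele frequencies, similar in spirit to a PGA2 curve
    for freq in (0.05, 0.1, 0.2, 0.3, 0.5):
        print(freq, mdrr(1000, freq, edf=10))
```

Bisection only needs power evaluations, so the same search works with any power function that is monotone in RR; the bound `rr_max` must keep all penetrances at or below 1 for the chosen prevalence, otherwise `allelic_power` raises.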

EDF:

EDF calculates the effective degrees of freedom for a particular set of SNPs in linkage disequilibrium. It accepts SNP genotype data files from HapMap (http://www.hapmap.org) or tab-delimited text files with SNP genotypes in columns, coded as 0/1/2 for major homozygous, heterozygous and minor homozygous genotypes, respectively, with missing data encoded as NaN. See the example input files (hapmap_example.txt, genotype_example.txt). From these data, it computes a summary measure of the EDF (Nyholt, 2004) and produces a map of linkage disequilibrium patterns (r²) for the SNPs in the dataset. SNPs can be filtered by allele frequency by setting a minor allele frequency (MAF) threshold.
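The EDF tool's parsing and LD code is not shown on this page. The following minimal Python sketch illustrates one way to read a tab-delimited genotype file in the format described above, apply a MAF filter, and build a pairwise r² matrix. It assumes a header row of SNP names (the layout of the example files is not described here, so this is an assumption), and it estimates r² as the squared Pearson correlation of the 0/1/2 codes over samples genotyped at both SNPs, which is a genotype-based stand-in for haplotype-based r². HapMap-format files are not handled. Function names and the rs1 to rs4 SNP names are hypothetical.

```python
import math


def read_genotypes(lines, has_header=True):
    """Parse tab-delimited genotype columns coded 0/1/2, with 'NaN' for missing calls."""
    rows = [ln.rstrip("\n").split("\t") for ln in lines if ln.strip()]
    if has_header:
        names = rows.pop(0)  # assumed: first row holds SNP names
    else:
        names = [f"snp{j + 1}" for j in range(len(rows[0]))]
    cols = [[float(row[j]) for row in rows] for j in range(len(names))]  # "NaN" -> nan
    return names, cols


def maf(col):
    """Minor allele frequency from 0/1/2 codes, ignoring missing calls."""
    obs = [g for g in col if not math.isnan(g)]
    if not obs:
        return 0.0
    f = sum(obs) / (2 * len(obs))
    return min(f, 1 - f)


def r2(x, y):
    """Squared Pearson correlation of two SNPs over samples observed at both."""
    pairs = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy * sxy / (sxx * syy) if sxx > 0 and syy > 0 else 0.0


def ld_map(names, cols, maf_threshold=0.05):
    """Drop SNPs below the MAF threshold and return the pairwise r^2 matrix."""
    keep = [j for j, col in enumerate(cols) if maf(col) >= maf_threshold]
    matrix = [[r2(cols[i], cols[j]) for j in keep] for i in keep]
    return [names[j] for j in keep], matrix


if __name__ == "__main__":
    example = [
        "rs1\trs2\trs3\trs4",
        "0\t0\t0\t0",
        "1\t1\tNaN\t0",
        "2\t2\t0\t0",
        "1\t1\t1\t0",
        "0\t0\t1\t0",
    ]
    names, cols = read_genotypes(example)
    kept, matrix = ld_map(names, cols)
    print(kept)  # rs4 is monomorphic (MAF 0) and is filtered out
    for row in matrix:
        print(" ".join(f"{v:.2f}" for v in row))
```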

The resulting EDF value can be used as an input to PGA1 and PGA2 computations.
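For reference, Nyholt's (2004) estimate is M_eff = 1 + (M - 1)(1 - Var(λ)/M), where λ are the eigenvalues of the M x M correlation matrix of SNP genotypes and Var is their sample variance. The minimal Python sketch below computes it without an eigendecomposition. The eigenvalues of a correlation matrix with unit diagonal sum to M (mean 1), and their squares sum to the sum of the squared matrix entries, so only the pairwise correlations are needed. How PGA1 and PGA2 use the EDF internally is not described here. The two threshold helpers are assumptions: a Bonferroni-style alpha/EDF and the closely related Šidák-style 1 - (1 - alpha)^(1/EDF). All function names are hypothetical.

```python
def nyholt_edf(r):
    """Effective number of independent tests (Nyholt, 2004) from an M x M correlation matrix."""
    m = len(r)
    if m < 2:
        return float(m)
    # Eigenvalues of a unit-diagonal correlation matrix have mean 1, and the sum of
    # their squares equals the sum of all squared matrix entries.
    sum_sq = sum(v * v for row in r for v in row)
    var_eig = (sum_sq - m) / (m - 1)  # sample variance of the eigenvalues
    return 1 + (m - 1) * (1 - var_eig / m)


def bonferroni_alpha(alpha, edf):
    """Per-test significance threshold alpha / EDF."""
    return alpha / edf


def sidak_alpha(alpha, edf):
    """Per-test significance threshold 1 - (1 - alpha)^(1 / EDF)."""
    return 1 - (1 - alpha) ** (1 / edf)


if __name__ == "__main__":
    r = [[1.0, 0.5], [0.5, 1.0]]  # two SNPs with genotype correlation 0.5
    edf = nyholt_edf(r)
    print(edf, bonferroni_alpha(0.05, edf), sidak_alpha(0.05, edf))
```

The formula behaves as expected at the extremes: M uncorrelated SNPs give an EDF of M, and M perfectly correlated SNPs give an EDF of 1.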