U.S. National Institutes of Health

Smoothing-Based Approaches for Estimating the Risk of a Disease by Quantile-Categories of a Predictor Variable

When data are collected on a prospective cohort, the standard method for estimating disease risk is to categorize the key predictor variable by its empirical quartiles. One may then include indicator variables for these empirical quartile-categories as predictors, along with other covariates, in a generalized linear model (GLM), with the observed health status of each subject as the response. The standard GLM method, however, is relatively inefficient because it treats all observations that fall in the same quartile-category of the predictor variable identically, regardless of whether they lie in the center or near the boundaries of that category.
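The quartile-category step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the simulated cohort, the logistic risk model used to generate it, and the helper names `quartile_category`, `risk_by_quartile`, and `odds_ratios` are all hypothetical. With no other covariates, a logistic GLM that contains only the quartile indicators reproduces the observed proportion of cases in each category, so the sketch computes those proportions directly instead of fitting a model.

```python
import math
import random
from statistics import quantiles


def quartile_category(x, cuts):
    """Return 0..3, the index of the quartile-category that contains x."""
    return sum(x > c for c in cuts)


def risk_by_quartile(xs, ys):
    """Observed proportion of cases in each empirical quartile-category."""
    cuts = quantiles(xs, n=4)  # the three empirical quartile cut points
    counts, cases = [0] * 4, [0] * 4
    for x, y in zip(xs, ys):
        k = quartile_category(x, cuts)
        counts[k] += 1
        cases[k] += y
    return [c / n for c, n in zip(cases, counts)]


def odds_ratios(props):
    """Odds ratios relative to the lowest quartile-category.

    Assumes every proportion lies strictly between 0 and 1.
    """
    odds = [p / (1 - p) for p in props]
    return [o / odds[0] for o in odds]


# Hypothetical simulated cohort: risk rises smoothly with the predictor.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [int(rng.random() < 1 / (1 + math.exp(1 - 0.8 * x))) for x in xs]

props = risk_by_quartile(xs, ys)
print([round(p, 3) for p in props])               # proportion by category
print([round(r, 2) for r in odds_ratios(props)])  # odds ratios vs. lowest
```

Every subject in a category contributes only through that category's case count, which is the source of the inefficiency noted above: a subject just inside a boundary is treated exactly like one at the center.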

Alternatively, one may include the key predictor variable, along with other covariates, in a generalized additive model (GAM), again with the observed health status of each subject as the response. The alternative GAM method non-parametrically estimates the functional relationship between the key predictor variable and the response. One may then compute statistics of interest, such as proportions and odds ratios, from the fitted GAM equation using the empirical quartile-categories. Simulations show that both the GLM and GAM methods are nearly unbiased, but that the GAM method produces smaller variances and narrower bootstrap confidence intervals. This work by BRB’s Dr. Albert was motivated by collaborative work on NCI’s Polyp Prevention Trial.
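The smoothing idea can be illustrated with a rough sketch. It does not fit a GAM: as a simple stand-in for the GAM's smooth term it uses a Nadaraya-Watson kernel smoother with a Gaussian kernel, with no other covariates. Averaging the fitted probabilities within each empirical quartile-category is one plausible reading of "computing proportions from the fitted equation", not a confirmed detail of the published method. The simulated cohort, the bandwidth of 0.3, and the function names are assumptions made for illustration.

```python
import math
import random
from statistics import quantiles


def kernel_smooth(xs, ys, bandwidth):
    """Nadaraya-Watson estimate of P(y = 1 | x) at each observed x.

    Stand-in for a GAM smooth term; uses a Gaussian kernel.
    """
    fitted = []
    for x0 in xs:
        weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
        fitted.append(sum(w * y for w, y in zip(weights, ys)) / sum(weights))
    return fitted


def smoothed_risk_by_quartile(xs, ys, bandwidth):
    """Average the smoothed probabilities within each quartile-category."""
    cuts = quantiles(xs, n=4)  # the three empirical quartile cut points
    fitted = kernel_smooth(xs, ys, bandwidth)
    totals, counts = [0.0] * 4, [0] * 4
    for x, p in zip(xs, fitted):
        k = sum(x > c for c in cuts)  # category index 0..3
        totals[k] += p
        counts[k] += 1
    return [t / n for t, n in zip(totals, counts)]


# Hypothetical simulated cohort: risk rises smoothly with the predictor.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(500)]
ys = [int(rng.random() < 1 / (1 + math.exp(1 - 0.8 * x))) for x in xs]

smoothed = smoothed_risk_by_quartile(xs, ys, bandwidth=0.3)
print([round(p, 3) for p in smoothed])
```

Because each fitted probability borrows strength from neighboring observations, including those just across a quartile boundary, the category-level estimates depend on where subjects lie within a category rather than only on the category's case count. As the bandwidth shrinks toward zero, the sketch reduces to the raw proportions of the quartile-indicator approach.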

Borkowf CB, Albert PS. Efficient estimation of risk of a disease by quantile-categories of a predictor variable using generalized additive models. Stat Med 2005;24:623–45.

In case-control studies in genetic epidemiology, participating subjects (probands) are often interviewed to collect detailed data on disease history and age at onset among their family members. Genotype data are typically collected for the probands. In this article, Dr. Shih and collaborators consider an approach that utilizes the family history data of the relatives. They applied these methods to estimate the risk of breast cancer associated with BRCA1/2 mutations, using data from the Washington Ashkenazi Study.

Chatterjee N, Zeynep K, Shih JH, Gail M. Case-control and case-only designs with genotype and family history data: estimating relative-risk, familial aggregation and absolute risk. Biometrics [Epub Oct 20 2005].