National Cancer Institute U.S. National Institutes of Health www.cancer.gov

Barry I. Graubard, Ph.D.

Senior Investigator

Location: Executive Plaza South, Room 8024
Phone: 301-496-7455
Fax: 301-402-0081
E-mail: graubarb@mail.nih.gov


Biography

Dr. Graubard received a Ph.D. in mathematics from the University of Maryland in 1991. He began his career as a mathematical statistician at the National Center for Health Statistics in 1977, and held research positions at the Alcohol, Drug Abuse, and Mental Health Administration and the National Institute of Child Health and Human Development. Dr. Graubard joined the NCI in 1990. He received the American Statistical Association and Biometric Society Snedecor Award for Applied Statistical Research in 1990, and he is a Fellow of the American Statistical Association.

Research Interests

National health survey data are used for many purposes by the NCI, including cancer surveillance as well as descriptive and analytical epidemiology studies. Surveys provide national and subgroup estimates of the prevalence of cancer risk factors, and subjects surveyed can be followed as nationally representative cohorts for estimating associations between risk factors and cancer incidence. When analyzing data from national surveys, attention needs to be given to their complex sample designs. These designs often use multiple stages of cluster sampling to obtain survey subjects, and require sample weighting to make the survey data representative of the target population. Our collaborations with biomedical researchers have led us to develop statistical methods for using national health survey data in addressing issues in cancer etiology and surveillance.
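As a rough illustration of why weights and clusters both matter, the sketch below computes a weighted prevalence and a delete-one-cluster jackknife standard error. It is a minimal example under stated assumptions, not the procedure used for any particular NCI survey: the data are hypothetical, all first-stage clusters (PSUs) are treated as belonging to a single stratum, and the function names are invented for this example.

```python
from math import sqrt

def weighted_prevalence(records):
    """Weighted proportion from (psu_id, weight, indicator) tuples."""
    total_weight = sum(w for _, w, _ in records)
    return sum(w * y for _, w, y in records) / total_weight

def jackknife_se(records):
    """Delete-one-cluster jackknife SE, assuming all PSUs form one stratum."""
    psus = sorted({psu for psu, _, _ in records})
    k = len(psus)
    full = weighted_prevalence(records)
    # Re-estimate the prevalence with each PSU removed in turn.
    replicates = [
        weighted_prevalence([r for r in records if r[0] != dropped])
        for dropped in psus
    ]
    variance = (k - 1) / k * sum((rep - full) ** 2 for rep in replicates)
    return sqrt(variance)

# Hypothetical toy data: (psu_id, sample_weight, has_risk_factor)
sample = [
    ("A", 100.0, 1), ("A", 100.0, 0),
    ("B", 300.0, 1), ("B", 300.0, 1),
    ("C", 50.0, 0), ("C", 50.0, 0),
]

estimate = weighted_prevalence(sample)
standard_error = jackknife_se(sample)
print(f"weighted prevalence = {estimate:.3f}, jackknife SE = {standard_error:.3f}")
```

In this toy sample the unweighted proportion is 3/6, while the weighted estimate is pulled toward the heavily weighted cluster B, which is the kind of difference that weighting is meant to capture.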

Survey Methods Research

We developed methods for efficiently testing regression parameters for data from surveys with highly inefficient sample designs. These designs can have widely variable sample weights, resulting in much larger standard errors than one would obtain from a simple random sample of the same number of subjects. In addition, many national surveys have limited degrees of freedom for estimating standard errors, because of the small number of first-stage sampled clusters. Our methods involve augmenting the regression model with the independent variables that determine the sample weights. This approach models the effect of the sample weighting without explicitly weighting the regression analysis. To address the limited degrees of freedom, we base the variance estimation on the more numerous clusters at later stages of sampling, which provides more degrees of freedom.
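The cost of widely variable weights can be gauged with Kish's approximate design effect for unequal weighting, n·Σw² / (Σw)². The sketch below uses that standard approximation only to illustrate the problem described above; it is not the model-augmentation method itself, and the weights and function name are hypothetical.

```python
def kish_design_effect(weights):
    """Kish's approximate variance inflation due to unequal weights."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

# Hypothetical weights: one equal-weight design and one highly variable design.
equal = [10.0, 10.0, 10.0, 10.0]
variable = [1.0, 1.0, 1.0, 9.0]

deff_equal = kish_design_effect(equal)
deff_variable = kish_design_effect(variable)

# Effective sample size: how many simple-random-sample subjects
# would give roughly the same precision.
effective_n = len(variable) / deff_variable

print(f"equal weights:    deff = {deff_equal:.2f}")
print(f"variable weights: deff = {deff_variable:.2f}, effective n = {effective_n:.2f}")
```

A design effect above 1 means the weighted estimate is less precise than a simple random sample of the same size, which is the inefficiency that motivates modeling the weight-determining variables instead of weighting.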

We also conducted research into other statistical methods for analyzing survey data, including:

(1) graphing survey data with local linear kernel density smoothing adapted for weighted data, with jackknife methods for estimating the pointwise standard errors of the smoothed mean curves; (2) generalized direct standardization for linear and nonlinear regression models, in which adjusted treatment effects are standardized to a distribution of the covariates and design-based standard errors are estimated; (3) Wald tests of goodness of fit for logistic regression models that use the F-distribution and a Monte Carlo simulated distribution; and (4) variance estimation for superpopulation parameters.

Dr. Graubard and Dr. Edward Korn of NCI's Division of Cancer Treatment and Diagnosis have written a graduate-level textbook entitled "Analysis of Health Surveys," which compiles practical statistical techniques for analyzing health survey data.

Biostatistical Methodology

Correlated observations from cluster samples occur in meta-analyses, where each study or experiment is a cluster, and in nonrandomized community studies, where the community is the cluster. We developed statistical methods to address this correlation in meta-analyses and community studies. For a meta-analysis of animal experiments that tested for the effect of dietary fat and total caloric intake on mammary tumorigenesis, we developed sandwich estimates of variance for conditional logistic regression that are robust to model misspecification. We are developing statistical methods for analyzing changes in the prevalence of smoking between states (where the state is the cluster) that did or did not receive resources to promote smoking cessation in the nonrandomized American Stop Smoking Intervention Study (ASSIST). These methods include bootstrap variance estimation for nonparametric smoothing of tobacco sales data, and regression methods involving random effects models with time-dependent covariates to estimate the effectiveness of ASSIST in reducing tobacco consumption and smoking prevalence.

Epidemiologic Collaboration

We collaborate on the design and analysis of a wide range of epidemiologic studies. We are working with NCI investigators on a study to evaluate the accuracy of reporting of cancer in first- and second-degree relatives. The sensitivity and specificity of the reporting will be estimated using the Connecticut cancer registry, records from the Health Care Financing Administration, and personal medical records to validate reports about family members from a random population sample of individuals living in Connecticut. Analyses of the NHANES I Epidemiologic Followup Study cohort are being conducted to examine associations between physical activity and the incidence of breast cancer, and between aspirin intake and total, cancer, and cardiovascular mortality. Analyses of data from participants in the Breast Cancer Detection Demonstration Project found that women in the upper 25% of diet quality had about a 30% reduction in mortality.
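The sensitivity and specificity mentioned above for family cancer reports can be sketched as follows. This is a minimal illustration with made-up data: each pair holds a respondent's report and the validated status from records, it ignores the sample weights and clustering that a design-based analysis would account for, and the function name is hypothetical.

```python
def sensitivity_specificity(pairs):
    """Return (sensitivity, specificity) from (reported, confirmed) booleans."""
    true_pos = sum(1 for reported, confirmed in pairs if reported and confirmed)
    false_neg = sum(1 for reported, confirmed in pairs if not reported and confirmed)
    true_neg = sum(1 for reported, confirmed in pairs if not reported and not confirmed)
    false_pos = sum(1 for reported, confirmed in pairs if reported and not confirmed)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# Hypothetical validation data: (respondent reported cancer, records confirm cancer)
reports = (
    [(True, True)] * 3      # correctly reported cancers
    + [(False, True)] * 1   # cancers the respondent missed
    + [(False, False)] * 4  # correctly reported as cancer-free
    + [(True, False)] * 2   # cancers reported but not confirmed
)

sens, spec = sensitivity_specificity(reports)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```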

Keywords

complex survey data, nutritional data, statistical methods, tobacco control

Selected Publications

  • Li Y, Graubard BI. Testing Hardy-Weinberg equilibrium and homogeneity of Hardy-Weinberg disequilibrium using complex survey data. Biometrics. (In Press).
  • Graubard BI, Korn EL. Scatter plots with survey data. In: Rao CR, Pfeffermann D, eds. Handbook of Statistics No. 29, Sample Surveys: Theory, Methods and Inference. Chapter 37. Elsevier. (In Press).
  • Flegal KM, Graubard BI, Williamson DF, Gail MH. Cause-specific excess deaths associated with underweight, overweight and obesity. JAMA. 2007; 298:2028-37.
  • Garceau A, Wideroff L, McNeel T, Dunn M, Graubard BI. Population estimates of family structure and size. Community Genetics. 2008; 11:331-342.
  • Hunsberger S, Graubard BI, Korn EL. Testing logistic regression coefficients with clustered data and few positive outcomes. Stat Med. 2008; 27:1305-24.
  • Graubard BI, Flegal KM, Williamson DF, Gail MH. Estimation of attributable number of deaths and standard errors from simple and complex sampled cohorts. Stat Med. 2007; 26:2639-2649.
  • Graubard BI, Gilpin EA, Hartman AM, Murray DM, Davis W, Gibson JT, Stillman FA. Chapter 9: Analytic methods and results of the ASSIST Evaluation. NCI Monograph. 2006.
  • Graubard BI, Fears TR. Standard errors for attributable risk for simple and complex sample designs. Biometrics. 2005; 61:847-55.
  • Graubard BI, Rao RS, Gastwirth JL. Using the Peters-Belson Method to measure health care disparities from complex survey data. Stat Med. 2005; 24:2659-68.

Collaborators

DCEG Collaborators

  • Nilanjan Chatterjee, Ph.D.; Mitchell Gail, M.D., Ph.D.; Joseph Gastwirth, Ph.D.; Susan Devesa, Ph.D.; Katherine McGlynn, Ph.D.; Mary Ward, Ph.D.; Tara Vogt, M.P.H.; Regina Ziegler, Ph.D.

Other NCI Collaborators

  • Rachel Ballard-Barbash, M.D.; William Davis, Ph.D.; Michael Fay, Ph.D.; Michele Forman, Ph.D.; Anne Hartman, M.S.; Ernest Hawk, M.D.; Edward Korn, Ph.D.; Blossom Patterson, Ph.D.; Francis Stillman, Ph.D.

Other NIH Collaborators

  • Vivian Faden, Ph.D., National Institute on Alcohol Abuse and Alcoholism

Other Scientific Collaborators

  • Rosalind Breslow, Ph.D., Centers for Disease Control and Prevention, Atlanta, GA
  • Chan Dayton, Ph.D., University of Maryland, College Park, MD
  • Ahmedin Jemal, Ph.D., D.V.M., American Cancer Society, Atlanta, GA