Centers for Disease Control and Prevention


National Center for Health Statistics 3311 Toledo Road Hyattsville, Maryland 20782
Toll Free Data Inquiries 1-800-232-4636

 

 

 

 

 


Frequently Asked Questions for the 1999-2004 Dual Energy X-ray Absorptiometry (DXA) Multiple Imputation Data Files
 

Question 1:

If we use only a subgroup for our analysis (i.e., the group with measured DXA data or those with only a single imputed value) will there be any issues with the weights that might affect this group disproportionately?

 

Answer:

If you use a single imputation, you will have a complete dataset, but the sampling errors will be underestimated because they will not include the imputation variance. You also invite a reviewer to ask whether you "selected" the one of the 5 imputation datasets that best fits your analysis. NCHS recommends using all 5 imputations.

If you analyze only the cases with measured DXA data, you have a classic missing data problem: the weights as provided are not appropriate because they are not adjusted for DXA non-response, and DXA non-response is clearly NOT missing at random. Because the missingness is directly related to the outcome, you should expect the cases with complete data to differ from the cases with imputed values. For example, the percentage with a BMI greater than 30 will be higher for imputed cases than for complete cases. If this were not the case, NCHS probably would have simply re-weighted the data for non-response rather than go through the imputation process. NCHS does not recommend analysis based on only complete cases with the original sampling weights.
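To illustrate why a single imputation understates the sampling error, here is a minimal sketch of the standard multiple-imputation combining rules (often called Rubin's rules). This page does not name a specific combining method, so treat the choice of rule as an assumption; the function name and all numbers are hypothetical, and the per-dataset variances are assumed to come from your survey software.

```python
from statistics import mean, variance


def combine_imputations(estimates, variances):
    """Combine m point estimates and their sampling variances.

    estimates: one point estimate per imputed dataset
    variances: the design-based sampling variance of each estimate
    Returns (combined estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # combined point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b  # total variance includes the imputation variance
    return q_bar, total


# Hypothetical estimates and variances from 5 imputed datasets (not real NHANES results).
estimates = [27.1, 27.4, 26.9, 27.3, 27.0]
variances = [0.040, 0.042, 0.039, 0.041, 0.040]

q_bar, total = combine_imputations(estimates, variances)
print(q_bar, total)
```

The total variance is always at least as large as the average within-imputation variance, which is the part a single-imputation analysis would report on its own.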

 

 

Question 2:

Should we expect any major changes as we move from unweighted to weighted data analysis?

 

Answer:

No more so than usual. Sample weights vary by age, sex, and race/ethnicity, as well as by the variables included in the non-response weighting adjustment. So, for example, if your outcome variable differs by race/ethnicity, you can expect the weighted and unweighted estimates to differ. For multiple imputation datasets, the sample weights do not change from imputation to imputation.
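As a small illustration of how weighted and unweighted estimates can diverge when the outcome differs across groups with different weights, consider the sketch below. The values and weights are made up for illustration and the function name is hypothetical.

```python
def weighted_mean(values, weights):
    """Mean of values, each counted in proportion to its sample weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy data: the two records with larger outcome values also carry larger weights.
values = [20.0, 22.0, 30.0, 32.0]
weights = [1000.0, 1000.0, 4000.0, 4000.0]

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)
print(unweighted, weighted)
```

Because the weights do not change from imputation to imputation, the same weight vector would be reused with each of the 5 imputed datasets.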

 

 

Question 3:

Are there statistical outliers that will be resolved through weighting?

 

Answer:

Outliers are not "resolved" through weighting unless the weighting specifications call for trimming adjustments based on values of the outcome variable. As NHANES is a multipurpose survey, a trimming adjustment for one variable may not be appropriate for other variables. As with any NHANES dataset, there will be influential data values and influential sample weights. You can run a cross-tab or generate a scatter diagram to identify any observations that have BOTH a large weight and a large value. If you, as the analyst, feel that such observations are overly influential, you can delete, trim, or otherwise adjust them.

In the imputation model, influential observations were examined carefully, but none were removed from the model fitting exercise. The imputed values also were examined; some of the outer percentile points in the dataset are imputed values because either the average imputed value was large as predicted by the model or both the average value and the imputation variance were large. In a statistical sense, because you are dealing with 5 imputations, large imputed values were left in the dataset because they accurately reflect the imputation variance. Early in the modeling it was noted that some transformations were generating values that seemed to be too large (or too small), so transformations that seemed to elongate the tails of the distribution of imputed values were avoided.
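One way to screen for observations that have both a large weight and a large value is sketched below. The percentile threshold, the record layout, and the field names (`seqn`, `weight`, `value`) are assumptions made for illustration; they are not part of the released files, and the toy records are not real data.

```python
import math


def percentile_cutoff(xs, pct):
    """Value at the given percentile, using the nearest-rank method."""
    ordered = sorted(xs)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def flag_influential(records, pct=95):
    """Return the SEQNs whose weight AND value are both at or above the cutoff."""
    w_cut = percentile_cutoff([r["weight"] for r in records], pct)
    v_cut = percentile_cutoff([r["value"] for r in records], pct)
    return [r["seqn"] for r in records
            if r["weight"] >= w_cut and r["value"] >= v_cut]


# Toy records: SEQN 9 has a large weight only; SEQN 10 has a large weight and a large value.
records = [{"seqn": i, "weight": 1000.0 * i, "value": 19.0 + i} for i in range(1, 9)]
records.append({"seqn": 9, "weight": 9000.0, "value": 10.0})
records.append({"seqn": 10, "weight": 10000.0, "value": 50.0})

flagged = flag_influential(records, pct=90)
print(flagged)
```

Flagged records are candidates for review only; whether to delete, trim, or otherwise adjust them remains the analyst's decision.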

 

 

Question 4:

Does the data set contain ONLY imputed data? That is, if a sample person had a measured value, was that measured value replaced by an imputed value or does the data set contain some measured values and some imputed values?

 

Answer:

DXA data items/variables were imputed ONLY for participants whose observations on the DXA variable were invalid or missing. If a participant's DXA data are complete, then no imputations were done on the DXA data. Any individual DXA value, and any summary statistic computed on the DXA dataset for those with measured data only, should not change among the 5 datasets. For these participants, each of the 5 data records will be identical. You should be able to verify this with a few line listings of the DXA records for sample persons with the same SEQN across the 5 datasets. That is, sort the 5 data files by SEQN, merge the datasets by SEQN, take the first 100 or so records (or a middle 100 or the last 100), and print the line listing for all or some of the DXA variables. For participants with multiply imputed data, each of the 5 datasets will contain a different set of imputed values.
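The check described above can also be done programmatically instead of by line listing. The sketch below assumes each of the 5 datasets has already been read into a dictionary keyed by SEQN; the variable name `total_fat` and the values are hypothetical stand-ins, not actual file contents.

```python
def seqns_that_vary(datasets, variable):
    """Return the SEQNs whose value of `variable` differs across the datasets."""
    varying = []
    for seqn in sorted(datasets[0]):
        distinct_values = {d[seqn][variable] for d in datasets}
        if len(distinct_values) > 1:
            varying.append(seqn)
    return varying


# Toy example: SEQN 1 has a measured value (identical in all 5 datasets),
# while SEQN 2 has an imputed value (different in each dataset).
imputed_for_seqn_2 = [18.9, 20.3, 19.6, 21.0, 19.1]
datasets = [
    {1: {"total_fat": 25.2}, 2: {"total_fat": value}}
    for value in imputed_for_seqn_2
]

varying = seqns_that_vary(datasets, "total_fat")
print(varying)
```

Sample persons with measured data should never appear in the returned list; only those with imputed values should.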

 

In doing your analysis, you will have to merge the DXA data with other NHANES datasets to get the demographic variables, sample design information, and covariates. For some data files, there may be some missing covariates. Make sure your sort/merge steps retain all cases/sample persons (you don't want case-wise deletion based on missing covariates).
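A minimal sketch of a merge that retains every DXA sample person, even when a covariate record is missing, might look like the following. The dictionary layout and the names `total_fat` and `age` are hypothetical; in practice you would use your statistical package's merge step with the option that keeps all cases.

```python
def left_merge(dxa, covariates, covariate_names):
    """Attach covariates to every DXA record; None marks a missing covariate."""
    merged = {}
    for seqn, dxa_row in dxa.items():
        row = dict(dxa_row)                 # copy so the input is not modified
        cov = covariates.get(seqn, {})      # empty if the person has no covariate record
        for name in covariate_names:
            row[name] = cov.get(name)
        merged[seqn] = row
    return merged


# Toy data: SEQN 3 has DXA data but no covariate record.
dxa = {1: {"total_fat": 25.2}, 2: {"total_fat": 19.6}, 3: {"total_fat": 30.1}}
covariates = {1: {"age": 34}, 2: {"age": 51}}

merged = left_merge(dxa, covariates, ["age"])
print(len(merged), merged[3])
```

The merged result has the same number of records as the DXA file, so no sample person is dropped because of a missing covariate.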

 

 

Question 5:

How do I compute standard deviations for DXA measures using the multiply imputed data to determine the prevalence of a disease? We would like to use the mean and standard deviations of various measures to create gender and gender/race specific cut points that will be used to define this disease.

 

Answer:

Take each of the five complete datasets (i.e., measured and imputed values together) separately, calculate one standard deviation for each dataset, and then take the average of the five standard deviations.
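The averaging step can be sketched as follows. The answer above does not say whether the standard deviations should be weighted; this sketch assumes a sample-weighted standard deviation with the weight total as the denominator, and that assumption, the function names, and the toy values are all illustrative only.

```python
import math


def weighted_sd(values, weights):
    """Sample-weighted standard deviation (assumed form: weight total as denominator)."""
    total_w = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total_w
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total_w
    return math.sqrt(var)


def average_sd(imputed_values, weights):
    """Compute one SD per imputed dataset, then average the SDs."""
    sds = [weighted_sd(values, weights) for values in imputed_values]
    return sum(sds) / len(sds)


# Toy example with 2 sample persons and 5 imputed datasets (not real data).
# The same weights are used for every dataset.
weights = [1.0, 1.0]
imputed_values = [[1.0, 3.0], [1.0, 3.0], [1.0, 3.0], [0.0, 4.0], [0.0, 4.0]]

result = average_sd(imputed_values, weights)
print(result)
```

The same function can be applied within each gender or gender/race subgroup to obtain the subgroup-specific standard deviations used for cut points.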



 

This page last reviewed February 28, 2008

U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES