APPENDIX B: EVALUATING THE IMPACT OF IMPUTATIONS FOR ITEM NONRESPONSE


Comparisons of Alternative Imputation Methods

A number of studies have compared alternative imputation methods. Two were conducted using NCES data, and a third, a set of simulations, was supported by NCES.

IEA Reading Literacy Study
One example, using NCES data from the U.S. component of the IEA Reading Literacy Study, compared complete case (CC) analysis, available case (AC) analysis, hot-deck (HD) imputation, and the EM algorithm (Winglee et al. 1994). The first three methods were described above. The EM algorithm uses an iterative maximum likelihood procedure to estimate the mean and variance-covariance matrix based on all available data for each respondent. The algorithm assumes that the data come from a multivariate normal distribution and that, conditional on the reported data, the missing data are missing at random. To conduct this comparison, regression equations were estimated using each of the four methods.
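To make the EM procedure concrete, the Python sketch below alternates between filling each case's missing values with their conditional expectations given the current mean and covariance (the E-step) and re-estimating those parameters from the completed data (the M-step). It is a minimal illustration of the general algorithm under the multivariate normal assumption, not the implementation used by Winglee et al.

    import numpy as np

    def em_mvn(X, n_iter=100, tol=1e-6):
        # EM estimates of the mean and covariance for multivariate normal
        # data with values missing at random; NaN marks missingness.
        X = np.asarray(X, dtype=float)
        n, p = X.shape
        miss = np.isnan(X)
        mu = np.nanmean(X, axis=0)                    # start from available cases
        sigma = np.cov(np.where(miss, mu, X), rowvar=False, bias=True)
        for _ in range(n_iter):
            Xhat = np.where(miss, 0.0, X)
            C = np.zeros((p, p))                      # sum of conditional covariances
            for i in range(n):
                m, o = miss[i], ~miss[i]
                if not m.any():
                    continue
                # E-step: conditional mean/covariance of missing given observed
                B = sigma[np.ix_(m, o)] @ np.linalg.inv(sigma[np.ix_(o, o)])
                Xhat[i, m] = mu[m] + B @ (X[i, o] - mu[o])
                C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - B @ sigma[np.ix_(o, m)]
            # M-step: update the parameters from the completed data
            mu_new = Xhat.mean(axis=0)
            sigma = (Xhat - mu_new).T @ (Xhat - mu_new) / n + C / n
            converged = np.max(np.abs(mu_new - mu)) < tol
            mu = mu_new
            if converged:
                break
        return mu, sigma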

A linear regression model was used to predict a student's performance on a reading literacy test. The three reading scores used as the dependent variables were the narrative, expository, and document performance scores. These scores were derived using Item Response Theory models scaled for international comparison (Elley, 1992). The predictor variables used in all models were gender, age, race, father's and mother's education, family structure, family composition, family wealth/possessions, and use of a language other than English at home. Missing data rates for these variables ranged from 0 to 18 percent, and 31 percent of students were missing data on one or more variables.

Unweighted ordinary least squares regressions were run for each of the three dependent variables using each of the four methods. For each independent variable, the regression coefficients estimated using the HD, EM, and AC methods were very similar; the estimates from the CC analysis differed noticeably. This analysis also used adjusted mean scores to examine the performance of subgroups of students after controlling for other characteristics. The adjusted scores for a number of subgroups (e.g., gender, minority status, and parents' education) showed mean scores using CC that were approximately 10 points higher than the mean scores using HD, EM, and AC. These differences are presumably explained by the fact that the CC analysis excludes the 31 percent of the students who had missing data on one or more items.
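The mechanics behind the CC gap are easy to see in code: complete case analysis refits the model on only the fully observed rows, so any systematic differences in the excluded 31 percent shift the estimates, while an imputation-based analysis retains every row. The Python sketch below uses hypothetical arrays, with NaN marking item nonresponse; it illustrates the two fitting paths, not the study's actual models.

    import numpy as np

    def ols_coefs(X, y):
        # Ordinary least squares with an intercept
        X1 = np.column_stack([np.ones(len(X)), X])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return beta

    def complete_case_coefs(X, y):
        # CC analysis: drop every student with any missing predictor
        keep = ~np.isnan(X).any(axis=1)
        return ols_coefs(X[keep], y[keep])

    def imputed_coefs(X_filled, y):
        # After imputation (e.g., HD or EM-based fills), all rows are kept
        return ols_coefs(X_filled, y)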

This analysis was repeated for a comparison of CC, AC, and HD using weighted data. Although the use of the weights reduced the size of the gap somewhat, the differences persisted, with the CC analysis method yielding higher estimates than the AC and HD methods (which yielded similar results). The authors of this report concluded that the CC analysis method was clearly inefficient. Rather than the missing cases being randomly distributed, they found evidence that the students with missing data differed from those with complete data in reading performance, race/ethnicity, type of community, region of the country, and control of the school. They further concluded that, given the similarity of results among the remaining three methods (AC, HD, and EM), the HD method, as the easiest to implement, was the best choice for the IEA study.

NELS:88
The second example from the analysis of NCES data uses data from the National Education Longitudinal Study of 1988 (NELS:88) to compare two imputation methods that were used for test scores: within-class random hot-deck imputation and model-based random imputation (Bokossa, Huang, and Cohen, 2000). The goal of this study was to select an imputation method to impute missing reading and math scores for the base-year through second follow-up cohort. Sixty-five percent of the cohort took all four cognitive assessments in the three waves of the survey. The nonresponse rates by key demographic subgroups ranged from 20.5 to 27.5 percent, with the highest rates among minority students and low-SES students, causing some concern over potential bias in the NELS estimates of academic performance.
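The first of these methods can be sketched compactly: group records into imputation classes defined by auxiliary variables, then replace each missing score with one drawn at random from the observed donors in the same class. The Python sketch below assumes a pandas DataFrame with hypothetical column names; it illustrates the general technique, not the NELS:88 production code.

    import numpy as np
    import pandas as pd

    def within_class_random_hot_deck(df, target, class_vars, seed=None):
        # Fill missing values of `target` by drawing a random observed
        # donor value from the same imputation class.
        rng = np.random.default_rng(seed)
        out = df.copy()
        for _, idx in out.groupby(class_vars).groups.items():
            grp = out.loc[idx, target]
            donors = grp.dropna().to_numpy()
            need = grp.index[grp.isna()]
            if len(donors) and len(need):
                out.loc[need, target] = rng.choice(donors, size=len(need))
        return out

    # Hypothetical usage: classes defined by gender and SES quartile
    # imputed = within_class_random_hot_deck(df, "math_score",
    #                                        ["gender", "ses_quartile"])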

The authors of this analysis first identified a set of auxiliary variables and then, using the subset of cases with complete data, simulated different levels and patterns of missingness, assuming about 20 percent missing data. Following the simulation, the incomplete data were compared with the imputed data using the average imputation error, the bias of the variance, and the bias of the mean. The average imputation error was found to be consistently lower in the model-based approach than in the hot-deck approach.
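Common forms of these three criteria are simple to compute when, as in this simulation, the true values are known. The report's exact definitions may differ in detail, so the Python sketch below should be read as one plausible rendering with hypothetical inputs.

    import numpy as np

    def evaluation_criteria(y_true, y_imputed, was_missing):
        # y_true: the complete data before missingness was imposed
        # y_imputed: the data after imputation; was_missing: boolean mask
        y_true = np.asarray(y_true, float)
        y_imputed = np.asarray(y_imputed, float)
        m = np.asarray(was_missing, bool)
        # Average imputation error: mean absolute error over imputed cells
        avg_error = np.abs(y_imputed[m] - y_true[m]).mean()
        # Bias of the mean: imputed-data mean vs. complete-data mean
        mean_bias = y_imputed.mean() - y_true.mean()
        # Relative bias of the variance
        var_bias = (y_imputed.var(ddof=1) - y_true.var(ddof=1)) / y_true.var(ddof=1)
        return avg_error, mean_bias, var_bias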

Looking first at math, a comparison of the bias of the mean across the two imputation methods and the incomplete data showed no consistent pattern; however, the means computed with the incomplete data were outperformed by one or both of the imputation methods in all but one comparison (i.e., the bias was smaller for at least one of the two methods). The relative bias of the variance was consistently smaller in the model-based approach than in the other two approaches. The same results were observed in reading.

The authors concluded that the model-based approach was the "preferred method" and proceeded to use PROC IMPUTE to implement the imputations for the NELS data set.

Simulation Study
In an NCES-sponsored simulation study, Hu, Salvucci, and Cohen (2000) used six evaluation criteria to compare 11 imputation methods across four types of distributions, five types of missing-data mechanisms, and four missing-data rates. The imputation methods evaluated were: mean imputation, ratio imputation, sequential nearest neighbor hot-deck imputation, overall random imputation, mean imputation with disturbance, ratio imputation with disturbance, approximate Bayesian bootstrap, Bayesian bootstrap, modeling of a non-ignorable missing mechanism (PROC IMPUTE), data augmentation (Schafer's software), and an adjusted data augmentation method.
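As one illustration from this list, the approximate Bayesian bootstrap produces an imputation draw in two stages: first bootstrap the pool of observed donor values, then draw the missing values from that bootstrapped pool, which builds in the extra between-imputation variability a proper multiple-imputation method needs. The Python sketch below shows the generic technique, not the specific software evaluated in the study.

    import numpy as np

    def approximate_bayesian_bootstrap(y, seed=None):
        # One ABB imputation draw; NaN marks missing values.
        rng = np.random.default_rng(seed)
        y = np.asarray(y, dtype=float)
        obs = y[~np.isnan(y)]
        n_missing = int(np.isnan(y).sum())
        # Stage 1: resample the observed donors with replacement
        donors = rng.choice(obs, size=len(obs), replace=True)
        # Stage 2: draw the missing values from the resampled donors
        out = y.copy()
        out[np.isnan(y)] = rng.choice(donors, size=n_missing, replace=True)
        return out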

The evaluation criteria used include: bias of parameter estimates, bias of variance estimates, coverage probability, confidence interval width, and average imputation error. They found that the results varied across the different types of missing data; the five types considered are: missing completely at random (MCAR); tails more likely missing; large values more likely missing; center values more likely missing; and tail values more likely missing with confounding (i.e., missingness in y depends on y itself).
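These mechanisms can be mimicked in a simulation by varying what drives the probability of deletion. The Python sketch below is a hedged illustration of the mechanism types, not the authors' exact generators: it drives the unconfounded mechanisms through an auxiliary variable x correlated with y, and the confounded mechanism through y itself.

    import numpy as np

    def impose_missingness(y, x, mechanism, rate=0.20, seed=None):
        # Return a copy of y with `rate` of its values set to NaN.
        rng = np.random.default_rng(seed)
        y = np.asarray(y, dtype=float).copy()
        n = len(y)
        driver = y if mechanism == "confounded" else np.asarray(x, float)
        r = driver.argsort().argsort()                 # ranks 0..n-1
        if mechanism == "mcar":
            w = np.ones(n)
        elif mechanism == "large":                     # large values more likely missing
            w = r + 1.0
        elif mechanism == "center":                    # center values more likely missing
            w = n - np.abs(r - (n - 1) / 2)
        elif mechanism in ("tails", "confounded"):     # tail values more likely missing
            w = np.abs(r - (n - 1) / 2) + 1.0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        drop = rng.choice(n, size=int(rate * n), replace=False, p=w / w.sum())
        y[drop] = np.nan
        return y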

In the case where large values are missing, ratio imputation (with or without disturbances) and data augmentation (Schafer) correct the bias in the mean, and within-class random imputation and the sequential nearest neighbor hot-deck improve the bias substantially. However, the authors cautioned that the findings for ratio imputation may well be an artifact of their manipulation of the data. In summary, they note that these methods generally provide improvement when the incomplete data produce considerably biased means, although the improvement is much smaller when the distribution is right-skewed.
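For reference, ratio imputation fills a missing y with R*x, where R is the ratio of the observed y total to the corresponding x total for the same cases; the "with disturbance" variant adds a randomly drawn residual so that the imputed values do not all fall exactly on the ratio line. The Python sketch below is a minimal rendering under those assumptions, not the study's code.

    import numpy as np

    def ratio_impute(y, x, disturbance=False, seed=None):
        # Impute missing y (NaN) as R*x, optionally adding a random residual.
        rng = np.random.default_rng(seed)
        y = np.asarray(y, dtype=float).copy()
        x = np.asarray(x, dtype=float)
        obs = ~np.isnan(y)
        R = y[obs].sum() / x[obs].sum()               # ratio from respondents
        fill = R * x[~obs]
        if disturbance:
            resid = y[obs] - R * x[obs]               # observed residuals as the pool
            fill = fill + rng.choice(resid, size=fill.size, replace=True)
        y[~obs] = fill
        return y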

In summarizing the results for variance estimation, the authors concluded that all imputation methods studied, except mean imputation, yield acceptable variance estimates when the data are missing completely at random. For the three unconfounded types of missing data (tails missing, large values missing, and center values missing), data augmentation (Schafer) worked best, but ratio imputation, within-class random imputation, and the sequential nearest neighbor hot-deck method can all improve the bias of the variance estimates dramatically. (One caution: the ratio imputation method tends to overestimate the variance.) For the confounded missing data pattern, where missingness is related to the variable itself, only the ratio imputation methods (with and without disturbances) result in a substantial improvement in the bias of the variance.

When coverage rates and confidence interval widths are considered together, data augmentation (Schafer) and adjusted data augmentation are the least likely to provide poor estimates. Finally, when average imputation error is considered, ratio imputation, data augmentation (Schafer), and within-class random imputation perform best, followed by the hot-deck, ratio-with-disturbance, and mean imputation methods.

Looking across the entire set of results, data augmentation (Schafer) is the one imputation method that scores high on all counts. Two other methods that are more commonly used at NCES, within-class random imputation (PROC IMPUTE) and the sequential nearest neighbor hot-deck method, also performed well in estimating means and variances and reasonably well on coverage rates and average imputation error (although within-class random imputation usually edges out the hot-deck method).



