Abstract
Scott H. Holan, Daniell Toth, Marco A. R. Ferreira and Alan F. Karr (2008)
"Bayesian Multiscale Multiple Imputation with Implications to Data Confidentiality"
Many scientific, sociological and economic applications present data that are collected on multiple scales of resolution. One particular form of multiscale data arises when data are
aggregated across different scales both longitudinally and by economic sector. Frequently, such data sets contain missing observations that can be accurately imputed
using the method we propose, Bayesian multiscale multiple imputation. This method borrows information both longitudinally and across different levels of aggregation to
produce accurate imputations of missing observations as well as estimates that respect the constraints imposed by the multiscale nature of the data. Our approach couples dynamic
linear models with a novel imputation step based on singular normal distribution theory. Although our method is of independent interest, one important implication of such methodology
is its potential effect on confidential databases protected by means of cell suppression. In order to demonstrate the proposed methodology and to assess the effectiveness of disclosure
practices in longitudinal databases, we conduct a large-scale empirical study using the U.S. Bureau of Labor Statistics Quarterly Census of Employment and Wages (QCEW).
In the course of our empirical investigation, we find that several of the predicted cells are accurate to within 1 percent, raising potential concerns for data confidentiality.
Last Modified Date: February 5, 2009