Census Bureau

VIEWS ON THE PRODUCTION AND USE OF CONFIDENTIAL MICRODATA

William E. Winkler

KEY WORDS:

ABSTRACT

To be of use to researchers, public-use microdata should be analytically valid and analytically interesting. Public-use microdata are analytically valid if they yield results and conclusions that correspond closely to those obtained from the original, confidential microdata. Microdata are analytically interesting if the files contain enough variables, say five demographic and six quantitative, to support more detailed inferences than could be drawn from a large number of summary statistics. Many existing public-use files have been created by data providers who are unaware of modern record linkage techniques that allow some of their records to be reidentified and associated with individuals. Existing record linkage methods are powerful enough that relatively naive persons using commercially available software can reidentify a small percentage of records in some public-use files. Newly developing record linkage methods will initially allow sets of individual records from sets of files to be associated for many important economic and demographic analyses that serve the public good and significantly reduce costs. When fully realized, these powerful new methods will also allow reidentifications in many public-use data files, even though the data were produced by conscientious individuals who believed they were using effective techniques for assuring confidentiality.
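The reidentification risk described above can be illustrated with a minimal sketch. It is not one of the record linkage methods discussed in this paper: it uses only exact matching on a few shared variables, which is far weaker than modern probabilistic linkage. The records, the field names (`id`, `name`, `age`, `sex`, `zip3`, `income`) and the function `reidentify` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def reidentify(public_records, external_records, keys):
    """Link public-use records to named external records by exact match.

    A public record counts as reidentified only when its combination of
    key values is unique in the public file AND matches exactly one
    record in the external file. (Illustrative assumption: real linkage
    software tolerates typographical error and uses match weights.)
    """
    def index(records):
        groups = defaultdict(list)
        for record in records:
            groups[tuple(record[k] for k in keys)].append(record)
        return groups

    public_index = index(public_records)
    external_index = index(external_records)

    matches = {}
    for combo, public_rows in public_index.items():
        external_rows = external_index.get(combo, [])
        if len(public_rows) == 1 and len(external_rows) == 1:
            matches[public_rows[0]["id"]] = external_rows[0]["name"]
    return matches


# Hypothetical public-use file: names removed, other variables retained.
public = [
    {"id": 1, "age": 34, "sex": "F", "zip3": "207", "income": 52000},
    {"id": 2, "age": 34, "sex": "F", "zip3": "207", "income": 48000},
    {"id": 3, "age": 61, "sex": "M", "zip3": "208", "income": 91000},
]

# Hypothetical external file that carries names and the same variables.
external = [
    {"name": "Person A", "age": 61, "sex": "M", "zip3": "208"},
    {"name": "Person B", "age": 34, "sex": "F", "zip3": "207"},
]

matches = reidentify(public, external, ["age", "sex", "zip3"])
print(matches)
```

In this toy example only record 3 is linked to a name, because records 1 and 2 share the same key combination and so cannot be told apart by exact matching alone. Adding more variables to a file makes such unique combinations more common, which is the tension between analytic interest and confidentiality that the abstract describes.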