Census Bureau

MASKING MICRODATA FILES

Jay J. Kim and William E. Winkler

KEY WORDS: Confidentiality, Noise Inoculation, Reidentification, Swapping

ABSTRACT

Government agencies collect many types of data, but because of confidentiality restrictions, use of the microdata is often limited to sworn agents working on secure computer systems at those agencies. These restrictions can severely affect public policy decisions made at an agency that has access only to nonconfidential summary statistics. This necessitates the creation of microdata that not only meet the confidentiality requirements but also have sufficient utility. This paper describes a general methodology for producing public-use data files that preserve confidentiality and allow many analytical uses. The methodology masks quantitative data using an additive-noise approach and then, when necessary, employs a reidentification/swapping methodology to assure confidentiality. One of the major advantages of this masking scheme is that it also yields precise subpopulation estimates, which is not possible with other known masking schemes. In addition, if controlled distortion is applied, a prespecified subset of subpopulation estimates from the masked file can be nearly identical to those from the unmasked file. This paper provides the theoretical underpinning of the masking methodology and presents results from applying it to examples.
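
As a rough illustration of the additive-noise step only, the following Python sketch adds zero-mean normal noise to a matrix of quantitative variables. It is a minimal sketch under stated assumptions, not the procedure developed in this paper: the function name `mask_additive_noise`, the scale parameter `c`, and the choice of a noise covariance proportional to the sample covariance of the data are all assumptions introduced here for illustration. The reidentification/swapping step and controlled distortion are not shown.

```python
import numpy as np


def mask_additive_noise(x, c=0.1, seed=0):
    """Return x + e, where e ~ N(0, c * Cov(x)).

    Hypothetical helper: the noise covariance is ASSUMED to be
    proportional to the sample covariance of the unmasked data.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    cov = np.cov(x, rowvar=False)  # p x p sample covariance of the columns
    noise = rng.multivariate_normal(np.zeros(p), c * cov, size=n)
    return x + noise


# Synthetic example data (not from any real file).
rng = np.random.default_rng(1)
original = rng.multivariate_normal(
    [50.0, 30.0], [[100.0, 40.0], [40.0, 64.0]], size=5000
)
masked = mask_additive_noise(original, c=0.1, seed=2)

# Under this assumption the masked covariance is roughly (1 + c) times
# the original, so the original covariance can be estimated by rescaling.
print(np.cov(masked, rowvar=False) / 1.1)
print(np.cov(original, rowvar=False))
```

With noise of this form, means are approximately unchanged and the covariance is inflated by a known factor, which is one reason an analyst given `c` could still recover second-moment estimates from a masked file.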