General Discrete-data Modeling Methods for Producing Synthetic Data with Reduced Re-identification Risk that Preserve Analytic Properties

William E. Winkler

 

ABSTRACT

General modeling methods for representing and improving the quality of discrete data (Winkler 2003, 2008) extend and connect the editing methods of Fellegi and Holt (1976) and the imputation ideas of Little and Rubin (2002). This paper describes a modeling framework for producing synthetic microdata that correspond more closely to external benchmark constraints on certain aggregates (such as margins) and in which certain cell probabilities are bounded both below and above to reduce re-identification risk. Rather than linear constraints (Meng and Rubin 1993), the modeling methods use convex constraints (Winkler 1990, 1993) in an extended MCECM procedure. Although the produced microdata are not epsilon-private (Dwork 2006, Dwork and Yekhanin 2008), surrogate original microdata would be exceptionally difficult (or impossible) to construct using the standard linear programming (LP) procedures of epsilon-privacy.
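
As a rough illustration of what matching external margin benchmarks and checking cell-probability bounds can look like, the following Python sketch uses plain iterative proportional fitting (IPF) on a two-way table. It is not the extended MCECM procedure with convex constraints described in the abstract; IPF is swapped in here as a simpler, well-known technique. The counts, target margins, bounds, and function names (`ipf`, `out_of_bounds`, `draw_synthetic`) are all hypothetical and chosen only for the example.

```python
import random


def ipf(table, row_targets, col_targets, iters=200, tol=1e-10):
    """Plain IPF (an assumption, not the paper's MCECM): rescale a strictly
    positive two-way table until its margins match the target margins."""
    p = [row[:] for row in table]
    n_rows, n_cols = len(p), len(p[0])
    for _ in range(iters):
        # Scale each row so that it sums to its target margin.
        for i in range(n_rows):
            s = sum(p[i])
            p[i] = [v * row_targets[i] / s for v in p[i]]
        # Scale each column so that it sums to its target margin.
        col_sums = [sum(p[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for i in range(n_rows):
            p[i] = [p[i][j] * col_targets[j] / col_sums[j] for j in range(n_cols)]
        # Columns are now exact; stop once the rows agree as well.
        if all(abs(sum(p[i]) - row_targets[i]) < tol for i in range(n_rows)):
            break
    return p


def out_of_bounds(p, lo, hi):
    """Return the (row, col) indices of cells whose probability is outside [lo, hi]."""
    return [(i, j) for i, row in enumerate(p)
            for j, v in enumerate(row) if v < lo or v > hi]


def draw_synthetic(p, n, seed=0):
    """Draw n synthetic records (cell indices) from the fitted cell probabilities."""
    cells = [(i, j) for i in range(len(p)) for j in range(len(p[0]))]
    weights = [p[i][j] for i, j in cells]
    return random.Random(seed).choices(cells, weights=weights, k=n)


# Hypothetical observed counts and external benchmark margins.
observed = [[40.0, 10.0], [5.0, 45.0]]
fitted = ipf(observed, row_targets=[0.5, 0.5], col_targets=[0.4, 0.6])

# Hypothetical bounds on cell probabilities; flagged cells would need adjusting.
violations = out_of_bounds(fitted, lo=0.05, hi=0.6)

synthetic = draw_synthetic(fitted, n=1000)
```

Cells flagged by `out_of_bounds` are the ones a bounded model would have to adjust. Simply clipping them after fitting would disturb the fitted margins, so in any real procedure the bounds and the margin constraints have to be handled jointly.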

CITATION: Winkler, William E. (2010). General Discrete-data Modeling Methods for Producing Synthetic Data with Reduced Re-identification Risk that Preserve Analytic Properties. Statistical Research Division Research Report Series (Statistics #2010-02). U.S. Census Bureau. Available online at <http://www.census.gov/srd/papers/pdf/rrs2010-02.pdf>.

Source: U.S. Census Bureau, Statistical Research Division

Published online: January 28, 2010
Last revised: January 26, 2010



Source: U.S. Census Bureau | Statistical Research Division | (301) 763-3215 (or chad.eric.russell@census.gov) |   Last Revised: October 08, 2010