U.S. Department of Commerce

Research Reports


Using Uncertainty Intervals to Analyze Confidentiality Rules for Magnitude Data in Tables

Paul B. Massell

KEY WORDS: confidentiality, disclosure protection, p% rule, midpoint attack, uncertainty interval, uncertainty model, knowledge model

ABSTRACT

Protecting the confidentiality of survey respondent data is related in several ways to the notion of data user uncertainty. The source of uncertainty that agencies most frequently exploit when formulating protection rules for tabular data is the fact that more than one respondent (e.g., a company) often contributes to a given table cell value. Agencies are required to protect these individual contributions. A data user’s uncertainty about how the published cell value is distributed among the contributions is often sufficient to protect them. This “cell value distributional uncertainty” may be the most exploited source of uncertainty, but it is by no means the only one. Many of the procedures involved in designing a survey and processing the collected data also create data user uncertainty about respondent contributions. It is usually possible to express a given data user’s uncertainty about a particular respondent’s contribution to a particular cell as a finite interval. The interval may be derived from inequalities associated with the table’s additivity, or it may be based on “knowledge models” that describe, for example, the data user’s prior (approximate) knowledge of respondent contributions or sampling weights. We call such intervals “uncertainty intervals.” Sometimes the knowledge models allow a probability distribution to be defined on the uncertainty interval. The major thesis of this paper is that uncertainty intervals can unify the description of many of these sources of uncertainty. We show how they unify the description of several formulas and algorithms frequently used in protecting data, e.g., those related to the p% rule, sliding and two-sided protection, cell value rounding, and weights applied to the underlying microdata. In future work, the author hopes to extend this approach to additional sources of uncertainty.
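
As an illustration of how an additivity-based interval relates to the p% rule, here is a minimal Python sketch. It is not taken from the paper: it uses the commonly stated form of the p% rule (a cell is sensitive when the contributions other than the two largest sum to less than p% of the largest) and assumes that contributions are nonnegative and that the attacker is the second-largest contributor. The function names `additivity_interval` and `is_sensitive_p_percent` are hypothetical.

```python
def additivity_interval(cell_total, attacker_contribution):
    """Interval an attacker can derive for another respondent's value.

    Assumption: all contributions are nonnegative, so the target's value
    lies between 0 and the cell total minus the attacker's own value.
    """
    return (0.0, cell_total - attacker_contribution)


def is_sensitive_p_percent(contributions, p):
    """Common textbook form of the p% rule (assumed, not from the paper).

    The attacker is taken to be the second-largest contributor, who uses
    the upper end of the additivity interval as an estimate of the
    largest contribution. The cell is flagged as sensitive when that
    estimate overshoots the true value by less than p percent.
    """
    ordered = sorted(contributions, reverse=True)
    if not ordered:
        return False  # empty cell: nothing to protect
    largest = ordered[0]
    second = ordered[1] if len(ordered) > 1 else 0.0
    total = sum(ordered)
    _, upper = additivity_interval(total, second)
    return (upper - largest) < (p / 100.0) * largest


if __name__ == "__main__":
    # Small remainder (5) relative to the largest contribution (100)
    print(is_sensitive_p_percent([100, 50, 5], p=10))
    # Larger remainder (30) widens the interval enough to protect the cell
    print(is_sensitive_p_percent([100, 50, 30], p=10))
```

In this sketch the width of the interval above the true value, `upper - largest`, equals the sum of the remaining contributions, which is why the rule can be read as a requirement on the size of the attacker's uncertainty interval.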

CITATION:

Source: U.S. Census Bureau, Statistical Research Division

Created: March 21, 2006
Last revised: March 21, 2006


