US Census Bureau - Research Reports - ABSTRACT: Using Uncertainty Intervals to Analyze Confidentiality Rules for Magnitude Data in Tables

U.S. Department of Commerce

Abstract of RRS2006/04

Using Uncertainty Intervals to Analyze Confidentiality Rules for Magnitude Data in Tables

Paul B. Massell

KEY WORDS: confidentiality, disclosure protection, p% rule, midpoint attack, uncertainty interval, uncertainty model, knowledge model

ABSTRACT

Protecting the confidentiality of survey respondent data is related to the notion of data user uncertainty in various ways. The source of uncertainty that is most frequently exploited by agencies in formulating protection rules for tabular data is the fact that there is often more than one respondent (e.g., a company) contributing to a given table cell value. Agencies are required to protect these individual contributions. The uncertainty in a data user’s mind about how the published cell value is distributed among the contributions is often sufficient to protect them. This “cell value distributional uncertainty” may be the most exploited source of uncertainty, but it is by no means the only one. Data user uncertainty about respondent contributions is created through many of the procedures involved in the design of a survey and in processing the collected data. It is usually possible to express a given data user’s uncertainty about a particular respondent’s contribution to a particular cell as a finite interval. The interval may be derived from inequalities associated with the table’s additivity or it may be based on “knowledge models” that describe, for example, the data user’s prior (approximate) knowledge of respondent contributions or sampling weights. We call such intervals “uncertainty intervals”. Sometimes the knowledge models may allow a probability distribution to be defined on the uncertainty interval. The major thesis of this paper is that uncertainty intervals can be used as a means of unifying the description of many of these sources of uncertainty. We show how uncertainty intervals can unify the description of several formulas and algorithms that are frequently used during the process of protecting data, e.g., those related to the p% rule, sliding and two-sided protection, cell value rounding, and weights applied to the underlying microdata. In future work, the author hopes to extend this approach to additional sources of uncertainty.
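As an illustration of how the p% rule can be read through an uncertainty interval, the following minimal sketch uses the standard textbook form of the rule, which is not necessarily the exact formulation in the paper. It assumes nonnegative contributions and a data user who is the second-largest respondent, knows only the published cell total and their own contribution, and therefore places the largest contribution in the interval from 0 to (total minus own contribution). The function names and sample values are hypothetical.

```python
def additivity_upper_bound(contributions):
    """Upper end of the uncertainty interval [0, total - x2] that the
    second-largest respondent can derive for the largest contribution,
    using only table additivity and nonnegativity (an assumed knowledge model).
    """
    if len(contributions) < 2:
        raise ValueError("need at least two contributions")
    ordered = sorted(contributions, reverse=True)
    return sum(ordered) - ordered[1]


def is_sensitive_p_percent(contributions, p):
    """Textbook p% rule: the cell is sensitive when the remainder
    (total minus the two largest contributions) is less than p% of the
    largest contribution. The remainder is exactly the amount by which
    the additivity upper bound overshoots the largest contribution.
    """
    if len(contributions) < 2:
        raise ValueError("need at least two contributions")
    ordered = sorted(contributions, reverse=True)
    largest = ordered[0]
    remainder = sum(ordered[2:])
    # Cross-multiplied form of remainder < (p / 100) * largest,
    # which avoids floating-point division for integer inputs.
    return remainder * 100 < p * largest


# Hypothetical cell values, for illustration only.
tight_cell = [100, 50, 5]    # remainder 5 is under 10% of 100
loose_cell = [100, 50, 30]   # remainder 30 is over 10% of 100

tight_bound = additivity_upper_bound(tight_cell)
loose_bound = additivity_upper_bound(loose_cell)
tight_flag = is_sensitive_p_percent(tight_cell, 10)
loose_flag = is_sensitive_p_percent(loose_cell, 10)
```

In this sketch the first cell is flagged because the interval's upper end overshoots the largest contribution by only 5, which is within 10% of its value of 100; the second cell is not flagged because the overshoot is 30.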

CITATION:

Source: U.S. Census Bureau, Statistical Research Division

Created: March 21, 2006
Last revised: March 21, 2006
