Confidentiality Seminar Series
September 2003 Seminar
NOTE: If you wish to attend, please follow the attendance
instructions. Direct any questions to Neil Russell (neil.russell@bts.gov).
Date/Time: Wednesday, September 17, 2003, 11:00 am - 12:00 pm
Location: U.S. Department of Transportation, Nassif Building, 400 7th St., SW, Room 8240
Title: Preserving Quality and Confidentiality of Tabular Data
Presenter: Lawrence H. Cox, Associate Director, National Center for Health Statistics (NCHS)
Slides: Lawrence H. Cox PDF (65KB) | HTML (61KB)
Abstract: Standard methods for statistical disclosure limitation (SDL)
in tabular data either abbreviate, modify or suppress from publication the true
(original) values of tabular cells. All of these methods are based on satisfying
an analytical rule selected by the statistical office to distinguish cells and
cell combinations exhibiting unacceptable risk of disclosure (the sensitive
cells) from those that do not. The impact of these SDL methods on data analytic
outcomes has not been well studied but can be shown to be subtle or severe in particular
cases. Dandekar and Cox (2002) introduced a method for tabular SDL called controlled
tabular adjustment (CTA). CTA replaces the value of each cell failing the analytical
rule by a safe value, viz., a value satisfying the rule, and then uses linear
programming to adjust the values of the nonsensitive cells to restore additivity
of detail to totals throughout the tabular system. The linear programming framework
allows adjustments to be selected so as to minimize any of a variety of linear
measures of overall distortion to the data, e.g., total of absolute adjustments,
total percent of absolute adjustments, etc. Cox and Dandekar (2003) provide
further techniques for preserving data quality. While worthwhile, none of these
techniques directly addresses the overarching issue: Will statistical analysis
of original and disclosure limited data sets yield comparable results? We provide
a mathematical programming framework and algorithms, introduced in Cox and Kelly
(2003), that begin to address this issue. Specifically, we demonstrate how
to approximately preserve mean values, variances and correlations when original
data are subjected to CTA, and how to ensure that the simple linear regression
between original and adjusted data has an intercept of approximately zero and a
slope of approximately one.
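The abstract does not spell out the optimization, so what follows is only a minimal sketch under stated assumptions, not the authors' formulation. It applies the CTA idea to the simplest possible table: one row of detail cells plus its total, with a single sensitive cell whose safe value is supplied by the caller (the disclosure rule itself is not modeled). Restoring additivity then involves just one equality constraint, and minimizing a weighted sum of absolute adjustments (weight 1 per cell gives total absolute adjustment, weight 1/value gives a percent-style measure) reduces to filling the cheapest nonsensitive cells first, up to per-cell capacities. The function names `cta_one_row` and `compare`, the capacities, and the example counts are all hypothetical. `compare` merely reports means, variances, the correlation, and the least-squares intercept and slope between original and adjusted values; it does not implement the Cox and Kelly (2003) program that constrains those quantities.

```python
from statistics import fmean


def cta_one_row(details, sensitive, safe_value, weights, caps):
    """Hypothetical one-row CTA sketch; returns (new_details, new_total).

    weights/caps hold one entry per detail cell plus a final entry for the
    total; the entries for the sensitive cell are ignored.
    """
    n = len(details)
    d = safe_value - details[sensitive]      # forced change to the sensitive cell
    adj = [0.0] * (n + 1)                     # adjustments; last slot is the total
    adj[sensitive] = d
    # Additivity needs |d| units of offsetting change: detail cells move
    # against d, the total moves with d. With a single equality constraint,
    # the weighted-L1 LP is solved exactly by filling the cheapest cells first.
    sign = 1.0 if d > 0 else -1.0
    remaining = abs(d)
    candidates = [k for k in range(n + 1) if k != sensitive]
    for k in sorted(candidates, key=weights.__getitem__):
        if remaining <= 0:
            break
        step = min(remaining, caps[k])
        adj[k] = sign * step if k == n else -sign * step
        remaining -= step
    if remaining > 1e-9:
        raise ValueError("capacities too small to restore additivity")
    new_details = [x + a for x, a in zip(details, adj[:n])]
    return new_details, sum(details) + adj[n]


def compare(original, adjusted):
    """Summary statistics and OLS fit of adjusted values on original values."""
    mx, my = fmean(original), fmean(adjusted)
    sxx = sum((x - mx) ** 2 for x in original)
    syy = sum((y - my) ** 2 for y in adjusted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(original, adjusted))
    slope = sxy / sxx
    m = len(original) - 1                     # sample-variance denominator
    return {"means": (mx, my), "variances": (sxx / m, syy / m),
            "r": sxy / (sxx * syy) ** 0.5,
            "intercept": my - slope * mx, "slope": slope}


# Hypothetical example: cell 1 (value 3) fails the rule; its safe value is 5.
details = [40.0, 3.0, 25.0, 32.0]                          # published total: 100
weights = [1 / v for v in details] + [1 / sum(details)]    # percent-style costs
caps = [0.1 * v for v in details] + [1.0]                  # total may move by 1 at most
new_details, new_total = cta_one_row(details, 1, 5.0, weights, caps)
report = compare(details, new_details)
print(new_details, new_total)
print(report["intercept"], report["slope"], report["r"])
```

In this hypothetical example the safe value raises the sensitive cell from 3 to 5. Under the percent-style weights the total is the cheapest cell per unit, so it absorbs its full capacity of 1, and the first cell absorbs the remaining unit; the details (39, 5, 25, 32) again sum to the adjusted total of 101. The mean of the detail cells shifts from 25 to 25.25, which is the kind of distortion the mean- and variance-preserving framework in the abstract is meant to control. Real tables with many rows, columns, and sensitive cells would need a general linear programming solver rather than this greedy shortcut.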