www.census.gov



SIPP Synthetic Beta Data Product


Some of the following documents are in Portable Document Format (PDF). To view these files, you will need the Adobe(R) Acrobat(R) Reader, which is available for free from the Adobe web site.


Background on the SIPP Synthetic Beta

The SIPP Synthetic Beta (SSB) is an experimental data product produced by the U.S. Census Bureau in collaboration with the Social Security Administration (SSA) and the Internal Revenue Service (IRS). The purpose of the SSB is to provide users with access to a linked survey and administrative data product that can be used outside of a secure Census facility. The Census Bureau created the SSB by standardizing a basic set of variables across seven panels of the Survey of Income and Program Participation (SIPP) and then merging administrative earnings and benefits records from the IRS and SSA with these SIPP records. Census then multiply imputed missing data, synthesized all but a few variables, and tested for disclosure risk. With the approval of the Census Disclosure Review Board and its counterparts at IRS and SSA, Census is now releasing these data for public use.

For more information on the actual variables contained in this file, please see “Codebook for the SIPP Synthetic Beta.”

For more information on general synthesis methods, please see “Final Report to the Social Security Administration on the SIPP/SSA/IRS Public Use File Project.” This report documents the processes used to create version 4 of the SSB and was posted on this website when version 4.1 was released. While some of the models used for specific variables have changed in version 5, our general methods remain the same, so this document is still useful for users who want to understand how the SSB was made.

For more information on the disclosure risk assessment, please see “DRB Memo September 20, 2010.”

Analytic Validity of the SSB: Disclaimer

The data synthesis process that Census uses to protect the linked data from disclosing the identity of individuals is relatively new and substantially changes both the survey and administrative data. The modeling done as part of the synthesis is intended to preserve the relationships among variables that are of interest to researchers while ensuring that personally identifiable information is not revealed to the data user. It has not been feasible to ensure accuracy by comparing every relationship among SSB variables with the corresponding relationship in the underlying confidential microdata. Hence we strongly urge researchers not to publish results using the SSB without first requesting that Census validate these results with confidential data housed in a secure environment at the Census Bureau. Census will perform this validation free of charge to researchers, as resources permit and according to the protocol established by the three agencies involved and outlined below.

Without validation of results, Census, SSA, and IRS make no guarantee of the validity of the SSB for any research purpose.

How to Access the SSB

There are two methods for accessing the file. First, users may apply for a free account on the Synthetic Data Server (SDS) housed at the virtual RDC at Cornell University. With this account, researchers will have access to the SSB data and to SAS and Stata software. Users may run programs on the server, and results may be removed from the server without review by Census staff.

To apply for an account on the Synthetic Data Server (SDS), please submit “Application to use the SIPP Synthetic Beta File” to sehsd.synthetic.data.use.list@census.gov.

The second method is to download the data directly from the SIPP FTP page (coming soon) or DataFerrett, in the same way as traditional SIPP public-use products. However, Census will not validate programs that users have run only in their personal computing environments. To obtain results from the confidential data, users must first run their programs on the SDS (see Protocol for Validation of Results).

Protocol for Validation of Results

Census will validate results obtained from the SSB on the internal, confidential version of these data. Users who wish to obtain validated results should follow the protocol outlined here.

Further Questions

For further information about the SIPP Synthetic Beta, please email sehsd.synthetic.data.use.list@census.gov.

