
Information Quality Guidelines - Guidance: Statistical Information

 

The modern U.S. income tax was enacted in 1913, following ratification of the Sixteenth Amendment to the U.S. Constitution. Subsequently, the Revenue Act of 1916 required the annual publication of statistics. Despite many revisions to the tax law, the original requirement of that Act continues today. Specifically, the current Internal Revenue Code states: "The Secretary (of the Treasury) shall prepare and publish not less than annually statistics reasonably available with respect to the operations of the internal revenue laws..."

The IRS operates a data collection and dissemination system, called the Statistics of Income (SOI) program, which uses the Federal tax system as a comprehensive source of economic and financial information. The program collects data from different tax and information returns that are processed in administering the tax laws. During its nearly 90-year history, the main emphasis of the SOI program has been individual and corporation income tax data. Other subjects for which data are currently collected from other types of returns, either annually or periodically, include partnerships, private foundations, and other exempt organizations.

The data gathered by the SOI program are used extensively for tax research and for estimating revenue by tax analysts in the Department of the Treasury's Office of Tax Analysis (OTA) and the Congressional Joint Committee on Taxation (JCT). The third major user is the Department of Commerce's Bureau of Economic Analysis (BEA), which relies on SOI data extensively for estimating components in the National Income and Product Accounts. Many other Federal agencies are also users of SOI data, including the Federal Reserve Board, the General Accounting Office, the Social Security Administration, and the Health Care Financing Administration. Outside of the Federal Government, SOI data are used by a broad array of tax practitioners, policy researchers, demographers, economic analysts, consultants, business and trade associations, corporate tax departments, State and local Governments, foreign Governments, universities, public libraries, and the media, as well as the public at large. In addition, other areas of IRS use SOI data for their internal operations.

Utility

IRS will make information products widely available and broadly accessible.

Since tax returns are protected from public scrutiny by law, strict procedures govern the handling of returns and computer files containing such information. SOI's primary customers (OTA and JCT) are authorized to receive detailed tax return (microdata) files, so computer files of tax return information are regularly provided to them. However, most other users of SOI data can only have access to summary tabulations. The purposes and details describing such access are specified in Section 6103 of the Internal Revenue Code.

SOI information is made publicly available through both printed publications and electronic media. The Statistics of Income (SOI) Bulletin is published quarterly, with each issue containing four to eight articles and data releases of recently completed studies, as well as historical tables covering a variety of subject matter, from Treasury Department tax collections to taxpayer assistance and tax return projections. IRS produces separate annual "complete reports" on individual and corporation income tax returns, which contain more comprehensive data than are published in the Bulletin. The Corporation Source Book is also published annually, presenting detailed income statement, balance sheet, and tax data by industry and asset size. Periodically, IRS produces special compendiums of research and analysis, covering topics such as nonprofit organizations, estate taxation and personal wealth, and international business activities. Research articles documenting technological and methodological changes in SOI programs and other related statistical uses of administrative records are also published in a series of reports. SOI is also responsible for releasing other IRS information, including the Internal Revenue Service Data Book (containing tax collections and other tax administration data), tax return projections, and microdata records of exempt organizations.

SOI data on individuals, corporations, and other entities are available to the public on the IRS World Wide Web site. Electronic media products available from SOI also include magnetic tapes, CD-ROMs, diskettes, and files sent via e-mail. These products include the Individual Public-Use Microdata File (from which taxpayer identifiers have been removed); Exempt Organizations and Private Foundations Microdata Files (whose returns are open to the public); the Corporation Source Book; individual income tax return data shown by State, county, or ZIP code; and individual migration data shown on either a State or county basis. SOI also has a Statistical Information Services (SIS) office to facilitate the dissemination of SOI data.
 
IRS will keep informed of information needs through active and ongoing contact with the user community and will provide vehicles for user input into our information programs.

IRS keeps abreast of information needs through a variety of means, including meeting with our customers, conducting user surveys, working with advisory committees, and convening and attending conferences.  Contact information is available, where appropriate, on a variety of information products to allow for questions, comments, and suggestions from users.

IRS statistical publications and other information products will be reviewed to ensure that they remain relevant and timely and that they address current information needs.

On the basis of internal product reviews and consultation with users, and in response to changing needs and emphases, the content of ongoing information products is changed, new products are introduced, and some products are discontinued.

Objectivity

Objectivity, as defined in the OMB quality guidelines (V.3.B), involves a focus on ensuring that information is accurate, reliable, and unbiased and that information products are presented in a clear, complete, unbiased, and well-documented manner. Objectivity is achieved by using reliable data sources and sound statistical techniques, by having information products prepared by qualified people using proven methods, and by carefully reviewing the content of all information products.


Information products disseminated by IRS will be based on reliable, accurate data that have been validated.

All of the information disseminated by IRS is based on data reported on tax or information returns submitted by taxpayers. SOI data are collected only from non-amended and non-audit returns. The United States uses a system of self-assessment for the collection of most federal taxes. Under this system, taxpayers, whether individuals or corporations, compile the facts about their income, claim allowable deductions, and calculate the taxes they must pay, using forms provided for this purpose and filing all documents by the date determined by tax laws. The facts are later checked against reports of payers and recipients. Selected data items are computer-entered for administrative purposes, to determine the correct tax liability. Every week, as data are posted, a probability sample of tax returns is selected for each SOI study. These sampled returns are subject to additional data abstraction for SOI by specially trained technicians. The data thus extracted from the sampled returns are tested for consistency, and any errors detected are then resolved. Due to substantial penalties for misreporting, the income and expenditure data reported on tax returns have proven to be more reliable than comparable survey data.

IRS samples will be conducted using methodologies that are consistent with generally accepted professional standards for all aspects of sampling design and implementation.

IRS employs and documents accepted professional standards and practices for all its SOI studies, including sample frame development, sample design, testing sample selection computer programs, data editing, analysis of sampling and coverage errors, imputation of missing data, weighting, and variance estimation.

Tax returns are filed and administratively processed at one of ten IRS regional sites, called "service centers." Once returns are processed, IRS compiles selected information from most return forms into a computerized "master file" system, which is the informational backbone of the agency. Most SOI operations begin by sampling returns from the master file system; the master file offers a sampling frame that enables use of sophisticated and efficient sample designs.

Statistics compiled for the SOI studies are generally based on stratified Bernoulli samples of tax or information returns. As returns are processed into the master file system, they are assigned to sampling classes (strata), based on criteria such as size of income or assets (or other measures of economic size), industrial activity, accounting period, or the presence of certain supplemental forms or schedules. Each taxpayer, whether an individual or a business, has a unique number: the social security number (SSN) for individuals or the employer identification number (EIN) for businesses. These unique taxpayer identification numbers are used as the seed to generate a pseudo-random number which, along with the sampling strata, determines whether a given return is to be selected for the SOI sample. The probability of a return being designated for the SOI sample depends on the sampling rate prescribed for its sample class or stratum and may range from a fraction of 1 percent to 100 percent.
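The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not say how the pseudo-random number is derived from the identifier, so the SHA-256 hash used here is a stand-in, and the stratum names and sampling rates are hypothetical.

```python
import hashlib

# Hypothetical strata and sampling rates (assumptions, not actual SOI values).
SAMPLING_RATES = {"low_income": 0.0005, "mid_income": 0.01, "high_income": 1.0}


def pseudo_random_unit(tin: str) -> float:
    """Map a taxpayer identification number to a repeatable value in [0, 1).

    A hash stands in for the pseudo-random transformation, which the
    source does not specify.
    """
    digest = hashlib.sha256(tin.encode("utf-8")).digest()
    # Keep 53 bits so the quotient is exactly representable and strictly below 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def is_selected(tin: str, stratum: str) -> bool:
    """Bernoulli selection: keep the return if its number is below the stratum rate."""
    return pseudo_random_unit(tin) < SAMPLING_RATES[stratum]
```

Because the number depends only on the identifier, the decision for a given return is repeatable, and a stratum with a rate of 100 percent selects every return assigned to it.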

The samples are selected from each stratum over the appropriate filing periods. Thus, sample selection can continue for a given study for several calendar years because of the prevalence of fiscal (non-calendar) year reporting.

All data employed in the preparation of information products will be compiled using acceptable procedures implemented by qualified professional staff.

After sampling, the relatively few data items pulled electronically from the master file system are substantially augmented with additional items key-entered from hardcopies of taxpayers' returns. Statistical abstracting can take from a few minutes for a simple return to several days for a large corporate return.

IRS has built a network of mid-range servers in selected service centers that are dedicated to SOI statistical processing. "Hub" sites are located in Ogden (Utah) and Covington (Kentucky), with other processing centers located in Atlanta (Georgia), Austin (Texas), and Kansas City (Missouri). The processing system uses on-line transaction processing, so that all data capture operations are completed in a single pass. One editor is responsible for ensuring the validity of all data processing for a given return.

Several extensive quality review processes are used to ensure the quality of the data. The review processes begin at the sample selection stage with weekly monitoring of the sample to ensure that the proper number of returns is being selected. They continue through the data collection, data cleaning, and data completion procedures with consistency testing. Part of the review process includes extensive comparisons between the final statistics and previous-year statistics as well as external data. Qualified professional staff make considerable efforts at every stage of processing to ensure data integrity.

Due to substantial penalties for misreporting, the income and expenditure data reported on tax returns have proven to be more reliable than comparable survey data. Even so, IRS employees go to great lengths to protect against nonsampling errors, such as those due to taxpayer reporting variations or inconsistencies, or data processing errors. In order that final statistics are consistent and reliable, IRS economists develop extensive on-line tests and error resolution procedures that are applied to each sampled return. The tests and correction procedures are based on the structure of the tax laws and forms, generally accepted accounting principles, and the improbability of various data combinations.
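As a rough illustration of the kind of test described above, the sketch below checks one accounting identity and one form-structure rule on a single return. The field names, the tolerance, and the two rules are assumptions chosen for the example; they are not the actual SOI test specifications.

```python
def consistency_errors(ret, tolerance=1.0):
    """Return a list of messages for the checks a sampled return fails.

    `ret` is a dict with hypothetical field names; `tolerance` absorbs
    rounding differences in reported amounts.
    """
    errors = []

    # Accounting identity: assets should equal liabilities plus net worth.
    balance_gap = ret["total_assets"] - (ret["total_liabilities"] + ret["net_worth"])
    if abs(balance_gap) > tolerance:
        errors.append("balance sheet does not balance")

    # Form structure: itemized deductions should add up to the reported total.
    deduction_gap = sum(ret["deductions"].values()) - ret["total_deductions"]
    if abs(deduction_gap) > tolerance:
        errors.append("deduction items do not sum to total deductions")

    return errors
```

A return that produces an empty list passes both checks; any message would be routed to an editor for resolution.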

Editors in service centers and IRS economists statistically edit data items in order to make each sampled return internally consistent. Missing data problems arise, albeit infrequently (under 1 percent of the time). Missing items can be obtained through direct contact with taxpayers, or be estimated through imputations based on other return data, prior-year data for the same taxpayer, or same-year data from a "statistically similar" return. IRS economists serve as subject matter experts to the editors in answering their questions, and also resolve errors in the more difficult cases.
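The imputation options listed above can be sketched as a simple fallback chain. The order of the fallbacks, the field names, and the definition of a "statistically similar" return (same stratum, nearest total income) are all assumptions made for this example.

```python
def impute_item(recipient, item, prior_year=None, donors=()):
    """Fill a missing item and report which source supplied the value.

    Tries the taxpayer's prior-year return first, then a same-year donor
    return from the same stratum with the closest total income.
    """
    # Option 1: prior-year data for the same taxpayer.
    if prior_year is not None and prior_year.get(item) is not None:
        return prior_year[item], "prior_year"

    # Option 2: same-year data from a similar return (hypothetical similarity rule).
    candidates = [
        d for d in donors
        if d["stratum"] == recipient["stratum"] and d.get(item) is not None
    ]
    if candidates:
        donor = min(
            candidates,
            key=lambda d: abs(d["total_income"] - recipient["total_income"]),
        )
        return donor[item], "donor"

    # Nothing usable: leave the item for direct follow-up.
    return None, "unresolved"
```

Returning the source alongside the value keeps imputed items distinguishable from reported ones in later review.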

Subsamples of returns are independently reprocessed and analyzed for a quality evaluation. Additionally, in order to provide high quality statistics, economists conduct on-line review trips to the processing centers and review quality logs written by editors.

All estimation procedures will be prepared using statistically sound procedures designed by qualified professional staff.

On the whole, the IRS approach to making statistical summaries, using design-based inferences for the calculation of estimates and their standard errors, is quite straightforward. In applications, the probability with which a return is selected for an SOI sample depends on the sampling rate prescribed for the stratum in which it is classified. Weights are computed by dividing the population count of returns filed for a given stratum by the count of sample returns for the same stratum after adjusting for outliers and missing returns.
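A minimal sketch of the weight calculation follows. It covers only the basic ratio of population count to sample count within each stratum; the adjustments for outliers and missing returns are omitted, and the function names and data layout are assumptions for illustration.

```python
from collections import Counter


def stratum_weights(population_counts, sampled_strata):
    """Weight for each stratum = population count / number of sampled returns."""
    sample_counts = Counter(sampled_strata)
    return {s: population_counts[s] / n for s, n in sample_counts.items()}


def estimate_total(sample, weights):
    """Weighted total of an amount over (stratum, amount) pairs."""
    return sum(weights[stratum] * amount for stratum, amount in sample)


# Illustrative figures only.
population_counts = {"A": 1000, "B": 50}
sample = [("A", 10.0), ("A", 30.0), ("B", 500.0)]
weights = stratum_weights(population_counts, [s for s, _ in sample])
total = estimate_total(sample, weights)
```

Here stratum A has 2 sampled returns out of 1,000, so each carries a weight of 500, while the single sampled return in stratum B carries a weight of 50.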

In some studies, it is possible to improve the estimates by employing post-strata, based on supplemental criteria or refinements of those used in the original stratification. Weights are then computed for these post-strata using additional population counts, often with fairly computer-intensive methods, such as raking ratio estimation.
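Raking ratio estimation can be illustrated with a small two-way table of weighted counts that is alternately scaled to match known row and column totals. This is a generic sketch of the technique, not the SOI implementation; the function name, iteration limit, and tolerance are assumptions.

```python
def rake(table, row_targets, col_targets, iterations=100, tol=1e-9):
    """Iteratively scale a 2-D table so its margins match the target totals.

    `table` is a list of rows of weighted counts; the targets are known
    population counts for each row and column category.
    """
    cells = [row[:] for row in table]  # work on a copy
    for _ in range(iterations):
        # Scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(cells[i])
            if total > 0:
                factor = target / total
                cells[i] = [value * factor for value in cells[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in cells)
            if total > 0:
                factor = target / total
                for row in cells:
                    row[j] *= factor
        # Columns now match; stop once the rows match as well.
        if all(abs(sum(cells[i]) - t) <= tol for i, t in enumerate(row_targets)):
            break
    return cells


raked = rake([[10.0, 20.0], [30.0, 40.0]], row_targets=[40.0, 60.0], col_targets=[45.0, 55.0])
```

Dividing each raked cell by the corresponding original cell gives the adjustment factor that would be applied to the weights of returns in that post-stratum.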

Model-assisted estimates and bootstrapping techniques have been explored for selected SOI programs, but their deployment remains infrequent. A combination of randomization weighting and model-assisted techniques is now used to make preliminary estimates prior to the completion of sampling.

Data sources, sampling errors, and disclosure limitation methods will be documented in publications, either for the publication as a whole or for individual tables. 

Documentation in SOI publications contains information on data sources, including definitions and specifications of variables. Report documentation also includes, where appropriate, information on sampling errors and data limitations, as well as a description of rules or techniques for avoiding disclosure of confidential information.

All information products will be edited and proofread before release to ensure clarity and coherence of the final report.

Text is edited to ensure that the report is easy to read and grammatically correct, thoughts and arguments flow logically, and information is worded concisely and lucidly. Tables and charts are edited to ensure that they clearly and accurately illustrate and support points made in the text and include concise but descriptive titles. Tables and charts indicate the unit of measure and the universe being examined, and all internal labels (column headings, row stubs, and panel headings) accurately describe the information they contain. All changes made to a manuscript during the editing process are checked by a proofreader and then reviewed and approved by the author.

A comprehensive errata policy will inform users of both printed and Web-based publications when an error has been found and corrected.

If an error is detected before an initial mailing, IRS includes an errata notice with the mailing.  If the mailing has been sent out, an errata sheet is issued with all subsequent publications that are disseminated and, where appropriate, the errata sheet is sent to all those who received the initial mailing.  Errata notices are placed on the first page of the Web version to inform both new and repeat site visitors about the mistake, and the corrected version of the document is posted on the Web. 

Integrity

Integrity, as defined in the OMB quality guidelines, refers to the security of information from unauthorized access or revision to ensure that the information is not compromised through corruption or falsification.

To ensure the integrity of its information, IRS will employ rigorous controls that have been identified as representing sound security practices.

Tax returns are protected from public scrutiny by law, and strict procedures govern the handling of returns and computer files containing such information. IRS has programs and policies in place for securing its resources as required by the Internal Revenue Code. In order to prevent disclosure of information about specific taxpayers or businesses in SOI published tables, a weighted frequency (and the associated amount, where applicable) of less than 3 is either combined with data in one or more adjacent cells so as to meet the criteria, or deleted altogether. Similar steps are taken to prevent indirect disclosure through subtraction. However, combined or deleted data are included in the appropriate totals. Most data on tax-exempt, nonprofit organizations are excluded from disclosure review because the Internal Revenue Code and regulations permit public access to most of the information reported by these organizations.
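The cell-combination rule can be sketched as a single pass over one row of a table. The sketch handles only the merging of small cells with a neighbor; it does not cover deletion or the additional steps against disclosure through subtraction, and the function name and data layout are assumptions.

```python
def combine_small_cells(cells, threshold=3):
    """Merge cells whose weighted frequency is below `threshold` with a neighbor.

    `cells` is a list of (label, weighted_frequency, amount) tuples in
    table order. Totals are preserved because merged values are summed.
    """
    result = []
    for label, freq, amount in cells:
        if result and result[-1][1] < threshold:
            # The previous cell is too small: fold the current cell into it.
            prev_label, prev_freq, prev_amount = result.pop()
            result.append((f"{prev_label} + {label}", prev_freq + freq, prev_amount + amount))
        else:
            result.append((label, freq, amount))

    # A small cell at the end has no following neighbor, so merge it backward.
    if len(result) > 1 and result[-1][1] < threshold:
        label, freq, amount = result.pop()
        prev_label, prev_freq, prev_amount = result.pop()
        result.append((f"{prev_label} + {label}", prev_freq + freq, prev_amount + amount))
    return result


row = [("a", 10, 100.0), ("b", 2, 50.0), ("c", 7, 70.0), ("d", 1, 5.0)]
published = combine_small_cells(row)
```

In this example cells b and d fall below the threshold, so b, c, and d end up in one combined cell, and the row totals for frequency and amount are unchanged. A table consisting of a single small cell is returned as-is by this sketch and would need to be deleted instead.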

Transparency and Reproducibility

For the purpose of these guidelines, transparency refers to a clear description of the methods, data sources, assumptions, outcomes, and related information that will allow a data user to understand how an information product was designed and produced. SOI guidelines call for clear documentation of data and methods used in producing estimates and projections. Their implementation will ensure the transparency of our disseminated information.

Reproducibility of information refers to the ability, in principle, of a qualified individual to use the documentation of methods, assumptions, and data sources to achieve comparable findings subject to an acceptable degree of imprecision. In practice, opportunities for direct reproducibility are often limited by restrictions on access to confidential information. Nevertheless, the procedures described above for data selection, preparation, and dissemination provide sufficiently robust checks at all stages of the process to ensure the highest quality information products possible.

 


Page Last Reviewed or Updated: December 21, 2007