
Enterprise Data Warehouse

 

Privacy Impact Assessment – Enterprise Data Warehouse (EDW)

Purpose of the System:  In September 2001, the IRS initiated the Enterprise Data Warehouse (EDW) Project.  The business goal of EDW is to provide integrated, reliable tax operations and internal management information to support the IRS’s evolving decision analytics, performance measurement, and management information needs.  The EDW strategy is to incrementally build an enterprise-wide data warehouse of detail data and data marts designed to meet predictable strategic and tactical information needs.  Due to the prolonged transition to modernization, the data warehouse and data marts will use a mixture of existing and modernized tax operations and internal management systems as sources of data.  EDW is owned by the IRS Research Analysis and Statistics (RAS) Business Unit.

The EDW data mart contains a combination of data elements from different taxpayer models.  Additional data is being included in this application from other Business Units (i.e. SBSE and W&I).  The taxpayer information is extracted from the existing IRS systems, files, and databases:

* Individual Master File (IMF) (includes Individual Return Transaction File (IRTF))
* Business Master File (BMF) (includes Business Return Transaction File (BRTF))
* Modernization E-File (MeF)
* Information Returns Master File Processing (IRMF)
* Customer Account Data Engine (CADE)
* Foreign Information System (FIS)

Users of EDW consist of the Taxpayer Advocate group and personnel with research responsibility in each IRS Business Unit.

Systems of Records Number(s): 

Treas/IRS 24.030 CADE Individual Master File
Treas/IRS 24.046 CADE Business Master File
Treas/IRS 42.021 Compliance Programs and Project Files
Treas/IRS 34.037 IRS Audit Trail and Security Records System

Data in the System

1. Describe the information (data elements and fields) available in the system in the following categories:
A. Taxpayer
B. Employee
C. Audit Trail Information (including employee log-in info)
D. Other (Describe)

A. Taxpayer:  EDW contains forms and filer return information on business and individual taxpayers.

Information on the Business Master File (BMF) (i.e., for business taxpayers) includes:
* income
* deductions
* asset and liability information
* dividends
* partner information
* compensation of officers

Information on the Individual Master File (IMF) (i.e., for individual taxpayers) includes:
* income
* deductions
* asset and liability
* tax and payments
* filing status
* exemptions

The forms contain the following key data elements, related to personal information, that are available in the system.
* Name (business or individual)
* Trade Name
* Address (business or individual)
* Street, street number, city, state, zip
* Employer Identification Number
* Social Security Number
* Principal Business Activity, Product or Service, Business Code Number
* Vehicle Identification Number
* Third Party Designee
* Personal Identification Number

Taxpayer information is available in the following forms (including all associated form schedules).
 
BMF and IMF Forms and Information:

Form    Description
1065    U.S. Return of Partnership Income
1120    U.S. Corporation Income Tax Return
2290    Heavy Highway Vehicle Use Tax Return
720     Quarterly Federal Excise Tax Return
940     Employer’s Annual Federal Unemployment (FUTA) Tax Return
941     Employer’s Quarterly Federal Tax Return
943     Employer’s Annual Federal Tax Return for Agricultural Employees
945     Annual Return of Withheld Federal Income Tax
1040    U.S. Individual Income Tax Return

B. Employee:  Users register through the OL5081 system for access to the EDW.  Upon approval, they are assigned unique user IDs and passwords.

C. Audit Trail Information:  Audit records capture user queries but not the results as that would pose storage issues due to potentially high volumes of data.  If the query is captured, it can be rerun if needed to view the results obtained by the query submitter.  The following information is captured for the audit record:

* Operating system login user name
* User name
* Session identifier
* Terminal identifier
* Name of schema object accessed
* Operation performed or attempted
* Completion code of the operation
* Date and timestamp
 
Fine grain auditing to capture queries involving SSN information is also performed.  Any time a user query involves SSN information either as part of the query or as part of the information being returned in a response, a record of that is kept for auditing purposes.  If a query involves SSN information, the following is also captured for the audit record:

* Database user
* Policy name
* System Change Number (SCN)
* Structured Query Language (SQL) text
* SQL bind
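The two field lists above can be pictured as a single audit record whose fine-grained fields are filled in only when a query touches SSN data.  The following is a minimal sketch of that idea, not the EDW implementation: the class and function names, the policy name, and the SSN column names are all hypothetical, and the check only catches SSN columns that are named explicitly in the query text.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional
import re

# Hypothetical column names treated as SSN-bearing; not taken from the EDW schema.
SSN_COLUMNS = {"ssn", "spouse_ssn"}
# Hypothetical audit policy name.
FGA_POLICY_NAME = "SSN_ACCESS"


@dataclass(frozen=True)
class AuditRecord:
    # Fields captured for every audited operation
    os_user: str
    user_name: str
    session_id: str
    terminal_id: str
    schema_object: str
    operation: str
    completion_code: int
    timestamp: datetime
    # Fields captured only when the query involves SSN information
    db_user: Optional[str] = None
    policy_name: Optional[str] = None
    scn: Optional[int] = None
    sql_text: Optional[str] = None
    sql_bind: Optional[str] = None


def involves_ssn(sql_text: str) -> bool:
    """Return True if the query text names any SSN-bearing column."""
    tokens = set(re.findall(r"[a-z_][a-z0-9_]*", sql_text.lower()))
    return bool(tokens & SSN_COLUMNS)


def add_fine_grained_fields(record: AuditRecord, db_user: str, scn: int,
                            sql_text: str, sql_bind: str) -> AuditRecord:
    """Attach the fine-grained fields only for queries that involve SSN data."""
    if not involves_ssn(sql_text):
        return record
    return replace(record, db_user=db_user, policy_name=FGA_POLICY_NAME,
                   scn=scn, sql_text=sql_text, sql_bind=sql_bind)
```

A query such as `SELECT income FROM filers` would leave the record unchanged, while one that selects or filters on `ssn` would carry the SQL text and bind values with it.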

D. Other:  None

2. Describe/identify which data elements are obtained from files, databases, individuals, or any other sources.
A. IRS
B. Taxpayer
C. Employee
D. Other Federal Agencies (List agency)
E. State and Local Agencies (List agency)
F. Other third party sources (Describe)

A. IRS:  The EDW contains a combination of data elements from different taxpayer models.  Additional data is being included in this application from other Business Units (i.e. SBSE and W&I).  The taxpayer information is extracted from the existing IRS systems, files, and databases:

* Individual Master File (IMF) (includes Individual Return Transaction File (IRTF))
* Business Master File (BMF) (includes Business Return Transaction File (BRTF))
* Modernization E-File (MeF)
* Information Returns Master File Processing (IRMF)
* Customer Account Data Engine (CADE)
* Foreign Information System (FIS)
B. Taxpayer:  EDW does not obtain data directly from taxpayers.  Refer to Question 1.A. for a list of key data elements and forms contained in the EDW system.
C. Employee:  None.  (All data elements come from IRS systems.)
D. Other Federal Agencies:  None
E. State and Local Agencies:  None
F. Other third party sources:  None

3. Is each data item required for the business purpose of the system?  Explain.

Yes.  The business goal of EDW is to provide integrated, reliable tax operations and internal management information to support the IRS’s evolving decision analytics, performance measurement, and management information needs.  The EDW strategy is to incrementally build an enterprise-wide data warehouse of detail data and data marts designed to meet predictable strategic and tactical information needs.  Due to the prolonged transition to modernization, the data warehouse and data marts will use a mixture of existing and modernized tax operations and internal management systems as sources of data to support the business purpose of the system and the IRS.

The minimum amount of relevant and necessary information is captured in order to link the records that need to be related specifically for a research query.  The system prevents disclosure of personally identifiable information to its normal users through limited screen views and encryption.

In certain cases, users may have a specific business need to access personally identifiable information (PII) on a limited basis.  Those users must receive special authorization from management (i.e., via expanded special access via OL5081 approval) before being granted access to PII to perform their duties. 

4. How will each data item be verified for accuracy, timeliness, and completeness?

Annually, extracts of the various EDW data sources are created.  Then, through SQL scripts, the extracts are cleansed, transformed, and processed through the standard transmittal process.  These SQL scripts are executed against the production database under the production system user IDs.

XRDB data is loaded weekly from an extract of MTRDB (MeF).  This data is validated for completeness by the load process scripts, which check header records to ensure that the amount of data loaded matches the amount of data provided (i.e., row count checks).
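A row count check of this kind can be sketched as follows.  This is an illustration only: the header layout (`HDR|<count>`) and the function names are assumptions, since this PIA does not describe the actual extract format or load scripts.

```python
def parse_header(header_line: str) -> int:
    """Read the declared row count from a hypothetical 'HDR|<count>' header."""
    tag, count = header_line.strip().split("|")
    if tag != "HDR":
        raise ValueError(f"unexpected header tag: {tag!r}")
    return int(count)


def check_row_count(header_line: str, loaded_rows: list) -> bool:
    """Return True when the number of loaded rows matches the header's count."""
    return parse_header(header_line) == len(loaded_rows)
```

A load script built this way would reject or flag an extract whose header declares more or fewer rows than were actually loaded.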

5. Is there another source for the data?  Explain how that source is or is not used.

No. Data is not collected from another source beyond what has been stated previously in this PIA.

6. Generally, how will data be retrieved by the user? 

The user must enter their system identification and password to gain access to EDW and be able to submit a query to retrieve information from the system. 

7. Is the data retrievable by a personal identifier such as name, SSN, or other unique identifier? 

Yes.  Data is typically not obtained by a personal identifier as a normal course of data retrieval for analytics reporting.  Screen views restrict all identifying information within a record from the user. 

However, a record containing PII can be retrieved by a personal or unique identifier (e.g., SSN) in certain cases, and only if a special authorization from management is approved (i.e., via expanded special access via OL5081 approval).  The business need for this type of retrieval is therefore restricted to only those users obtaining the proper authorization who must research suspect problems with a specific account back to the system of record.  Unique identifiers, in such cases, would be used to carry out queries on the database. 

Access to the Data

8. Who will have access to the data in the system (Users, Managers, System Administrators, Developers, Others)?

Users of EDW consist of the Taxpayer Advocate group and personnel with research responsibility in each IRS Business Unit.  EDW database access is controlled through Oracle permissions.  All accounts are limited to read only access to the database.  The DBA is the only person with permissions to write to the database for loading EDW application data.

EDW has three modules that comprise the data warehouse:
* IFM (Individual Filers Model) – Users can query taxpayer data of individual filers.
* BFM (Business Filers Model) – Users can query taxpayer data of business filers.
* XRDB (XML Relational Database) – Users can query taxpayer data that was electronically filed.  The data is stored as XML.

This application does not allow access to the public.  Only authorized users are granted access to this application and are provided with user IDs and passwords.

Only authorized database administrators have greater than read-only access to EDW.  There are four read-only EDW roles:
* Developer – developers have access to the entire data warehouse
* LMSB – users of the LMSB Business Unit access tables related to their function
* SBSE – users of the SBSE Business Unit access tables related to their function 
* Research – users with research responsibility access specific research-related tables within EDW
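The role separation described above can be sketched as a mapping from role to the set of tables that role may read.  The table names and the assignment of tables to roles below are hypothetical; only the four role names and the read-only restriction come from this PIA.

```python
# Hypothetical table names, for illustration only.
ALL_TABLES = frozenset({"ifm_returns", "bfm_returns", "xrdb_filings", "research_summary"})

# Which tables each read-only role may query (assignments are assumptions).
ROLE_TABLES = {
    "Developer": ALL_TABLES,                     # access to the entire warehouse
    "LMSB": frozenset({"bfm_returns"}),
    "SBSE": frozenset({"ifm_returns", "bfm_returns"}),
    "Research": frozenset({"research_summary"}),
}


def can_read(role: str, table: str) -> bool:
    """Return True if the given role is allowed to query the given table."""
    return table in ROLE_TABLES.get(role, frozenset())


def can_write(role: str) -> bool:
    """None of the four user roles can write; loading data is reserved for the DBA."""
    return False
```

An unknown role maps to an empty table set, so access is denied by default.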

At the time of this writing, there are about 100 EDW users across all the IRS Business Units.

9. How is access to the data by a user determined and by whom? 

All new users requesting access to an IRS system must do so through the OL5081 system.  Users are required to complete an OL5081, Information System User Registration/Change Request Form, which lists mandatory rules for users of IRS information and information systems.  When a user has been approved for access to the application by his/her manager, the OL5081 system sends an email to the user, providing an approval notification.  The user then logs into the OL5081 system, reads the Rules of Behavior, and provides an “electronic signature,” acknowledging that he/she has read, understands, and agrees to abide by the Rules of Behavior.

The EDW separates user access by role.  There are several roles within the EDW, each assigned different privileges for accessing different tables on the EDW database.  Roles are assigned based on the user’s business need to know.  Requests for access to the EDW are made through the OL5081 system.  The EDW administrator informs users as to which EDW component to select in the OL5081 system.  The user’s manager approves the access request.  The EDW database administrator (DBA) reviews access requests when assigning user IDs and passwords.

Development of EDW is partially outsourced to contractors who work onsite alongside IRS developers as well as in IRS-designated offices and at their contractor facility.  The contractors only access IRS systems when using IRS-owned laptops at IRS facilities.  When using their own equipment or at their facility they do not have access to IRS systems.  The contractors have proper security clearance and are subject to the same organizational policies and procedures as IRS employees regarding access to the application and data.  For example, a contractor must have an approved OL5081 form to gain access to the EDW.  Outsourcing of the development of EDW was approved at the director level.

Contractors undergo National Agency Check and Credit Check (NACC) background investigations and are considered Moderate risk level.  Re-investigations are conducted every 5 years for high-risk, Top Secret and Contractor clearances.  

10. Do other IRS systems provide, receive, or share data in the system?  If YES, list the system(s) and describe which data is shared.  If NO, continue to Question 12.

Yes.  The EDW application does not directly connect to any applications or systems external to the IRS.  All EDW data is stored in the Oracle database on the database server at ECC-Detroit.  EDW creates its data from extracts of the following sources:

* Individual Master File (IMF) (includes Individual Return Transaction File (IRTF))
* Business Master File (BMF) (includes Business Return Transaction File (BRTF))
* Modernization E-File (MeF)
* Information Returns Master File Processing (IRMF)
* Customer Account Data Engine (CADE)
* Foreign Information System (FIS)

11. Have the IRS systems described in Item 10 received an approved Security Certification and Privacy Impact Assessment?

Certification and Accreditation (C&A): 

The following systems hold a current Certification and Accreditation in the Mission Assurance Master Inventory:
* IMF (completed 8/20/2004; expires 9/9/06)
o IRTF (a subsystem of IMF)
* BMF (completed 8/20/2004; expires 9/10/06)
o BRTF (a subsystem of BMF)
* MeF (completed 7/9/04; expires 12/6/06)
* IRMF (completed 8/20/04; expires 9/9/06)
* CADE (completed 6/30/2004; expires 6/17/07)

The following systems do not have a current Certification and Accreditation in the Mission Assurance Master Inventory:
* FIS

Privacy Impact Assessment (PIA): 
The following systems hold a current Privacy Impact Assessment in the Office of Privacy Inventory:
* IMF (approved 9/1/03; expires 8/31/06)
o IRTF (a subsystem of IMF)
* BMF (approved 8/22/03; expires 8/21/06)
o BRTF (a subsystem of BMF)
* MeF (approved 9/2/03; expires 9/1/06)
* CADE (approved 12/21/05; expires 12/20/08)

The following systems do not have a current Privacy Impact Assessment in the Office of Privacy Inventory:
* IRMF
* FIS

12.  Will other agencies provide, receive, or share data in any form with this system?

No.  The EDW application does not directly connect to any applications or systems external to the IRS. 

Administrative Controls of Data

13.  What are the procedures for eliminating the data at the end of the retention period?

EDW data is retained on the system for 10 years and EDW audit records are backed up onto tapes that are stored off-site for seven years, satisfying the requirement in IRM 10.8.1.  At any time, these audit records can be retrieved to support after-the-fact investigations of security incidents.

EDW follows disk sanitization procedures for destruction of discarded media.  IRM 2.7.4, Management of Magnetic Media (Purging of SBU Data and Destruction of Computer Media), provides the procedures used for sanitizing electronic media for reuse (e.g., overwriting) and for controlled storage, handling, or destruction of spoiled media or media that cannot be effectively sanitized for reuse (e.g., degaussing).  It also addresses the responsibilities of management and employees for the care, cleaning, rehabilitation, storage, shipment, receipt, inspection, repair, destruction, and security of all magnetic media.

14.  Will this system use technology in a new way?  If "YES" describe.  If "NO" go to Question 15. 

No.  This system does not use technology in a new way.

15.  Will this system be used to identify or locate individuals or groups?  If so, describe the business purpose for this capability.

Yes.  EDW can be used to identify or locate groups of the populace.  For example, a user may want to research areas of tax noncompliance in Ohio and may build a query that returns data suggesting a trend of noncompliance.  That research may then be used to build policy and procedures to help that segment of the population become more compliant in support of the IRS. 

However, EDW cannot be used to identify or locate individuals.

16. Will this system provide the capability to monitor individuals or groups? If yes, describe the business purpose for this capability and the controls established to prevent unauthorized monitoring.

No.  EDW is an analytical and reporting system.  It is not used to monitor individuals or groups.

17. Can use of the system allow IRS to treat taxpayers, employees, or others, differently?  Explain.

No.  EDW is an analytical and reporting system.  It cannot be used to treat individual taxpayers or employees disparately.

18.  Does the system ensure "due process" by allowing affected parties to respond to any negative determination, prior to final action?

No.  EDW is an analytical and reporting system.  The system itself does not render negative determinations, so due process rights are not applicable to this system.

19.  If the system is web-based, does it use persistent cookies or other tracking devices to identify web visitors?

EDW is not browser-based and therefore does not use persistent cookies or other persistent tracking devices.

 


Page Last Reviewed or Updated: September 15, 2006