Electronic Data Warehouse (EDW)

 

Privacy Impact Assessment - Electronic Data Warehouse (EDW)

EDW System Overview

In April 1997, the IRS initiated Phase 0 FRR, a project designed to address the need for custodial financial management information related to taxpayer account data.  The FRR Project completed a business concept of operations document and business requirements document in December 1997, as well as an architecture document and a logical data model in June 1998.  RGLS, which was the first of four builds of Phase 0 FRR, provided a standard general ledger system to receive and summarize taxpayer account information and produce standard Federal financial management reports.  Implemented in November 1998, RGLS uses Federal Financial System, commercial off-the-shelf software by American Management Systems.

In October 1998, the IRS initiated PIDB, a project designed to address the need for custodial financial management information related to tax deposit and taxpayer payment data.  The PIDB Project completed a business concept of operations document, business requirements document, and logical data model document in December 1999.

After a series of meetings with the Department of the Treasury during the fall of 1999, the IRS CIO, the IRS Chief, Management and Finance, and the IRS Acting CFO directed the projects known as Phase 0 FRR and PIDB to form a single Modernization project.  The new project, named CAP, was directed in November 1999 to do the following:

* Use a data warehousing approach for storing, analyzing, and reporting taxpayer accounts and collections information
* Architect a solution that would serve as the foundation for an EDW
* Transition from the IRS's earlier Systems Life Cycle to the current ELC.

In August 2000, CAP completed the integration/development of a business concept of operations document, business requirements document, system design/architecture documents, logical data model document, and project security testing plans' documents for development, testing, and deployment.  Phase 0 FRR functions were integrated into two CAP TASL builds and PIDB functions were incorporated into two CAP CSL builds.  In November 2000, the IRS began the development of the first of these four builds.

In September 2001, the IRS initiated the EDW Project.  The business goal of EDW is to provide integrated, reliable tax operations and internal management information to support the IRS's evolving decision analytics, performance measurement, and management information needs.  The EDW Project strategy is to incrementally build an enterprise-wide data warehouse of detail data and data marts designed to meet predictable strategic and tactical information needs.  Due to the prolonged transition to modernization, the data warehouse and data marts will use a mixture of existing and modernized tax operations and internal management systems as sources of data.

Data in the System

1. Generally describe the information to be used in each of the following categories:

a.  Taxpayer

The system will use taxpayer account, taxpayer return, and case data.
b.  Employee
The system will use employee type, skills, classification number, and general status data.
c.  Other (Specify)
The system will use Internal Management budget, cost, travel, procurement, and asset data.

2. What are the sources of the information in the system?

a. What IRS files and databases are used?


The taxpayer information is extracted from the existing IRS systems, files, and databases:

* Individual Master File (IMF)
* Individual Return Transaction File (IRTF)
* Business Master File (BMF)
* Business Return Transaction File (BRTF)
* Treasury Financial Management Service (FMS) Government Online Accounting Link System (GOALS) II
* Tax Return Database (TRDB)
* Information Returns Master File Processing (IRMF)
* Payer Master File (PMF)
* Customer Account Data Engine (CADE)
* CFO Accounts Receivable Management System (CAM)
* Filing and Payment Compliance (FPC)
* HRConnect
* Integrated Financial System (IFS)

b. What Federal Agencies are providing data for use in the system?  What State and Local Agencies are providing data for use in this system?

Treasury Financial Management Service (FMS) provides disbursement schedule confirmation information through the Government Online Accounting Link System (GOALS).

State and Local Agencies do not provide data for use in the system.

c.  What is the source of the data used in the system?

The taxpayer information is extracted from the existing IRS systems, files, and databases:

* Individual Master File (IMF)
* Individual Return Transaction File (IRTF)
* Business Master File (BMF)
* Business Return Transaction File (BRTF)
* Treasury Financial Management Service (FMS) Government Online Accounting Link System (GOALS) II
* Tax Return Database (TRDB)
* Information Returns Master File Processing (IRMF)
* Payer Master File (PMF)
* Customer Account Data Engine (CADE)
* CFO Accounts Receivable Management System (CAM)
* Filing and Payment Compliance (FPC)
* HRConnect
* Integrated Financial System (IFS)

d. What other third-party sources will data be collected from?

Data is not collected from other third-party sources in EDW Releases 1-4.

e. What information will be collected from taxpayer/ employee?

EDW will collect data from existing IRS systems and databases (IMF, IRTF, BMF, BRTF, GOALS, TRDB, IRMF, PMF, CADE, CAM, FPC, RC, HRConnect, IFS).

EDW will use taxpayer account, taxpayer return, and case data from these existing IRS systems and databases.  From HRConnect, EDW will collect employee IDs for the audit log of SQL queries.

3. Data collected from other sources:

a. How will data collected from sources other than IRS records and the taxpayer be verified for accuracy?

FMS GOALS disbursement confirmation data is matched to IRS disbursement schedule data.  Unmatched records are flagged for research.

b. How will data be checked for completeness?

FMS GOALS disbursement confirmation data is matched to IRS disbursement schedule data.  Unmatched records are flagged for research.
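The matching step described in 3.a and 3.b can be illustrated with a minimal Python sketch.  It is not the EDW implementation.  The record layout (a schedule number plus an amount in cents), the class name, and the function name are all assumptions made for illustration.  The actual FMS file format is defined in the Interface Control Document and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisbursementRecord:
    # Hypothetical layout: the real fields are defined in the ICD with FMS.
    schedule_number: str
    amount_cents: int


def match_confirmations(schedules, confirmations):
    """Match confirmation records to schedule records by schedule number.

    Returns (matched, flagged), two lists of schedule numbers.  A record is
    flagged for research when it has no counterpart on the other side or
    when the amounts disagree (the amount check is an assumption).
    """
    confirmed = {c.schedule_number: c for c in confirmations}
    scheduled = {s.schedule_number for s in schedules}

    matched, flagged = [], []
    for s in schedules:
        c = confirmed.get(s.schedule_number)
        if c is not None and c.amount_cents == s.amount_cents:
            matched.append(s.schedule_number)
        else:
            flagged.append(s.schedule_number)

    # Confirmations that refer to no known schedule are also flagged.
    for c in confirmations:
        if c.schedule_number not in scheduled:
            flagged.append(c.schedule_number)

    return matched, flagged
```

In this sketch a single pass serves both questions: a missing counterpart is a completeness problem, and a disagreeing amount is an accuracy problem.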

c. Is the data current?  How do you know?

Yes.  The data is current.  Disbursement schedule confirmation is made available to the IRS on a daily basis.  Each disbursement schedule confirmation record contains data relevant to a range of disbursement requests.  This information is never updated after it is created.

4. Are the data elements described in detail and documented?  If yes, what is the name of the document?

* FMS disbursement confirmation file format is documented in Interface Control Document (ICD) with FMS.
* IMF data elements are documented in Reverse Engineering Database
* Data model entities and attributes are described in detail and documented in the EDW System Design (SYD) Report

4.2  Access to the Data

1.  Who will have access to the data in the system (Users, Managers, System Administrators, Developers, Others)?

* Business analysts/researchers and planning/operations managers
* Designated system and database administrators, analysts, and programmers performing system maintenance and support.

2.  How is access to the data by the user determined?  Are criteria, procedures, controls, and responsibilities regarding access documented?

The system complies with C-2 Level and Enterprise Architecture (EA) requirements.  Access to data is controlled by security profiles.  The following documents contain information regarding data access controls and procedures:

* EDW Security Features User Guide (SFUG)
* EDW Trusted Facilities Manual (TFM).

3.  Will users have access to all data on the system or will the user's access be restricted?  Explain.

User Access is restricted to the minimum necessary to perform normal job duties.  Granting user access to data files, processing capabilities, etc., follows the practice of least privilege.

4. What controls are in place to prevent the misuse (e.g. browsing) of data by those having access?

EDW uses existing mainframe and UNIX system application and database access control and logging capabilities.  EDW user access is restricted to the minimum necessary to perform the job specific duties.  Granting permissions in EDW will be on a "need to know" basis.

The RDBMS table/view access control capabilities will be used to restrict user access to taxpayer/employee data.  Access to details of taxpayer transactions in EDW is restricted to authorized personnel using batch reporting capabilities.  Use of batch reporting capabilities and RDBMS table/view containing taxpayer identified data is logged to create an audit trail.

Also, existing UNAX policies, procedures, and practices continue to be in force for EDW.
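The combination of profile-based view access and audit logging described above can be sketched in Python as follows.  This is an illustration under stated assumptions, not the EDW design.  The profile names, view names, and log fields are hypothetical, and a real deployment would rely on the RDBMS's own access control and logging rather than application code.  Logging denied attempts is also an assumption; the text above states only that use of taxpayer-identified tables/views is logged.

```python
from datetime import datetime, timezone

# Hypothetical security profiles mapped to the views each may read.
PROFILE_VIEWS = {
    "analyst": {"v_return_summary"},
    "authorized_batch": {"v_return_summary", "v_taxpayer_detail"},
}

# Hypothetical set of views that contain taxpayer-identified data.
TAXPAYER_IDENTIFIED_VIEWS = {"v_taxpayer_detail"}


def request_view(employee_id, profile, view, audit_log):
    """Check a view request against the profile and record an audit entry.

    Every request for a taxpayer-identified view is appended to audit_log,
    whether it is allowed or denied.  Denied requests raise PermissionError.
    """
    allowed = view in PROFILE_VIEWS.get(profile, set())

    if view in TAXPAYER_IDENTIFIED_VIEWS:
        audit_log.append({
            "employee_id": employee_id,
            "view": view,
            "allowed": allowed,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    if not allowed:
        raise PermissionError(f"profile {profile!r} may not read {view!r}")
    return True
```

Unknown profiles receive an empty set of views, so the default is to deny, in keeping with the least-privilege practice stated in question 3.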

5. Interface with other systems.

a. Do other systems share data or have access to data in this system?  If yes, explain.

No.  Other systems do not share or have access to data in this system.

b.  Who will be responsible for protecting the privacy rights of taxpayers and employees affected by the interface?

At this time, a System Owner for EDW has not been determined. 

6.  Interface with other systems:

a. Will other agencies share data or have access to data in this system (International, Federal, State and Local, Other)?

No.  External organizations will not have an automated interface that allows them to share data or have access to data in this system.  External organizations will receive outputs from the system from the business units following existing disclosure procedures/processes.

b.  How will the data be used by the agency?

N/A.  EDW will not have an interface to other agencies that allows the sharing of data or access to data on the system.

c.  Who is responsible for ensuring proper use of the data?

N/A.  EDW will not have an interface to other agencies that allows the sharing of data or access to data on the system.

d. How will the system ensure that agencies get only information they are entitled to under IRC 6103?

N/A.  EDW will not have an interface to other agencies that allows the sharing of data or access to data on the system.

4.3  Attributes of the Data

1.  Is the use of the data both relevant and necessary to the purpose for which the system is being designed?

Yes.  Access to taxpayer and employee detail data is necessary to perform filing, payment, and reporting compliance, research planning, operations, research, financial audits, criminal investigation, and taxpayer advocacy.

2.  New Data:

a. Will the system derive new data or create previously unavailable data about an individual through aggregation from the information collected?

Yes.  EDW derives data based on information provided by the operational systems.

b. Will the new data be placed in the individual's record (taxpayer or employee)?

No.  The derived data will not be recorded in the operational system.  It will exist only in the analytical system (EDW).

c. Can the system make determinations about taxpayers or employees that would not be possible without the new data?

Yes.  Both data mining tools and human analysts will be used to analyze taxpayers' activities and operations activities for trends and patterns.

d. How will the new data be verified for relevance and accuracy?

Data is derived based on data provided by operational systems that captured and validated the data.

3.  Consolidations:

b. If data is being consolidated, what controls are in place to protect the data from unauthorized access or use?

EDW uses existing mainframe and UNIX system application and database access control and logging capabilities. 

The RDBMS table/view access control capabilities will be used to restrict user access to taxpayer/employee data.  Access to details of taxpayer transactions in EDW is restricted to authorized personnel using batch-reporting capabilities.  Use of batch reporting capabilities and RDBMS table/views containing taxpayer identified data is logged to create an audit trail.

Also, the existing UNAX policies, procedures, and practices continue to be in force for EDW.

c. If processes are being consolidated, are the proper controls remaining in place to protect the data and prevent unauthorized access? Explain.

Processes are not being consolidated.

4.  How will the data be retrieved?  Can it be retrieved by personal identifier?  If yes, explain.

EDW will retrieve data from existing IRS systems, files, and databases.  The Relational Database Management System (RDBMS) table/view access control capabilities will be used to restrict user access to taxpayer/employee data.  Access to details of taxpayer transactions in EDW is restricted to authorized personnel using batch reporting capabilities.  Use of batch reporting capabilities and RDBMS table/views containing taxpayer identified data is logged to create an audit trail.

a. What are the potential effects on the due process rights of taxpayers and employees of:

i. Consolidation and linkage of the files and system:

There are no potential effects on due process rights of taxpayers and employees.  Business analysts/ researchers and operations managers will use EDW.  If the analysis research results in the need to contact taxpayers, the information will be passed to operations personnel who will initiate contact and record results using operations procedures to ensure due process. 

ii. Derivation of data:

There are no potential effects on due process rights of taxpayers and employees.  Business analysts/ researchers and operations managers will use EDW.  If the analysis research results in the need to contact taxpayers, the information will be passed to operations personnel who will initiate contact and record results using operations procedures to ensure due process.

iii. Accelerated information processing and decision making:

There are no potential effects on due process rights of taxpayers and employees.  Business analysts/ researchers and operations managers will use EDW.  If the analysis research results in the need to contact taxpayers, the information will be passed to operations personnel who will initiate contact and record results using operations procedures to ensure due process.
 
iv. Use of new technology:

There is no use of new technology.  The technology, databases, reporting, tools, and statistical applications are in use now by the IRS.

b. How are the effects to be mitigated?     
                   
There are no potential effects on due process rights of taxpayers and employees.  Business analysts/ researchers and operations managers will use EDW.  If the analysis research results in the need to contact taxpayers, the information will be passed to operations personnel who will initiate contact and record results using operations procedures to ensure due process.

4.4  Maintenance and Administrative Controls

1. Equitable treatment of taxpayers, groups, and employees:

a. Explain how the system and its use will ensure equitable treatment of taxpayers and employees?

EDW is an analytical and reporting system.  It is not used for taxpayer/ employee treatment. 

b.  If the system is operated in more than one site, how will consistent use of the system and data be maintained in all sites? 

The system uses centralized databases of tax administration and internal management data located at the MCC and a centralized database of general ledger data located at the DCC.  Users at all locations execute remote requests to access the system and data.  All user access is controlled using existing mainframe and UNIX policies and procedures.

c.  Explain any possibility of disparate treatment of individuals or groups?
         
EDW is an analytical system and not used for taxpayer/employee treatment.

2.  Retention Periods:

a.  What are the retention periods of data in this system?

EDW is not the operational system of record for taxpayer/ employee data.  Data retention is based on requirements for decision analysis and reporting.  Retention requirements are the current year plus the prior 3 years immediately accessible (minimum requirement for trend analysis) and 20 years offline (delayed access).  (SOI Requirement).
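The retention rule above can be expressed as a small Python sketch.  It rests on an assumption the text does not settle: the 20 offline years are counted from the end of the online window, not from the year the data was created.  The function name and tier labels are hypothetical.

```python
ONLINE_PRIOR_YEARS = 3   # current year plus the prior 3 years stay online
OFFLINE_YEARS = 20       # assumption: counted after the online window ends


def retention_tier(data_year, current_year):
    """Classify a data year as 'online', 'offline', or 'past_retention'."""
    age = current_year - data_year
    if age < 0:
        raise ValueError("data_year is later than current_year")
    if age <= ONLINE_PRIOR_YEARS:
        return "online"        # immediately accessible
    if age <= ONLINE_PRIOR_YEARS + OFFLINE_YEARS:
        return "offline"       # delayed access
    return "past_retention"    # candidate for the elimination procedures
```

If the 20 years were instead meant to run from the data year itself, the second threshold would simply be `OFFLINE_YEARS`.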

b.  What are the procedures for eliminating the data at the end of the retention period?  Where are the procedures documented?

The procedures for eliminating the data at the end of the retention period will be documented in the EDW Security Features User Guide (SFUG), which is currently in development.

c.  While the data is retained in the system, what are the requirements for determining if the data is still sufficiently accurate, relevant, timely, and complete to ensure fairness in making determinations?

Detail data residing in EDW is not updated.  New data from operational systems is added during daily/weekly EDW processing.  

3.  Technology:

a.  Is the system using technology in ways that the IRS had not previously employed (e.g., Caller ID)?

No.  The IRS uses data warehousing decision analytics and report generation technology today.

b.  How does use of this technology affect taxpayer or employer  technology?

EDW is not used to contact taxpayers or for taxpayers to contact the IRS.

4.  Identifying, locating, monitoring:

a.  Will this system provide the capability to identify, locate, and monitor individuals?  If yes, explain.

EDW is not used to contact, locate, or monitor taxpayers.  Only taxpayer information required for decision analytics, agency performance measurement, and management reporting is extracted from operational systems and stored in EDW.

b.  Will this system provide the capability to identify, locate, and monitor groups of people?  If yes, explain.

EDW is not used to contact, locate, or monitor taxpayers.  Only taxpayer information required for decision analytics, agency performance measurement, and management reporting is extracted from operational systems and stored in EDW.

c.  What controls will be used to prevent unauthorized monitoring?

N/A.  EDW is not used to monitor an individual or groups of people.

5. System of Records Notice:

a.  Under which System of Records Notice (SORN) does the system operate?  Provide number and name.

EDW is being populated with data currently under the SORN for master files, TRDB, and CADE:

24.030 Treasury/IRS Individual CADE Master File 
24.046 Treasury/IRS Business CADE Master File
22.062 Treasury/IRS Electronic Filing Records   
34.037 Treasury/IRS IRS Audit Trail and Security Records System 

b. If the system is being modified, will the SORN require amendment or revision?  Explain.

EDW is a new system.

 


Page Last Reviewed or Updated: December 29, 2008