
Data Consistency Pattern

Description

Modern enterprises generally have redundant versions of data regarding customers, products, orders, employees, and other entities; NIH is no exception. The most common method of reconciling data at NIH is to create a batch file in one application and then transfer it to other interested applications in a nightly, point-to-point batch process. More advanced tools such as the Integration Broker Suite (IBS) and database replication can be used to support complex integrations and provide higher performance.

Four possible approaches to data consistency are shown:

  • Extract Transform and Load (ETL) – An ETL tool is used to transfer data on a scheduled basis. ETL tools support complex data transformations and exception handling as well as relational databases and non-relational data stores as sources and targets.
  • Message-based – In this case, the IBS is used to achieve consistency across databases by triggering messages when data changes in the source database. The IBS provides functionality to facilitate this approach. Messages, in most cases, will flow to and from applications or be processed through database adapters provided by the IBS.
  • Database Replication – Proprietary database functionality is used to replicate data across multiple data stores. This type of data transfer can be used to maintain consistency in real-time, but is often implemented as a batch process.
  • Batch File Transfer – A batch process is used to extract data from the source database. File transfer is used to move the file to the target system where it is parsed and used to update the target database.
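The extract-transform-load flow described above can be sketched in a few lines. This is a minimal illustration only, not an NIH implementation; the `customers` table, its columns, and the lowercase-email transformation are all assumptions chosen for the example, and in-memory SQLite stands in for real source and target databases.

```python
import sqlite3

def etl(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """One scheduled ETL pass: extract, transform, load."""
    # Extract: read the rows of interest from the source store.
    rows = source.execute("SELECT id, name, email FROM customers").fetchall()

    # Transform: normalize email addresses to lowercase (illustrative rule).
    transformed = [(rid, name, email.lower()) for rid, name, email in rows]

    # Load: upsert into the target, replacing any stale copies.
    target.executemany(
        "INSERT OR REPLACE INTO customers (id, name, email) VALUES (?, ?, ?)",
        transformed,
    )
    target.commit()
    return len(transformed)

# In-memory databases stand in for the real source and target systems.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
source.execute("INSERT INTO customers VALUES (1, 'Ada', 'ADA@Example.org')")
source.commit()

loaded = etl(source, target)
```

A real ETL tool adds what this sketch omits: complex transformation rules, exception handling, and support for non-relational sources and targets.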

The choice of approach depends on the specific requirements of the systems being integrated. In general, batch file transfer is the least favored approach and should be used only for basic integrations.
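For contrast, the batch file transfer approach reduces to exporting a flat file on the source side and parsing and applying it on the target side. The sketch below is an assumption-laden stand-in: an in-memory CSV represents the transferred file, and a plain dictionary represents the target store.

```python
import csv
import io

def export_batch(records: list[dict]) -> str:
    """Source-side batch job: dump records to a CSV flat file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "name"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def apply_batch(batch: str, target_store: dict) -> None:
    """Target-side job: parse the transferred file and update the store."""
    for row in csv.DictReader(io.StringIO(batch)):
        target_store[row["id"]] = row["name"]  # last write wins

source_records = [{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob"}]
target_store = {}

# The string returned by export_batch stands in for the file moved
# between systems by a nightly file-transfer job.
apply_batch(export_batch(source_records), target_store)
```

Note what is missing: no error handling, no transformation, and consistency only as of the last nightly run, which is why this approach is reserved for basic integrations.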

Please view the Data Consistency Pattern below:

Diagram

Benefits

  • Allows data consistency to be maintained across multiple systems and is often less invasive than application-level integration.

Limitations

  • This pattern should generally be avoided when dealing with applications outside of NIH, though it may be viable in a few cases (e.g., DHHS, GSA).

Recommended Usage

This pattern is most applicable in cases where:

  • Reference data is being shared across multiple applications.
  • Data is being moved from operational/transactional systems into reporting systems.
  • Only data elements need to be shared; the history of the business transactions that changed the data is not required in the target system.

The best approach to data consistency will depend on the requirements of the systems being integrated.

  • ETL – The ETL-based approach to synchronization is preferred when batch update and aggregation are required, such as when moving data from transactional systems to information/reporting systems.
  • Message-based – This approach should be used when near real-time update is required. With message-based integration, the source and target systems remain decoupled, which is a performance and reliability advantage over real-time replication. When multiple applications are affected by changes to data in one application, publish-subscribe capabilities can be used to support multiple targets. Messages may also be accumulated and written to a file by the IBS, creating a hybrid pattern that combines the message-based and batch file transfer approaches.
  • Data Replication – Data replication or direct database links may be used when real-time, transactional update of all data stores is needed. Real-time replication adds overhead to the source system application and introduces tight dependencies between the source and target systems, which may not be desirable.
  • Batch File Transfer – This approach to data consistency is best avoided except where other approaches are not viable because of limitations in the source or target system, or where the integration is extremely simple, requiring little processing of the data and little or no error handling.
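The publish-subscribe behavior described for the message-based approach can be illustrated with a toy broker: one change in the source application is published once and delivered to every subscribed target, so source and targets stay decoupled. The `Broker` class, topic name, and dictionary-backed target stores below are all invented for illustration; they stand in for the IBS and real databases.

```python
from collections import defaultdict

class Broker:
    """Minimal publish-subscribe broker (stand-in for the IBS)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver the message to every handler subscribed to the topic.
        for handler in self._subscribers[topic]:
            handler(message)

broker = Broker()
reporting_db, billing_db = {}, {}

# Two target systems subscribe to the same change topic.
broker.subscribe("customer.updated", lambda m: reporting_db.update({m["id"]: m}))
broker.subscribe("customer.updated", lambda m: billing_db.update({m["id"]: m}))

# A data change in the source application triggers a single message,
# which the broker fans out to all interested targets.
broker.publish("customer.updated", {"id": 7, "name": "Grace"})
```

Because the publisher never names its targets, additional subscribers can be added without changing the source application, which is the decoupling advantage noted above.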

Time Table

This architecture definition approved on: May 24, 2006

The next review is scheduled in: TBD