STANDARD FEATURES
COOP/Service Continuity consists of the policies, procedures, and programs that allow DISA, in concert with partner personnel, to provide an effective level of assurance that workloads will continue to process in accordance with regulatory requirements and documented obligations in SLAs. The continuity-related IA controls from DoDI 8500.2 are listed below and are satisfied by the COOP program as overseen by DISA.
Control | Mitigation Strategy
Alternate Site Designation | An alternate site is identified as a recovery location and has appropriate equipment and infrastructure to allow restoration of processing capability.
Protection of Recovery Assets | Procedures exist to ensure the physical and technical protection of recovery infrastructure.
Data Backup Procedures | Data is backed up according to the required frequency and stored off-site.
Disaster and Recovery Planning | Plans and procedures exist to allow resumption within required time frames.
Enclave Boundary Defense | Measures in place are similar to production site protections.
Scheduled Exercises and Drills | Annual exercises are available upon partner request.
Identification of Essential Functions | Where identified by the partner, mission essential functions and their supporting assets are considered in determining restoration priorities.
Maintenance Support | Maintenance support for key IT assets can respond within required timeframes.
Power Supply | Uninterruptible Power Supply (UPS) and emergency generator protection is in place.
Spares and Parts | Maintenance spares and parts for key IT assets can be obtained within required timeframes.
Backup Copies of Critical Software | Backup copies of critical software are maintained offsite.
Trusted Recovery | Procedures exist to ensure a secure and verifiable recovery effort.
For partners purchasing IBM and Unisys mainframe processing, there is an Assured Computing Environment (ACE) approach to providing COOP that will meet Mission Assurance Category (MAC) II requirements (processor and data) for remote recovery. This approach is included in the standard rates for those services. For Linux on System z on the IBM mainframe, the rate includes the processing capacity and capability; however, an additional charge for the storage component of the recovery infrastructure is required.
Server-based processing does not include COOP/Service Continuity in the basic rates and requires the partner to specifically select and pay for the desired and compliant coverage (see Optional Features).
DISA, in concert with partner personnel, will take the lead in developing, maintaining, and updating recovery procedures. However, recovery procedures will be created only for those applications for which DISA has a defined and documented recovery obligation. Procedures intended for use by an alternate provider, or by a partner using its own internal resources, must be developed by those who are responsible for the recovery and familiar with the supporting infrastructure for the planned recovery. In accordance with established OpSec principles and mandated OpSec training, the procedures developed by DISA will not be distributed outside of DISA.
For formal external audits, DISA will allow auditors, working through the DISA Chief Information Officer (CIO) office, to review documentation associated with satisfying COOP-related IA controls. For security purposes, specific recovery documentation will not be distributed to the auditors.
When contracting with DISA for COOP/Service Continuity, the partner can request, through their Customer Account Representative (CAR), exercises of that coverage that use the processes and/or environments that would be used for an actual recovery. The two primary types of exercises are tabletop and simulation (the latter also known as a “remote recovery” exercise). There is no additional charge to the partner to conduct these exercises; however, they are limited to no more than one per year. The first exercise for either new workloads or workloads that have undergone major updates or changes to the operating environment (OE) will be conducted as a tabletop. After that, the application will be scheduled for a simulation exercise. Simulation exercises are then conducted every third year (unless the partner advises that a tabletop exercise is sufficient to meet their requirements), with tabletop exercises offered in the interim years. Each partner that has contracted with DISA for COOP/Service Continuity will receive a draft exercise schedule from their CAR 60 days prior to the start of each fiscal year.
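The exercise cadence described above can be sketched in a few lines of Python. This is only one reading of the text, not a DISA scheduling tool: it assumes the first exercise year is a tabletop, the following year is the first simulation, and simulations then recur every third year with tabletops in between. The function name `exercise_type` and the `year_index` parameter are hypothetical, and the sketch ignores the partner's option to substitute a tabletop for a simulation.

```python
def exercise_type(year_index: int) -> str:
    """Return the exercise type for a given year under one reading of the cadence.

    year_index 0 is the first exercise year for a new workload, or for a
    workload that has undergone major changes (assumption: the cycle restarts).
    """
    if year_index < 0:
        raise ValueError("year_index must be zero or greater")
    if year_index == 0:
        # The first exercise is always conducted as a tabletop.
        return "tabletop"
    # Assumed reading: first simulation in year 1, then every third year.
    return "simulation" if (year_index - 1) % 3 == 0 else "tabletop"


if __name__ == "__main__":
    for year in range(6):
        print(year, exercise_type(year))
```

Because exercises are limited to one per year, each year maps to exactly one exercise type in this sketch.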
In a tabletop exercise, the personnel who would be involved in an actual recovery gather and walk through the processes developed for that recovery. The time to complete a tabletop exercise will depend on the scope and coordination requirements associated with the exercise. Typically, these exercises are conducted within an hour. DISA will initiate exercise coordination with the partner no later than 45 days prior to the start of the exercise and will deliver an exercise plan 7 to 14 days prior to the start of the exercise.
In a simulation exercise, the application or applications in question are physically recovered at their pre-designated recovery site using the recovery procedures in the Business Continuity Plan (BCP) for the production site. The time to complete such an exercise will be dependent on the scope and coordination requirements associated with the exercise as well as the environment at the designated recovery site(s). Typically, these exercises will be two to three weeks in length. DISA will initiate exercise coordination with the partner no later than 90 days prior to the start of the exercise and will deliver the exercise plan no later than 30 days prior to the start of the exercise.
DISA will provide the partner with the following deliverables at the conclusion of each tabletop and simulation exercise: 1) a Statement of Execution (SoE), no later than three business days after completion of an exercise, and 2) an After Action Report (AAR), with a target date of 30 calendar days after completion of an exercise.
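The lead times and deliverable dates described for tabletop and simulation exercises reduce to simple date arithmetic. The following Python sketch is illustrative only. The names `LEAD_TIMES`, `add_business_days`, and `exercise_milestones` are hypothetical. It assumes that business days are Monday through Friday with holidays ignored, and it uses 7 days as the latest date for the tabletop exercise plan, since the text gives a 7 to 14 day window.

```python
from datetime import date, timedelta

# Lead times in calendar days before the exercise start, taken from the text.
# Assumption: for tabletop plans, 7 days is treated as the latest delivery date.
LEAD_TIMES = {
    "tabletop": {"coordination": 45, "plan": 7},
    "simulation": {"coordination": 90, "plan": 30},
}


def add_business_days(start: date, days: int) -> date:
    """Advance by a number of weekdays (Mon-Fri); holidays are ignored (assumption)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


def exercise_milestones(kind: str, start: date, end: date) -> dict:
    """Compute coordination, plan, SoE, and AAR dates for one exercise."""
    lead = LEAD_TIMES[kind]
    return {
        "coordination_by": start - timedelta(days=lead["coordination"]),
        "plan_by": start - timedelta(days=lead["plan"]),
        "soe_by": add_business_days(end, 3),      # three business days after completion
        "aar_target": end + timedelta(days=30),   # 30 calendar days after completion
    }


if __name__ == "__main__":
    dates = exercise_milestones("simulation", date(2025, 6, 2), date(2025, 6, 20))
    for name, value in dates.items():
        print(name, value.isoformat())
```

For a simulation exercise running from 2 June 2025 to 20 June 2025, this gives a coordination date of 4 March 2025, a plan date of 3 May 2025, an SoE date of 25 June 2025, and an AAR target of 20 July 2025.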
OPTIONAL FEATURES
There are seven options available for server-based COOP/Service Continuity; five of these are standard options, one is an additional custom option, and one is to decline DISA-provided COOP coverage.
The table below shows the five standard options known as Remote Recovery Combinations (RRCs). In order to have a recovery option that meets the COOP requirements detailed in DoDI 8500.2, appropriate selections must be made for both storage (data) recovery and server (processor) recovery. These pre-defined combinations allow both elements to be addressed effectively.
Option | MAC Level | Description | Storage Offering | Processor Offering | RTO/RPO
RRC 1 | MAC III | Remote recovery using tape-based data backups and shared processing capability at the default designated recovery site. | Basic Remote | Shared COOP | Recovery Time Objective (RTO) = 5 Days; Recovery Point Objective (RPO) = 7 Days
RRC 1.2 | MAC III | Remote recovery using replication of data and shared processing capability at the default designated recovery site. | Operational Remote | Shared COOP | RTO = 5 Days; RPO = 8 Hours
RRC 2 | MAC II | Remote recovery using replication of data as well as a dedicated, pre-configured processing capability at the designated recovery site. | Operational Remote | Dedicated COOP | RTO = 24 Hours; RPO = 8 Hours
RRC 3 | MAC II | Remote recovery using replication of data as well as a dedicated, pre-configured, and operational processing capability at the designated recovery site. | High-Availability (HA) Remote | Dedicated COOP | RTO = 8 Hours; RPO = 8 Hours
RRC 4 | MAC I | Remote recovery using near-synchronous replication of data as well as dedicated, pre-configured, and operational processing capability at the designated recovery site. | Non-Disruptive Remote (Host-Based Replication Only) | Dedicated COOP | RTO = 30 Min; RPO = 1 Sec
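As an illustration of how the standard options compare, the table above can be expressed as data and queried for the least stringent option that still meets a required RTO and RPO. This is a minimal sketch and not a DISA ordering tool: the `RRC` class, the `STANDARD_RRCS` list, and the `least_stringent_match` function are hypothetical names, and the values are transcribed from the table. Actual selection also depends on the MAC level and on the SLA.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class RRC:
    name: str
    mac_level: str
    storage: str
    processor: str
    rto: timedelta  # Recovery Time Objective
    rpo: timedelta  # Recovery Point Objective


# Transcribed from the table, ordered from least to most stringent.
STANDARD_RRCS = [
    RRC("RRC 1", "MAC III", "Basic Remote", "Shared COOP",
        timedelta(days=5), timedelta(days=7)),
    RRC("RRC 1.2", "MAC III", "Operational Remote", "Shared COOP",
        timedelta(days=5), timedelta(hours=8)),
    RRC("RRC 2", "MAC II", "Operational Remote", "Dedicated COOP",
        timedelta(hours=24), timedelta(hours=8)),
    RRC("RRC 3", "MAC II", "High-Availability (HA) Remote", "Dedicated COOP",
        timedelta(hours=8), timedelta(hours=8)),
    RRC("RRC 4", "MAC I", "Non-Disruptive Remote (Host-Based Replication Only)",
        "Dedicated COOP", timedelta(minutes=30), timedelta(seconds=1)),
]


def least_stringent_match(max_rto: timedelta, max_rpo: timedelta) -> Optional[RRC]:
    """Return the first (least stringent) option meeting both limits, or None."""
    for rrc in STANDARD_RRCS:
        if rrc.rto <= max_rto and rrc.rpo <= max_rpo:
            return rrc
    return None  # no standard option fits; a custom solution would be needed


if __name__ == "__main__":
    choice = least_stringent_match(timedelta(days=2), timedelta(hours=12))
    print(choice.name if choice else "No standard option; consider a custom solution")
```

In this sketch, a workload that can tolerate at most two days of downtime and twelve hours of data loss maps to RRC 2, because the Shared COOP options both carry a five-day RTO. Each entry pairs a storage offering with a processor offering, reflecting the requirement that both elements be selected.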
Custom COOP/Service Continuity (Option 6) is available to partners whose mission requirements for a particular application, or suite of applications, are not adequately addressed by any of the standard options identified above.
If the partner determines that the pre-defined approaches are not adequate or preferred, then a custom solution can be developed and implemented (Failover, Test and Development [T&D], partner-managed, etc.).
Any solution of this type must be identified within the relevant SLA. In addition, any supporting documentation must be linked to, or referenced within, that SLA. For partner-managed solutions where the partner is responsible for facilitating their own exercises and maintaining their own documents to support regulatory requirements, they are still limited to one exercise per year and are required to follow the Exercise Restrictions & Guidance policy with regard to scheduling and site coordination.
Within the SLA, there is also the ability to confirm that “No DISA-provided COOP” is requested.
Remote Recovery Options
Geographically remote recovery differs from an operational recovery in that it assumes the primary processing environment is no longer operational or no longer accessible. In that situation, the only alternatives are to cease processing until the primary environment is available or to move the processing to an alternate location. The following sections describe the DISA offerings associated with that remote recovery strategy, which uses an alternate location.
Remote Recovery - Combination 1
Recovery Time Objective (RTO) = 5 Days
Recovery Point Objective (RPO) = 7 Days
This level of continuity provides a secure processing environment with sufficient storage infrastructure in place to allow a remote recovery with an RTO of five days and an RPO of seven days. The RTO timeline is driven by the use of tape-based backups to restore all required backup data to storage capacity pre-positioned at the recovery site. The RPO is driven by the frequency of backups stored off-site from the primary processing facility.
For this level of recovery to be effective, a corresponding hardware and software infrastructure needs to be available and operational at the remote recovery site. This approach is designed to use shared resources at a single site to provide continuity for production requirements.
Because the shared resources are designed to be used by multiple sites running various applications for multiple partners, the resources are installed in a fairly "vanilla" configuration. Upon notification that an outage has occurred, DISA personnel will begin customizing and configuring the infrastructure to accommodate the incoming processing.
Upon the restoration of the primary production facility, the processing will be removed from the remote recovery site and returned to the primary site. At that point, the shared resources will be returned to their default configuration.
Remote Recovery - Combination 1.2
RTO = 5 Days
RPO = 8 Hours
This level of continuity provides a secure processing environment with sufficient storage infrastructure in place to allow a remote recovery with an RTO of 5 days and an RPO of 8 hours. The RTO timeline is driven by the use of "Shared COOP" resources that will be configured, at the time of an outage or exercise, to match the associated production environment. The RPO timeline is driven by the use of data replication between the production environment and the recovery location. For this option to be effective, it requires the partner to select not only the appropriate storage option, but also the Shared COOP processor option. By having those elements in place, the required RTO and RPO targets are achievable.
Remote Recovery - Combination 2
RTO = 24 Hours
RPO = 8 Hours
This level of continuity provides a secure processing environment with sufficient storage infrastructure in place to allow a remote recovery with an RTO of 24 hours and an RPO of 8 hours. The timeline is driven by the use of data backups stored at the remote recovery site in combination with dedicated and pre-configured server resources available there. For this option to be effective, it requires the partner to select not only the appropriate storage option, but also the appropriate remote dedicated processor offering. By having dedicated and pre-configured equipment in place, the required RTO and RPO targets are achievable.
Remote Recovery - Combination 3
RTO = 8 Hours
RPO = 8 Hours
This level of continuity provides a secure processing environment with sufficient storage infrastructure in place to allow a remote recovery with an RTO of 8 hours and an RPO of 8 hours. The timeline is driven by the use of data backups stored at the remote recovery site in combination with dedicated and pre-configured server resources available there. For this option to be effective, it requires the partner to select not only the appropriate storage option, but also the appropriate remote dedicated processor offering. By having dedicated and pre-configured equipment in place, the required RTO and RPO targets are achievable.
Remote Recovery - Combination 4
RTO = 30 Minutes
RPO = 1 Second
This level of continuity provides a secure processing environment with sufficient storage infrastructure in place to allow a remote recovery with an RTO of 30 minutes and an RPO of less than one second. The timeline is driven by the use of data replication to create near-instantaneous backups stored in an online status at the remote recovery site. This approach, in combination with dedicated, pre-configured and operational server resources, can provide assurance of minimal processing interruption with virtually no data loss.
For this option to be effective, the partner must select the appropriate storage option, and infrastructure that can be brought online in less than 30 minutes must be resident at the recovery site. Any hardware solution for recovery requirements this stringent will be developed as a customized solution.
Customized Fail-Over
It is possible that mission requirements for a particular application, or suite of applications, are not adequately addressed by any of the standard Remote Recovery Combinations defined above. For example, a workload-balanced production environment may be in place, and the desired Continuity of Operations (COOP) solution may be to have the environment sized and configured to absorb the loss of one or more elements of the environment. Assuming that the sites are geographically separate, that would be a feasible solution. If the partner determines that a fail-over solution is desired and that the pre-defined approaches are not adequate or preferred, then a customized fail-over solution can be developed and implemented. Any solution of this type must be identified within the relevant SLA, and supporting documentation must be appended to or referenced within that SLA. This approach would be identified in the SLA as a “Custom” solution.
Test and Development (T&D) Solutions
This approach is used in some instances where DISA provides and supports both a production environment and an associated T&D environment for a specific application. For this approach to be a valid solution, the two environments MUST be in geographically separate locations and the T&D environment must be appropriately sized to serve as a COOP solution for the production site. Any solution of this type must be identified within the relevant SLA, and supporting documentation must be appended to or referenced within that SLA. This approach would be identified in the SLA as a “Custom” solution.