CHAPTER 14, PART 1

DISASTER RECOVERY AND BUSINESS RESUMPTION PLANS

 

 

1          BACKGROUND

           

The mission of the Office of the Chief Information Officer (OCIO) is to strategically acquire and use information and technology resources to improve the quality, timeliness, and cost-effectiveness of USDA service delivery to its customers. The rapid pace of technological change, together with changes in the way business is conducted, has made it necessary for USDA’s major systems, which support day-to-day core business processes, to be able to function in emergencies or disasters. Most IT systems are vulnerable to many types of disruption, such as power outages, water damage, fire, and viruses. These vulnerabilities are managed through risk assessments and appropriate security controls. Risks arise from a variety of sources, which are typically categorized as:

 

·        Natural  - hurricane, tornado, flood, fire

·        Human  - sabotage, virus, operator error

·        Environmental  - equipment failure, outage, electric power failure

 

Implementation of IT Contingency Plans is critical to ensuring that USDA business will continue at an acceptable level in the face of a major incident or disaster. An organization uses the suite of plans shown in Figures 1 and 2 to prepare response, recovery, and continuity activities for disruptions affecting its IT systems, business processes, and facilities. This planning is part of a larger process to ensure the survivability of information.

Survivability means that data and processes can be recovered regardless of the type of disaster or emergency.


Figure 1, IT Contingency Plans

 

Plan Definitions

 

Continuity of Operations Plan (COOP) - The COOP focuses on restoring an organization’s (usually a headquarters element’s) essential functions at an alternate site and performing those functions for up to 30 days before returning to normal operations. Because a COOP addresses headquarters-level issues, it is developed and executed independently from the BCP. Presidential Decision Directive (PDD) 67 mandates implementation of a viable COOP capability. Minor disruptions that do not require relocation to an alternate site are typically not addressed; however, the COOP may include the BCP, DRP, and BRP as appendices.

The Continuity of Operations (COOP) Planning Staff (CPS), under the Assistant Secretary for Administration, Office of Procurement and Property Management, serves as USDA’s focal point for the continuity of operations (COOP) and continuity of government (COG) programs.

 

Business Continuity Plan (BCP) - The BCP focuses on sustaining an organization’s business functions during and after a disruption. An example of a business function is a payroll or consumer information process. A BCP may be written for a specific business process or may address all key business processes. Information technology (IT) systems are considered in the BCP in terms of their support to the business processes. In some cases, the BCP may cover only interim business continuity requirements and not address long-term recovery of processes and return to normal operations. A disaster recovery plan, business resumption plan, and occupant emergency plan may be appended to the BCP. Responsibilities and priorities set in the BCP should be coordinated with those in the COOP to eliminate possible conflicts.

 

Business Resumption Plan (BRP) - The BRP addresses the restoration of business processes after an emergency but, unlike the BCP, lacks procedures to ensure the continuity of critical processes throughout an emergency or disruption. Development of the BRP should be coordinated with the DRP and BCP. This plan may be appended to the BCP.

 

Disaster Recovery Plan (DRP) - This plan applies to major, usually catastrophic, events that deny access to the normal facility for an extended period. Frequently, DRP refers to an IT-focused plan designed to restore operability of the target system, application, or computer facility at an alternate site after an emergency. The DRP scope may overlap that of an IT contingency plan; however, the DRP is narrower in scope and does not address minor disruptions that do not require relocation. Depending on the agency’s needs, several DRPs may be appended to the BCP.

 

IT Contingency Plan (Continuity of Support Plan) - A set of advance arrangements and established procedures that enable an organization to recover mission-critical IT services at a local or alternate site following a minor or major disruptive event. The plan addresses both short- and long-term effects. OMB Circular A-130 requires the development and maintenance of continuity of support plans for general support systems and contingency plans for major applications. This planning guide considers continuity of support planning to be synonymous with IT contingency planning. Because an IT contingency plan should be developed for each major application and general support system, multiple contingency plans may be maintained within the agency or mission area BCP.

 

Cyber Incident Response Plan - This plan establishes procedures for responding to cyber attacks against agency IT systems. These procedures are designed to enable security personnel to identify, mitigate, and recover from malicious computer incidents, such as unauthorized access to a system or data, denial of service, or unauthorized changes to system hardware, software, or data (e.g., a virus, worm, or Trojan horse).

 

Crisis Communications Plan - Organizations should prepare their internal and external communications procedures before a disaster. A crisis communications plan is often developed by the organization responsible for public outreach. Its procedures should be coordinated with all other plans to ensure that only approved statements are released to the public, and they should be included as an appendix to the BCP. The plan typically designates specific individuals as the only authorities for answering questions from the public regarding disaster response. It may also include procedures for disseminating status reports to personnel and to the public. Templates for press releases are also included in the plan.

 

Occupant Emergency Plan (OEP) - This plan provides response procedures for occupants of a facility in the event of a situation posing a potential threat to the health and safety of personnel, the environment, or property. Such events include a fire, hurricane, criminal attack, or medical emergency. OEPs are developed at the facility level, specific to the geographic location and structural design of the building. General Services Administration (GSA)-owned facilities maintain plans based on the GSA OEP template. The facility OEP may be appended to the BCP, but it is executed separately.

 

Figure 2, Plan Relationships


IT contingency planning involves all preventive processes necessary to continue program delivery, including those that are not necessarily IT related. The Computer Security Act of 1987, OMB Circular A-130, Appendix III, and PDD 63 require contingency planning as part of the security management process; specifically, these mandates require that contingency planning be conducted for each major system. NIST Special Publication 800-34, Contingency Planning Guide for Information Technology Systems, provides additional guidance that will be used to establish USDA’s IT Contingency Program. To be effective, plans in this program must be executable, sustainable, and tested on a regular basis. The Business Impact Analysis (BIA) for each major system determines how rapidly that system must be recovered after a disruption. The BIA, a critical part of contingency planning, is conducted by the business owner and is used to establish contingency requirements and priorities in the event of a significant disruption in service.

 

Another critical component of this planning is the development and implementation of the Disaster Recovery Plan (DRP) and Business Resumption Plan (BRP). These plans are designed to ensure that agencies and staff offices can maintain an acceptable level of business activity during and after a disaster, and they provide for a smooth and rapid restoration of major IT systems. The DRP and BRP requirements also ensure that each agency establishes accountability for implementing, testing, and maintaining these plans, and they support the recovery of these systems in accordance with predetermined resumption strategies and disaster recovery measures. USDA IT and business program managers must collaborate and communicate on how to continue business and recover if service is disrupted.

 

The DRP is an IT-focused plan designed to restore operability of the target system, applications, or computer facility at an alternate site after an emergency. It is narrower in scope than a COOP and does not address minor disruptions that do not require relocation. The BRP contains instructions or procedures describing how the business will be restored after a significant disruption has occurred. It must be coordinated with other plans, such as the DRP, Occupant Emergency Plan (OEP), Continuity of Operations Plan (COOP), and Business Continuity Plan (BCP), which provide for the resumption of critical processes at a level of service acceptable to customers. Integrating these activities ensures that the plans are cohesive and that an effective IT Contingency Planning Program exists within the agency.

 

 

2          POLICY

           

Each agency and staff office will establish an IT Contingency Planning process. An executable DRP and BRP will be developed for each major system to ensure that core business functions can be restored to full operation with minimum downtime in the event of a disruption or disaster. Contingency planning will be incorporated into the system development life cycle process for all IT systems.

 

Each agency will use the departmental enterprise-wide software, Living Disaster Recovery Planning System (LDRPS), or approved comparable software to develop all USDA DRPs and BRPs. Templates for these plans can be found in the LDRPS software. These plans will be implemented, tested, and maintained for all major systems that support critical business functions. All plans must be detailed, routinely reviewed, and updated to provide reasonable continuity of IT support in the event of a disaster. It is recommended that agencies require certification of the Contingency Planning Coordinator. Each agency and staff office shall take the following contingency planning actions:

 

a         Conduct a Business Impact Analysis (BIA) to identify and prioritize critical IT resources.  This analysis also determines the acceptable minimum level of system support necessary to restore mission critical core business functions and ranks business functions for restoration purposes. 

 

b         Identify preventive controls, which are measures that reduce the effects of IT system disruptions. These measures can increase system availability and reduce contingency life cycle costs;

 

c          Develop recovery strategies to ensure that the system may be recovered quickly and effectively following an incident;

           

d         Develop disaster recovery and business resumption plans that must include guidance and procedures for restoring the system that supports core business functions; the recovery procedures should be detailed enough that other personnel with the same job functions could perform the recovery tasks.

 

e          Maintain and update DRPs and BRPs. Agencies and staff offices must update plans for major systems biannually or following any significant change to their computing or telecommunications environment.

 

f           Schedule testing for these plans. Develop a testing program and test schedule, subject to review by Cyber Security (CS) as required. Tabletop testing should precede live testing to ensure that the written plan is executable. Any deficiencies revealed by the tests must be corrected. The type and extent of testing will depend upon:

 

·        criticality of agency business functions

·        cost of executing the test plan

·        budget availability

·        complexity of information system and components

 

g         Train employees. Ensure that enough employees are trained to serve as alternates for key recovery positions.

 

h          Participate in audit reviews. CS, GAO, and OIG will conduct informal and formal reviews of all plans to ensure that they are executable and comply with standards.
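Action a asks the BIA to rank business functions for restoration. As an illustration only, the ranking step might be sketched as below, assuming each function has been assigned a maximum allowable outage and a criticality score. The record fields (`max_allowable_outage_hours`, `criticality`), the 1-is-most-critical scale, and the tie-breaking rule are hypothetical assumptions, not LDRPS fields or a USDA-prescribed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessFunction:
    # Hypothetical BIA record; field names are illustrative assumptions.
    name: str
    supporting_system: str
    max_allowable_outage_hours: float  # longest tolerable disruption
    criticality: int  # assumed scale: 1 = most critical


def restoration_order(functions: list[BusinessFunction]) -> list[BusinessFunction]:
    """Rank functions for restoration: shortest allowable outage first,
    then higher criticality (lower number), then name for a stable order."""
    return sorted(
        functions,
        key=lambda f: (f.max_allowable_outage_hours, f.criticality, f.name),
    )


if __name__ == "__main__":
    # Illustrative entries only; real values come from the agency BIA.
    bia = [
        BusinessFunction("Consumer information", "Web portal", 72, 3),
        BusinessFunction("Payroll", "Payroll system", 24, 1),
        BusinessFunction("Program reporting", "Data warehouse", 24, 2),
    ]
    for rank, f in enumerate(restoration_order(bia), start=1):
        print(rank, f.name, f.max_allowable_outage_hours)
```

Sorting on allowable outage first reflects the BIA’s purpose of determining how rapidly each system must be recovered; the criticality tie-breaker is one reasonable choice among several.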

 

Policy Exception Requirements - Agencies will submit all policy exception requests directly to the ACIO for Cyber Security. Exceptions will be considered only with respect to implementation time; no exception will be granted to the requirement to conform to this policy. Approved exceptions are interim in nature, and each agency must report its approved exception as a Plan of Action and Milestones (POA&M) item in its FISMA reporting until full compliance is achieved. Interim exceptions cannot extend beyond the fiscal year. Compliance exceptions that require longer durations must be renewed annually with an updated timeline for completion. CS will monitor all approved exceptions.
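The fiscal-year limit on interim exceptions can be checked mechanically. The sketch below assumes that "the fiscal year" means the federal fiscal year (October 1 through September 30) in which the exception is approved; that interpretation and the helper names are hypothetical, not part of any USDA tool.

```python
from datetime import date


def fiscal_year_end(d: date) -> date:
    """Return the last day of the federal fiscal year containing d.

    The federal fiscal year runs from October 1 through September 30,
    so a date in October through December belongs to the fiscal year
    that ends on the following September 30.
    """
    end_year = d.year + 1 if d.month >= 10 else d.year
    return date(end_year, 9, 30)


def interim_exception_is_valid(approved_on: date, expires_on: date) -> bool:
    """Hypothetical check that an interim exception does not run past the
    fiscal year in which it was approved (assumed interpretation)."""
    if expires_on < approved_on:
        return False
    return expires_on <= fiscal_year_end(approved_on)


if __name__ == "__main__":
    approved = date(2024, 11, 15)  # falls in fiscal year 2025
    print(fiscal_year_end(approved))
    print(interim_exception_is_valid(approved, date(2025, 9, 30)))
    print(interim_exception_is_valid(approved, date(2025, 10, 1)))
```

An exception that fails this check would fall under the annual renewal requirement for longer compliance exceptions rather than being treated as interim.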

 

 

3                    RESPONSIBILITIES

 

a         The Associate CIO for Cyber Security  will:

 

(1)              Provide guidance and strategies to help agencies and staff offices establish an Information Survivability Program, including contingency planning actions and the development, testing, and implementation of executable DRPs and BRPs;

 

(2)              Review agency DRPs and BRPs, and track and monitor agency compliance with this policy. Provide an assessment report of all IT Contingency Plans to each Agency Administrator;

 

(3)              Work closely with OIG to review all plans and provide the OIG assessment findings to each Agency Administrator or Agency Head.  In addition, provide a specific timeline to complete any necessary revisions to ensure an executable plan; 

 

(4)              Identify measures that may enable enterprise-wide advantages in DRP and BRP activities across the Department;

 

(5)              Direct, coordinate, and perform oversight reviews to ensure compliance with this policy, as required;

 

(6)              Observe DRP and BRP testing, as required;

 

(7)              Evaluate and recommend a specific course of action to remedy deficiencies found during review of plans or tests; and

 

(8)              Take necessary actions, including imposing penalties, to ensure compliance with this policy.

     

b         Agency Heads/Administrators will:

 

(1)              Designate a senior management official to establish and manage an Information Survivability Program within their agency or staff office;

 

(2)              Provide annual budgeted funding and staffing for disaster recovery and business resumption activities such as testing, training, and off-site storage, and report all related security costs for system DRPs, BRPs, and IT system contingency planning as required by OMB; and

 

(3)              Ensure that all major systems are identified and prioritized in order of criticality and that all plans are reviewed, approved, and certified with a signature.

 

c          Agency Chief Information Officers will:

 

(1)              Establish and manage the IT Contingency Planning Program within the organization. Ensure that positions and staff years are established to develop, implement, and maintain DRPs and BRPs for each major system. Designate and train a Contingency Planning Coordinator;

 

(2)              Advise senior management within the organization and recommend solutions regarding DRPs and BRPs based on CS reviews;

 

(3)              Ensure that DRPs and BRPs for the identified major systems are developed using the departmental enterprise software or an approved equivalent; ranked by priority according to the maximum system outage appropriate to the delivery of products and services; reviewed biannually; and executable in the event of a major incident or disaster. Ensure that there is an alternate backup site with operating procedures and personnel designated to run specific applications at the site;

 

(4)              Ensure that DRP and BRP recovery solutions are closely coordinated and integrated with all emergency preparedness plans for major systems, interconnected systems and business processes as part of the system development life cycle;

 

(5)              Test DRPs and BRPs at least biannually, or when a significant change occurs to the system, unless an approved waiver has been obtained;

 

(6)              Ensure that recovery procedures are developed and implemented;

 

(7)              Provide specialized training and certification opportunities to the Contingency Planning Coordinator, appropriate training to all disaster recovery and business resumption team personnel and general disaster awareness training for all employees; and

 

(8)              Ensure that all DRPs and BRPs are reviewed and approved by the Agency Head and that an electronic copy of each plan is saved in the enterprise-recommended or other approved software. CS reserves the right to review all plans.

 

d         The Contingency Planning Coordinator will:

 

(1)              Identify and coordinate with internal and external points of contact for each major system to characterize the ways that they depend on or support the IT system. Ensure that critical files are backed up daily and that backup copies (such as tapes) are stored off-site in case of an incident or disaster;

 

(2)              Identify disruption impacts and allowable outage times.  Identify the maximum allowable time that a resource may be denied before it prevents or inhibits the performance of an essential function;

 

(3)              Develop and prioritize recovery strategies that personnel will implement during contingency plan activation.  Consider issues such as cost, allowable outage time, security, and integration with larger organization-level plans;

 

(4)              Coordinate with officials to establish contingency teams, such as damage assessment and recovery teams, and to designate their team leaders; and

 

(5)              Ensure that plans are reviewed and updated biannually.
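Items (2) and (3) above ask the coordinator to weigh allowable outage time against cost when choosing recovery strategies. A minimal sketch of that trade-off, assuming each candidate strategy has an estimated recovery time and an annual cost, is to pick the cheapest strategy that meets the maximum allowable outage. The strategy names echo common alternate-site types, but the figures and the helper names are hypothetical placeholders; security and integration with organization-level plans, also listed in item (3), are not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecoveryStrategy:
    # Illustrative attributes; real figures would come from vendor quotes
    # and recovery tests, not from this sketch.
    name: str
    estimated_recovery_hours: float
    annual_cost: float


def choose_strategy(
    max_allowable_outage_hours: float,
    candidates: list[RecoveryStrategy],
) -> Optional[RecoveryStrategy]:
    """Return the cheapest strategy that restores the system within the
    maximum allowable outage, or None if no candidate is fast enough."""
    feasible = [
        s for s in candidates
        if s.estimated_recovery_hours <= max_allowable_outage_hours
    ]
    if not feasible:
        return None  # gap: escalate, or revisit the BIA or budget
    return min(feasible, key=lambda s: s.annual_cost)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    options = [
        RecoveryStrategy("Hot site", 4, 500_000),
        RecoveryStrategy("Warm site", 48, 150_000),
        RecoveryStrategy("Cold site", 168, 40_000),
    ]
    print(choose_strategy(72, options))
    print(choose_strategy(2, options))
```

Returning None rather than the fastest available option makes an unmet outage requirement visible, so it can be raised with management instead of being silently accepted.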

           

-END-