Which Practice May Involve Disaster Recovery Initiation?

The practice that involves the initiation of disaster recovery is service continuity management. This is the IT service management practice, defined within the ITIL 4 framework, responsible for ensuring that organizations can resume critical services after a major disruption. If you encountered this question on a certification exam or coursework, service continuity management is the answer you’re looking for.

But understanding why this is the answer, and what “initiation of disaster recovery” actually means in practice, is worth a closer look.

What Service Continuity Management Does

Service continuity management is the practice that plans for, coordinates, and triggers disaster recovery when a serious disruption hits an organization’s IT services. It operates within the scope of a broader business continuity plan, which covers the entire organization. Think of the business continuity plan as the master document for managing disaster preparedness across the whole company, while the disaster recovery plan is the technology pillar focused specifically on restoring IT systems and infrastructure.

Service continuity management owns the disaster recovery lifecycle: identifying which services are critical, defining how quickly they need to be restored, testing recovery procedures, and making the call to activate those procedures when something goes wrong. That activation step is what “initiation” refers to.

When Disaster Recovery Gets Activated

Not every outage triggers disaster recovery. Organizations draw a clear line between routine incident management and full disaster recovery. Incident management is a tactical process designed to identify, contain, and resolve events like a malware infection or a server failure before they escalate. Disaster recovery is a strategic process that kicks in after the damage is done, with the goal of restoring business operations.

The decision to cross that line depends on a damage assessment against predefined criteria. According to guidelines from the State of Maryland’s IT disaster recovery framework, activation typically occurs when one or more of these conditions are met:

  • Personnel safety or facility damage: The physical environment is compromised enough to prevent normal operations.
  • System damage: One or more critical systems are significantly impaired or destroyed.
  • Mission impact: The organization can no longer fulfill its core functions.
  • Duration of disruption: The anticipated downtime exceeds the organization’s recovery time objective (RTO).

That last criterion is especially important. Every critical system should have a predefined RTO, which is the maximum amount of time the organization can tolerate that system being down before consequences become unacceptable. If a damage assessment suggests the outage will last longer than the RTO, disaster recovery is initiated.
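The four criteria above can be sketched as a simple check. This is a minimal illustration, not a real framework: the DamageAssessment class, its field names, and the 4-hour RTO are assumptions made for the example, and in practice each flag would come from a human damage assessment rather than from code.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DamageAssessment:
    """Hypothetical summary of a damage assessment (field names are illustrative)."""
    facility_unsafe: bool            # personnel safety or facility damage
    critical_systems_impaired: bool  # system damage
    core_functions_blocked: bool     # mission impact
    estimated_downtime: timedelta    # anticipated duration of the disruption


def activation_reasons(assessment: DamageAssessment, rto: timedelta) -> list[str]:
    """Return every activation criterion that is met; an empty list means none are."""
    reasons = []
    if assessment.facility_unsafe:
        reasons.append("personnel safety or facility damage")
    if assessment.critical_systems_impaired:
        reasons.append("system damage")
    if assessment.core_functions_blocked:
        reasons.append("mission impact")
    if assessment.estimated_downtime > rto:
        reasons.append("anticipated downtime exceeds RTO")
    return reasons


# Assumed scenario: no physical or system damage, but a 6-hour outage is expected.
assessment = DamageAssessment(
    facility_unsafe=False,
    critical_systems_impaired=False,
    core_functions_blocked=False,
    estimated_downtime=timedelta(hours=6),
)
reasons = activation_reasons(assessment, rto=timedelta(hours=4))
print(reasons)  # ['anticipated downtime exceeds RTO']
```

Returning the list of reasons, rather than a bare yes or no, keeps a record of why disaster recovery was declared. With an assumed RTO of 8 hours instead of 4, the same assessment would return an empty list and the outage would stay with incident management.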

RTO and RPO: The Two Key Thresholds

Two metrics drive nearly every disaster recovery decision. Recovery time objective (RTO) answers the question: how long can we be down before the impact is too severe? Recovery point objective (RPO) answers a different question: how much data can we afford to lose?

RPO is measured as a window of time, not a volume of files. If your last clean backup is from 18 hours ago and your RPO is 20 hours, you’re still within acceptable limits. If your RPO is 4 hours and your last backup is 18 hours old, you’ve already lost more data than your plan allows. These thresholds are set by business owners during the planning phase, and they directly shape what kind of backup and recovery infrastructure gets built. Organizations with aggressive RPOs measured in minutes need near-continuous data replication, while those tolerating hours of data loss can rely on periodic backups.
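The two scenarios above reduce to a comparison of time spans. The sketch below assumes hypothetical function names and an arbitrary failure timestamp; only the 18-hour backup age and the 20-hour and 4-hour RPOs come from the example.

```python
from datetime import datetime, timedelta


def data_loss_window(last_clean_backup: datetime, failure_time: datetime) -> timedelta:
    """Everything written after the last clean backup is lost on restore."""
    return failure_time - last_clean_backup


def within_rpo(last_clean_backup: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """True if the data-loss window does not exceed the recovery point objective."""
    return data_loss_window(last_clean_backup, failure_time) <= rpo


# Arbitrary failure time chosen for illustration; the last clean backup is 18 hours older.
failure = datetime(2024, 1, 2, 12, 0)
last_backup = failure - timedelta(hours=18)

print(within_rpo(last_backup, failure, rpo=timedelta(hours=20)))  # True
print(within_rpo(last_backup, failure, rpo=timedelta(hours=4)))   # False
```

The same comparison explains the infrastructure consequence: the interval between backups has to stay below the RPO, so an RPO of minutes rules out a nightly backup job.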

Who Makes the Call

Initiating disaster recovery is a deliberate decision made by people with the authority to do so. In IT organizations, this responsibility typically falls to a designated disaster recovery coordinator, a CIO, or a senior operations leader. The decision follows a structured assessment: responders evaluate the scope of damage, estimate how long restoration will take, and compare that estimate against the organization’s defined thresholds.

In broader community disasters, FEMA’s National Disaster Recovery Framework recommends that state governors and local government leaders appoint Local Disaster Recovery Managers to organize and coordinate recovery activities. The principle is the same at every scale: someone with defined authority reviews the situation against predefined criteria and formally declares that disaster recovery has begun.

Automated Failover in Cloud Environments

Modern cloud infrastructure can initiate certain disaster recovery actions automatically, without waiting for a human decision. In an active/passive cloud setup, all traffic normally goes to a primary data center. If that primary location becomes unavailable, traffic can automatically switch to a backup region based on health checks and monitoring alarms.

Cloud providers offer tools that continuously monitor application endpoints and route traffic only to healthy ones, typically by changing which endpoint DNS queries resolve to. This DNS-based failover is a well-established, automated operation. However, fully automatic failover triggered by health checks should be used with caution, since a false positive from a brief monitoring glitch could trigger an unnecessary and disruptive switchover. Many organizations use automated detection to flag problems but require a human to authorize the full failover.
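One way to picture that split between automated detection and human authorization is the sketch below. It is not any cloud provider’s API: the FailoverGuard class, the next_action function, and the threshold of three consecutive failed checks are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailoverGuard:
    """Hypothetical detector: proposes failover only after repeated failed checks."""
    failure_threshold: int = 3     # assumed value; tune to the monitoring interval
    consecutive_failures: int = 0

    def record(self, healthy: bool) -> bool:
        """Record one health-check result; return True if failover should be proposed."""
        if healthy:
            self.consecutive_failures = 0  # any success resets the streak
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold


def next_action(failover_proposed: bool, human_approved: bool) -> str:
    """Automation detects and proposes; a person authorizes the actual switchover."""
    if not failover_proposed:
        return "stay on primary"
    if human_approved:
        return "fail over to standby"
    return "alert on-call and await authorization"


guard = FailoverGuard(failure_threshold=3)

# A single failed probe (a brief monitoring glitch) never proposes failover.
glitch = [guard.record(h) for h in (True, False, True)]
print(glitch)  # [False, False, False]

# Three failed probes in a row do.
outage = [guard.record(h) for h in (False, False, False)]
print(outage)  # [False, False, True]

print(next_action(outage[-1], human_approved=False))  # alert on-call and await authorization
print(next_action(outage[-1], human_approved=True))   # fail over to standby
```

Requiring several consecutive failures filters out one-off probe errors, at the cost of a slightly slower reaction to a real outage. That delay counts against the RTO, so the threshold and the check interval should be chosen together.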

Cybersecurity Events as a Trigger

Ransomware attacks are one of the most common reasons organizations initiate disaster recovery today. NIST recommends that organizations prepare by developing an incident recovery plan with clearly defined roles and decision-making strategies, then regularly testing that plan through exercises. Secure, isolated backups are critical because ransomware can spread to connected backup systems if they aren’t properly segmented.

NIST also emphasizes maintaining an up-to-date contact list that includes law enforcement and internal stakeholders, with each contact’s role in recovery clearly defined. The recovery process after a ransomware attack involves restoring operating systems, databases, user files, applications, and software configurations from clean backups, a process detailed in NIST Special Publication 1800-11.

What Happens After Initiation

Once disaster recovery is formally initiated, the process moves through a predictable sequence. First, key personnel are notified and the recovery team assembles. Then the predefined recovery procedures are executed: restoring systems from backups, activating standby infrastructure, or failing over to a secondary site. Throughout this phase, the team tracks progress against the RTO and RPO targets set during planning.

After critical systems are restored and operations resume, the focus shifts to full normalization. This means returning from temporary or backup infrastructure to the organization’s primary environment, verifying data integrity, and conducting a post-incident review to identify what worked and what needs to change. The entire lifecycle, from initiation through return to normal operations, is owned by the service continuity management practice.