Stages of Recovery After a Cyber Incident

Charlie Maclean-Bristol looks at the 9 stages of recovery from a cyber incident and highlights the importance of having recovery in our business continuity plans.

by Charlie Maclean-Bristol

This is the third part of my journey to discover more about backups and the technical aspects of recovery after a cyber incident. I realize most readers of this bulletin are never going to be involved in the technical recovery after a cyber incident, but I think it is very important we understand the stages of recovery, and how long it will take to restore the full functionality of all our systems and applications.

For those of us responsible for business continuity within our organizations, we have to make sure that our plans take into account the likely time and priority order of the recovery. We might set the recovery time objective (RTO) of an activity at 24 hours, but after looking at the recovery priority, we find that the applications which support our activity are lower down the priority list and not likely to be recovered until day 7. Once we understand this, we can go back to our strategy for the activity and work out how to continue it at a minimum level for 7 days. So, the more we understand about the recovery after a cyber incident, the better our planning will be, and we can challenge our recovery assumptions and see if our RTOs for applications and activities are realistic.
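The comparison between an activity's RTO and the likely recovery day of its applications can be sketched in a few lines of Python. This is a minimal illustration rather than a planning tool: the activity names, application names, RTOs, and recovery days are all hypothetical, and it assumes IT can give an estimated recovery day for each application.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    rto_hours: int            # RTO agreed in the BIA (hypothetical values)
    applications: list[str]   # applications the activity depends on


def rto_gaps(activities, recovery_day_by_app):
    """Return (activity name, gap in hours) for every activity whose
    applications are expected back later than the activity's RTO."""
    gaps = []
    for activity in activities:
        # An activity is only restored once its slowest application is back.
        restored_hours = max(
            recovery_day_by_app[app] * 24 for app in activity.applications
        )
        gap = restored_hours - activity.rto_hours
        if gap > 0:
            gaps.append((activity.name, gap))
    return gaps


# Hypothetical data: estimated recovery day for each application.
recovery_day_by_app = {"payroll_system": 7, "email": 2, "crm": 1}
activities = [
    Activity("Pay staff", rto_hours=24, applications=["payroll_system", "email"]),
    Activity("Customer enquiries", rto_hours=48, applications=["crm"]),
]

for name, gap in rto_gaps(activities, recovery_day_by_app):
    print(f"{name}: workaround needed for about {gap / 24:.0f} days beyond the RTO")
```

Any activity that appears in the output is one where the strategy needs a manual workaround, or where the RTO itself should be challenged.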

The recovery stages are listed below, along with some points practitioners need to think about.

1. Detecting an Attack

An attack can be detected in a number of ways, listed below. We need to check how the escalation from whoever has detected the incident integrates with our incident management structure.

Monitoring alerts
Automated logs
Third-party monitoring services (SOCs)
Users within the organization
Informed by the attacker
Third parties, including industry bodies, your vendor partners, or the government
Anomalies detected by audits, investigations, or reviews
Publicly available information – information on new vulnerabilities and exploits
People outside the organization – NCSC, Police

Looking at this list, the most obvious route is for an incident to be detected within our IT department and then escalated to senior managers if it is deemed serious enough. We also have to be prepared for alerts to come from outside the organization, and ensure that these are escalated to both senior managers and IT.

If a zero-day vulnerability is found that affects our systems, we need to consider how we check whether it has been exploited and then patch it. We need to manage the period before the patch is in place, as there is the potential that we have already been breached. Figure 1 shows that the most likely way victims find out about an attack is from the attacker.

Detection to analysis can take minutes to days.

*Figure 1 – How breaches were detected (measured in millions)*

2. Incident Analysis & Use of Cyber Threat Intelligence

An initial analysis is conducted to determine the incident’s scope, such as which networks, systems, or applications are affected; who or what originated the incident; and how the incident is occurring (e.g., what tools or attack methods are being used, what vulnerabilities are being exploited).

The initial analysis should provide enough information for the team to prioritize subsequent activities, such as containment of the incident and deeper analysis of its effects. Threat intelligence will be used to examine the malware, which hopefully will tell us more about the attacker.

Some attacks may be obvious, like a defaced web page, while others may be much harder to understand, as the attackers may be extremely good at covering their tracks. You may be able to see the symptoms of the attack, but it may take a lot of investigation to understand how it took place.

3. Incident Prioritization (Triage, Classifying, Prioritizing & Assigning)

Prioritizing the handling of the incident is a critical decision point in the incident handling process. This is where you have to decide where to start and which applications or systems should be dealt with first.

The prioritization can be based on a number of factors:

The functional impact of the incident – which activities have been impacted
The information impact of the incident – understanding the impact on the organization’s data
Recoverability from the incident – how easy it will be to recover different systems; you may prioritize the easiest system to recover first

We can contribute to this stage by agreeing in advance with IT on the priority of recovery for applications, using the information taken from the business impact analysis (BIA).
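One way to picture how the three factors above might combine into a recovery order is the short Python sketch below. It is an assumption for illustration only, not a method taken from the NIST guidance: the 1 to 5 scales, the simple sum of the two impact scores, the tie-break on ease of recovery, and the system names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AffectedSystem:
    name: str
    functional_impact: int    # 1 (low) to 5 (high): activities disrupted
    information_impact: int   # 1 (low) to 5 (high): effect on data
    recovery_effort: int      # 1 (easy) to 5 (hard) to recover


def triage_order(systems):
    """Sort systems so the highest combined impact comes first; where
    impact is equal, the system that is easiest to recover comes first."""
    return sorted(
        systems,
        key=lambda s: (
            -(s.functional_impact + s.information_impact),
            s.recovery_effort,
        ),
    )


# Hypothetical scores, agreed in advance between business continuity and IT.
systems = [
    AffectedSystem("Order processing", 5, 4, 4),
    AffectedSystem("Intranet", 1, 1, 1),
    AffectedSystem("Finance ledger", 4, 5, 2),
]

for position, system in enumerate(triage_order(systems), start=1):
    print(position, system.name)
```

In practice the scoring would come from the BIA and from IT's view of recoverability, and the incident team may still override the order once the scope of the attack is known.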

4. Incident Notifications

This is where IT starts to inform people of the incident. We should always bear in mind that the normal means of communication may be unavailable.

5. Choosing a Containment Strategy

An essential part of containment is the decision-making process on how it is conducted. This could include shutting down a system, disconnecting it from a network, disabling certain functions, or disconnecting the entire organization from external networks.

Guidance on this stage suggests that there should be strategies in place for how the organization would respond to different types of attacks. The purpose of containment is to try to prevent the attacker from gaining access to other systems and causing further damage.

This may need to be conducted quickly, as the attacker may try to execute an attack if they know they have been detected. However, containment must be done carefully, as shutting down an application could trigger an automated attack that has already been set up, leading to further damage.

Containment can take hours to days.

6. Evidence Gathering & Handling

Initially, I thought this stage was all about gathering evidence so that the hackers could be prosecuted in the future. While this may be part of it, the primary goal is to understand the attack, what damage has been done, the extent of the attack, and how the hackers got into the systems.

The NIST Computer Security Incident Handling Guide suggests that organizations should avoid wasting time trying to identify the attacking host unless there is a good reason to do so, as it could detract from more urgent recovery efforts.

7. Eradication

After an incident has been contained, eradication may be necessary to eliminate components of the incident, such as deleting malware and disabling breached user accounts, as well as identifying and mitigating all vulnerabilities that were exploited. This must be done quickly and precisely, as attackers may attempt to re-establish access.

Eradication can take days to weeks.

8. Recovery

Only once you are assured that the attackers have been eradicated and no longer have access or the ability to execute attacks within your network can you start to recover systems. This needs to be done in a phased approach, based on a predetermined strategy and order.

Recovery can take weeks to months.

9. Lessons Learned

Lastly, we need to conduct a lessons learned session. The NIST guidance highlights the importance of learning from incidents and documenting those lessons to improve future responses.

Conclusion

MTTI (Mean Time to Identify) is the average time it takes to detect that a security incident has occurred, and MTTC (Mean Time to Contain) is the average time it takes to contain an identified incident and prevent further damage.
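To make the two definitions concrete, here is a minimal Python sketch that works out both averages from a list of incident dates. The incident records are hypothetical and are not taken from the IBM report; the sketch also assumes that the date a breach began is known, which in practice may only be established during the investigation.

```python
from datetime import date
from statistics import mean

# Hypothetical incident records: (breach began, identified, contained).
incidents = [
    (date(2024, 1, 1), date(2024, 6, 29), date(2024, 8, 28)),
    (date(2024, 3, 1), date(2024, 9, 27), date(2024, 12, 6)),
]


def mean_days(pairs):
    """Average number of days between each (start, end) pair of dates."""
    return mean((end - start).days for start, end in pairs)


# MTTI: from the start of the breach to the point it is identified.
mtti = mean_days((began, identified) for began, identified, _ in incidents)
# MTTC: from identification to the point the incident is contained.
mttc = mean_days((identified, contained) for _, identified, contained in incidents)

print(f"MTTI: {mtti} days, MTTC: {mttc} days")
```

Adding the two figures together gives the average length of the full breach lifecycle, from the start of the breach to containment.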

Looking at Figure 2 [1], we can see that attacks often persist in systems for months before they are detected or before attackers disclose their presence. This gives them plenty of time to plan and execute their attack as well as exfiltrate data.

*Figure 2 – Time taken to detect and contain the incident (measured in days)*

MTTC shows that, on average, it takes several weeks before the attack is contained and systems can be safely recovered without the worry of reinfection.

For us, the key takeaway is that we need to understand the process so that after a cyber attack, we can track recovery progress and have a realistic understanding of how long it will take our organization to recover, enabling us to build our plans around this.

[1] IBM Cost of a Data Breach Report 2024

++++++++++++++++++++++++++++++++++++++++++++++++

This article was originally published by BC Training Ltd.

Charlie Maclean-Bristol is the author of the groundbreaking book, Business Continuity Exercises: Quick Exercises to Validate Your Plan.

“Charlie drives home the importance of continuing to identify lessons from real-life incidents and crises, but more importantly, how to learn the lessons and bring them into our plans. Running an exercise, no matter how simple, is always an opportunity to learn.” – Deborah Higgins, Head of Cabinet Office, Emergency Planning College, United Kingdom