In the previous post in this series, Be Ready for ‘Anything’, we introduced the concept of disaster recovery (DR). While organizations can do their best to design and implement redundant, highly available systems, they still need disaster recovery plans to handle recovery from extended outages and disasters when they happen. Disaster recovery planning can be a daunting task, and it can be difficult to know where to start. We need tools and methods to help us determine what types of solutions to implement and how much to invest in disaster recovery. As I have brought up before, any type of technology planning and implementation should start with knowing and understanding the business requirements, and DR is no different. Two concepts that help us plan for and implement an efficient DR process are recovery time objective (RTO) and recovery point objective (RPO). I will be honest: I had come across both concepts before and had no idea what they meant, so I was very glad to dig into their meanings as part of this Cloud Essentials+ journey. In the rest of this post, I will cover what I have learned about the definitions of RTO and RPO.
Recovery Time Objective (RTO)
In my opinion, recovery time objective (RTO) is the more straightforward of the two recovery objectives. As the name suggests, time is the key concept with RTO. The goal of a recovery time objective is to understand how long a system or application can be unavailable before the outage becomes highly detrimental to the business and its customers. Another way to look at it is that RTO is the amount of time within which applications and services must be back online after an outage or disaster.
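To make the RTO idea concrete, here is a minimal Python sketch that compares how long a service was down against an agreed RTO. The function names (`downtime`, `meets_rto`), the timestamps, and the 4-hour RTO are all hypothetical values chosen for illustration; a real RTO would come from the business requirements.

```python
from datetime import datetime, timedelta


def downtime(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Return how long the service was unavailable."""
    if service_restored < outage_start:
        raise ValueError("service cannot be restored before the outage starts")
    return service_restored - outage_start


def meets_rto(outage_start: datetime, service_restored: datetime,
              rto: timedelta) -> bool:
    """True if the service came back online within the recovery time objective."""
    return downtime(outage_start, service_restored) <= rto


# Hypothetical outage: down at 10:00 AM, back online at 1:30 PM.
outage_start = datetime(2024, 1, 15, 10, 0)
service_restored = datetime(2024, 1, 15, 13, 30)

# Assumed RTO of 4 hours, for illustration only.
rto = timedelta(hours=4)

print(downtime(outage_start, service_restored))        # 3:30:00
print(meets_rto(outage_start, service_restored, rto))  # True
```

In this sketch the recovery took 3.5 hours, so an assumed 4-hour RTO would have been met. With a 2-hour RTO the same outage would have been a miss.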
Recovery Point Objective (RPO)
While RTO is focused strictly on the amount of time within which recovery needs to happen, recovery point objective (RPO) is centered on tolerance for data loss. There is still a time concept with RPO, but it is directly related to data loss. RPO helps us understand how far out of date our data can be when it is restored while the application or service remains relevant and usable. Knowing how far out of date the data can be upon restoration tells us how often we need to take backups. A more direct description of RPO is that it is the maximum acceptable amount of time between the last known good backup and the outage or disaster. Let’s get into a quick example. An organization has an application that they support. The application’s database server is currently backed up once per day at midnight. That database server fails at 10:00 AM and must be restored from backup. Once that server is restored, 10 hours of data would be lost. The organization can use the recovery point objective concept to determine if that is good enough or if backups need to be taken more frequently.
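The backup example above can be worked through in a few lines of Python. This is only a sketch: the function names (`data_loss_window`, `backup_interval_meets_rpo`), the calendar date, and the 4-hour RPO are assumptions made for illustration, while the midnight backup and the 10:00 AM failure come from the example.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Return how much data, measured in time, is lost when restoring from last_backup."""
    if failure < last_backup:
        raise ValueError("failure cannot happen before the last backup")
    return failure - last_backup


def backup_interval_meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if the worst-case loss (one full backup interval) stays within the RPO."""
    return backup_interval <= rpo


# From the example: backup at midnight, failure at 10:00 AM (the date is arbitrary).
last_backup = datetime(2024, 1, 15, 0, 0)
failure = datetime(2024, 1, 15, 10, 0)

# Assumed RPO of 4 hours, for illustration only.
rpo = timedelta(hours=4)

loss = data_loss_window(last_backup, failure)
print(loss)                                                 # 10:00:00
print(loss <= rpo)                                          # False
print(backup_interval_meets_rpo(timedelta(days=1), rpo))    # False
print(backup_interval_meets_rpo(timedelta(hours=4), rpo))   # True
```

The second function captures the planning side of RPO: a failure just before the next backup loses almost a full interval of data, so the backup interval itself should be no longer than the RPO. Under the assumed 4-hour RPO, daily backups fall short and backups every 4 hours would be enough.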
Rounding Out Our Objectives
Understanding and applying the concepts of recovery time objective (RTO) and recovery point objective (RPO) can help us ensure that we are designing and implementing effective and efficient disaster recovery plans. To determine the requirements behind RTO and RPO values, I feel that it is important for IT leaders to communicate with business leaders. IT departments need to understand the needs of the business so that the technology solutions they deliver, including disaster recovery, actually meet those needs.