Geographic redundancy is not just for big enterprises. Small and medium-sized businesses (SMBs) can take advantage of it to protect their critical apps and keep downtime to a minimum. How, you ask? Well, if you’re running the apps on VMware virtualization, then VM replication technology and expert managed hosting are a good place to start.
In this three-part blog series, I’ll cover the following common challenges that IT managers face when considering a resiliency solution.
Top 3 Challenges:
You’re Free to Test, But Testing Isn’t Free
Remember my first blog installment? I defined failover as the process of switching to the backup infrastructure in the secondary data center (DC) after a major disruption makes the apps in the primary DC unavailable. Testing failover and the subsequent failback can be challenging, especially for SMBs. It requires time, resources and a ton of planning.
It also involves risk. With a full failover/failback test, you’re putting your production workloads on the line. What happens if the failover, well, fails? Or if the failback doesn’t bring up your primary production environment as expected? This uncertainty is precisely why extensive planning must happen.
Every full failover test also carries a substantial cost: the time it takes to plan, the personnel who must be on hand to manage the failover and failback, and any charges from the service provider for performing the test.
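To make those three cost components concrete, here is a rough back-of-the-envelope sketch in Python. The function name, the parameters and every number in it are hypothetical placeholders chosen for illustration, not real pricing; plug in your own figures.

```python
def full_test_cost(planning_hours, staff_count, test_hours, hourly_rate, provider_fee):
    """Rough cost of one full failover/failback test (all inputs are assumptions)."""
    planning = planning_hours * hourly_rate            # time spent planning the test
    staffing = staff_count * test_hours * hourly_rate  # people on hand during the test
    return planning + staffing + provider_fee          # plus any service provider charge


# Hypothetical example: 40 planning hours, 3 people for an 8-hour test window,
# a blended rate of 75 per hour and a 1500 provider fee.
cost = full_test_cost(planning_hours=40, staff_count=3, test_hours=8,
                      hourly_rate=75.0, provider_fee=1500.0)
print(f"Estimated cost of one full test: {cost:.2f}")
```

Multiplying that figure by the number of full tests you plan each year gives a quick sense of what the testing budget has to absorb.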
Testing…1, 2, 3?
How often should you fully test the failover? Unfortunately, the answer is: it depends. You need to test at an interval that makes sense for your company and your budget. Your production environment is in constant flux: data is changing and growing, new code is being pushed for apps, operating systems are being patched, hypervisors and VMs are being added, bandwidth requirements are increasing, and so on. Any sound disaster recovery (DR) strategy should include executing the failover runbook in a real-world test of the failover process.
While there is no substitute for a real-world test between data centers, there is a way to supplement this occasional drill with more frequent snapshot-based tests. You can quickly and affordably simulate how your replicated production VMs would respond if they were restarted in a different DC.
Some replication software or managed services offer the ability to create a snapshot of the critical VMs being replicated, provided that you have enough extra storage space in the redundant infrastructure. This test occurs only in the secondary DC and doesn’t involve your production environment, which removes the risk and extensive planning required for a full failover test.
Take a look at the graphic below. It represents a snapshot-based failover test of the replicated VM 2. You’ll notice that the replication process continues uninterrupted, and the replicated VM 2 remains powered off. In Data Center 2, a snapshot of the offline VM is created, then powered on, tested and finally deleted. This test is quick, easy and doesn’t require planning or an IT team on hand. Because it is contained in a sandbox environment, it doesn’t affect your production VMs or even the replication process.
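If it helps to see that sequence spelled out, here is a minimal Python sketch of the create, power on, test and delete steps. It is a toy in-memory simulation, not any vendor’s API: ReplicaVM, run_snapshot_test and the health check are hypothetical names invented for this example, and a real replication product would expose its own tooling for each step.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReplicaVM:
    """Hypothetical stand-in for a replicated VM in the secondary data center."""
    name: str
    powered_on: bool = False          # the replica itself stays powered off
    replication_active: bool = True   # replication keeps running throughout
    snapshots: List[str] = field(default_factory=list)


def run_snapshot_test(vm: ReplicaVM, health_check: Callable[[str], bool]) -> bool:
    """Create a snapshot, start it in a sandbox, check it, then delete it."""
    snapshot = f"{vm.name}-test-snapshot"
    vm.snapshots.append(snapshot)      # 1. create the snapshot of the offline VM
    try:
        # 2-3. "power on" the snapshot copy and test it; health_check stands in
        # for whatever application-level checks you would actually run.
        return health_check(snapshot)
    finally:
        vm.snapshots.remove(snapshot)  # 4. always delete the snapshot afterwards


vm2 = ReplicaVM(name="vm2-replica")
passed = run_snapshot_test(vm2, health_check=lambda snap: snap.startswith("vm2"))
print(passed, vm2.powered_on, vm2.replication_active, vm2.snapshots)
```

The try/finally is the design choice worth noting: the snapshot is cleaned up even when the check fails, so a bad test never leaves extra storage consumed in the secondary DC.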
Although SMBs should still perform a full DC-to-DC failover and failback test, snapshot-based tests can be done quickly and often. When major changes are replicated to the VMs in the secondary DC, an SMB can do a quick check to see if its critical apps will start up and run properly. More importantly, it can be done with no cost, minimal team distraction and zero risk to the production environment.
Want to learn more about VM replication and resiliency, and how to overcome these hurdles? Check out this presentation on SlideShare: VM Replication Is Your Lifeline When Disaster Strikes.