Friday, January 24, 2014

Fault Tolerant and Fail Over: Is There a Difference?

Summary: Fault tolerant (FT) solutions go beyond high availability (HA) fail over solutions to present an environment that is never seen to fail, not merely one that survives a failure. Some suppliers of FT technology call this "fail through" rather than fail over.
I thought that was a well-known concept and was surprised to find that the distinction is still not clear to some.
While speaking with a potential client about how different forms of virtualization could address his organization's requirements, I detected that some of my comments created confusion rather than clarifying things. As an aside, it appears that I have an innate ability to make some technology appear more complex than it really needs to be.
I'd like to offer a summary of the discussion while it is still fresh in my mind.
Virtualization technology, taken broadly, offers a number of approaches to availability. Here are a few of them.

  • Access to application solutions can be virtualized.  If the back end system fails, the individual using the application is connected to another system that offers the same application.  More sophisticated access virtualization software may make this process automatic. Even more sophisticated products in this area will remember the state of the application and give the impression that nothing ever failed. Doing this last bit, however, usually involves other forms of virtualization. This process, by the way, is unlikely to be instantaneous.
  • Application frameworks may offer load balancing and failover capabilities. The application framework monitor, upon detecting either a failure to meet service level objectives or some other type of failure, would start the application on another machine. Once again, the process could be automatic or require manual intervention. If other types of virtualization are in use, the actual state of the application could be saved during the process. While this process may happen quickly, it is likely that individuals using the application would notice a pause or a slow-down.
  • Processing virtualization, which includes clustering, parallel processing and virtual machine software, may offer load balancing and fail over capabilities similar to those offered by application framework virtualization, for selected applications or for all applications on a given system. The key difference between these two levels of virtualization is that application framework virtualization only virtualizes applications running in that framework. Processing virtualization makes it possible for applications, data management products or even basic system services to fail over to another system. As with the other forms of virtualization, the fail over process can take some time.
  • Virtualizing storage is often a necessity for all of the other forms of virtualization. After all, what good is moving an application over to another system if the data it was processing is no longer available? Storage virtualization could be implemented using special purpose software on general purpose systems or by moving the entire storage function to a special purpose storage server.
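To make the common thread of these approaches concrete, here is a toy Python simulation of a fail over monitor. It is a sketch under stated assumptions, not a model of any particular product: `Node`, `SharedStorage`, `FailoverMonitor`, `simulate`, and the three-tick restart time are all hypothetical. It shows that the application's state survives the move because it lives in shared storage, yet the service still stops answering while the application is restarted on the standby.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical server that can host the application."""
    name: str
    healthy: bool = True


@dataclass
class SharedStorage:
    """Stands in for virtualized storage that every node can reach."""
    data: dict = field(default_factory=dict)


class FailoverMonitor:
    """Toy HA monitor: detect a failed node, then restart the app elsewhere."""

    def __init__(self, nodes, storage, restart_ticks=3):
        self.nodes = nodes
        self.storage = storage
        self.active = nodes[0]
        self.restart_ticks = restart_ticks  # assumed time to bring the app up elsewhere
        self.pending = 0  # ticks left before the standby starts serving

    def tick(self):
        """Advance one time step; return True if the application answered."""
        if self.pending:
            self.pending -= 1
            return False  # users notice a pause while the app restarts
        if not self.active.healthy:
            standby = next((n for n in self.nodes if n.healthy), None)
            if standby is None:
                return False  # no healthy node is left
            self.active = standby
            self.pending = self.restart_ticks - 1  # this tick counts toward the restart
            return False
        # Normal operation: state is kept in shared storage, so it survives the move.
        self.storage.data["requests"] = self.storage.data.get("requests", 0) + 1
        return True


def simulate(steps=8, fail_at=2):
    """Run the monitor, failing the primary node at tick `fail_at`."""
    primary, standby = Node("primary"), Node("standby")
    storage = SharedStorage()
    monitor = FailoverMonitor([primary, standby], storage)
    answered = []
    for t in range(steps):
        if t == fail_at:
            primary.healthy = False  # simulate a hardware failure
        answered.append(monitor.tick())
    return answered, storage.data["requests"], monitor.active.name
```

Calling `simulate()` returns `[True, True, False, False, False, True, True, True]`, a request count of 5, and `"standby"` as the active node: no work is lost, but three time steps go unanswered while the application moves.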
All of these are well and good. What happens, however, when the requirement is that failures are never seen? This is the realm of FT systems. In this case, special purpose, redundant hardware configurations are deployed and run in lock-step. If one component of the system fails, the others continue working and the application does not fail.
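The difference from fail over can be sketched with a deliberately simplified Python model of lock-step redundancy. The class and function names are hypothetical, and the model assumes a deterministic computation and instant detection of a failed replica; real FT systems implement this in redundant processor and I/O hardware, not in application code. Because every replica processes each input, the surviving copy already holds the current state when its partner fails, so no step goes unanswered.

```python
class Replica:
    """One of several hypothetical hardware copies running in lock-step."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.state = 0

    def step(self, request):
        # Deterministic update: every healthy replica reaches the same state.
        self.state += request
        return self.state


class LockstepSystem:
    """Toy model of "fail through": all replicas process every input together."""

    def __init__(self, n=2):
        self.replicas = [Replica(f"r{i}") for i in range(n)]

    def step(self, request):
        """Feed one input to every healthy replica; return (answered, result)."""
        outputs = [r.step(request) for r in self.replicas if r.healthy]
        if not outputs:
            return False, None  # every copy has failed, so the system is down
        # Replica outputs are compared on every step; a mismatch signals a fault
        # that real FT hardware would isolate by taking the bad component offline.
        if len(set(outputs)) != 1:
            raise RuntimeError("replica divergence detected")
        return True, outputs[0]


def simulate(steps=6, fail_at=2):
    """Run the system, failing replica r0 at step `fail_at`."""
    system = LockstepSystem(n=2)
    results = []
    for t in range(steps):
        if t == fail_at:
            system.replicas[0].healthy = False  # simulate a component failure
        results.append(system.step(1))
    return results
```

Here `simulate()` answers on all six steps and returns results 1 through 6, even though replica `r0` fails at step 2. The trade-off is the one the cost discussion turns on: every input is processed by every copy of the hardware.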
Historically, FT solutions were quite expensive. After all, every component of the system had to be replicated enough times to handle all expected failure scenarios. More recent solutions, offered by suppliers such as Stratus and Marathon, are based upon industry standard systems and components. The use of off-the-shelf hardware significantly reduces the price of these solutions.
Does your organization deploy truly fault tolerant solutions, or does one of the other forms of virtualization offer sufficient levels of reliability and availability?
