Cloudy Perspectives: Whose Fault is it?

One of the intriguing debates I continually hear surrounds the following question: "When will the Cloud be ready for prime-time?" In truth, despite all the hype surrounding Cloud computing, the Cloud has been largely confined to test environments, with few tangible examples of live, mission-critical deployments. Of course, this is changing rapidly, and I believe we are quickly moving along a roadmap in which we will see the adoption of virtualization and the application of virtual infrastructure to mission-critical "private cloud" deployments.

However, there are very real roadblocks to this vision becoming a reality. Chief among them is making sure one's systems are running reliably and consistently -- always. The requirement of "5-nines" uptime (uptime of 99.999%) must apply not only to physical environments, but also to virtual ones. In the physical world we have dealt with system uptime through a number of means, including such technology innovations as clustering and fault-tolerant hardware. With the uptake of virtualization and the strong demand for running live workloads in VMs, we are beginning to see such innovations migrate into software as well.
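To make the "5-nines" figure concrete, here is a minimal sketch of the arithmetic behind it. The function name and the 365-day year are my own assumptions for illustration; real SLAs often measure availability over a month or a quarter and define "downtime" in their own terms.

```python
# Illustrative arithmetic only: converts an availability target into a downtime budget.
# Assumption: a non-leap, 365-day year; actual SLAs may use a different measurement window.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the maximum yearly downtime, in minutes, allowed by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0


# Compare three-, four-, and five-nines targets.
for target in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, 99.999% uptime leaves a budget of a little over five minutes of downtime per year, which is why clustering and fault tolerance, rather than simple restart-on-failure, enter the conversation.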

The latest releases of the major virtualization platforms show that fault tolerance is drawing increased focus. VMware, which has offered a High Availability ("HA") feature for some time, has also embedded a Fault Tolerance feature in vSphere, the latest incarnation of the company's "Datacenter Operating System". Marathon Technologies, a Boston-based start-up, has also been an early innovator in software-based Fault Tolerance with its everRun VM product. Marathon has formed strategic alliances with both Microsoft and Citrix and provides the underlying technology of the FT solutions for both virtualization platforms.

I expect these solutions and others to become more prevalent as cloud computing continues to evolve and increasingly supports mission-critical workloads. This is particularly true as we move beyond private deployments into public cloud infrastructures, where SLAs supporting near 100% availability will be increasingly commonplace (and critical).