Microsoft has released an updated checklist for high-availability (HA) scenarios in Microsoft Azure, with guidance on how to design and manage solutions that use VMs, websites and storage when heavy load is expected, whether intermittently or consistently.
Using these guidelines, architects and developers can compose checklists based on the resources they intend to use.
At a glance: Recommendations for building more scalable applications
Microsoft provides many resources in Azure that can be composed into solutions, but this HA checklist targets Virtual Machines, Web Sites and Databases.
For Virtual Machines, these are the recommendations:
- Use Traffic Manager to handle balancing traffic across multiple Azure regions. Azure has 26 regions around the world, with 8 more expected in the near future.
- Use multiple VMs per role.
- Use a Load Balancer. Azure’s Load Balancers enable intra-data center routing, so within a given region they ensure that the VMs actually being used are selected efficiently. More on the distinction between these assets is provided by Microsoft program manager Jonathan Tuliani.
- Deploy virtual machine scale sets for automatic growth or shrinkage in the number of VMs in operation, based on internal metrics such as the processor activity on VMs or external factors like the length of an Azure storage queue.
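To make the scale set recommendation concrete, the following is a minimal sketch of the kind of decision an autoscale rule encodes, combining an internal metric (average CPU) with an external one (queue length). It does not call any Azure API; the `ScaleRule` class, the `desired_instances` function and every threshold value are hypothetical names and numbers chosen for illustration, not Azure defaults.

```python
from dataclasses import dataclass


@dataclass
class ScaleRule:
    """Hypothetical autoscale thresholds; illustrative values, not Azure defaults."""
    min_instances: int = 2         # keep at least two VMs per role
    max_instances: int = 10        # cap on the number of VMs
    cpu_scale_out: float = 75.0    # average CPU % above which one VM is added
    cpu_scale_in: float = 25.0     # average CPU % below which one VM is removed
    queue_per_instance: int = 100  # queue messages one VM is assumed to absorb


def desired_instances(current: int, avg_cpu: float, queue_length: int,
                      rule: ScaleRule = ScaleRule()) -> int:
    """Return the VM count suggested by CPU load and queue backlog."""
    # Internal metric: step the count up or down by one VM based on CPU.
    if avg_cpu > rule.cpu_scale_out:
        by_cpu = current + 1
    elif avg_cpu < rule.cpu_scale_in:
        by_cpu = current - 1
    else:
        by_cpu = current

    # External metric: VMs needed to drain the queue (ceiling division).
    by_queue = -(-queue_length // rule.queue_per_instance)

    # Take the larger demand, then clamp it to the configured bounds.
    target = max(by_cpu, by_queue)
    return max(rule.min_instances, min(rule.max_instances, target))


if __name__ == "__main__":
    print(desired_instances(current=3, avg_cpu=90.0, queue_length=0))
    print(desired_instances(current=3, avg_cpu=50.0, queue_length=950))
```

The lower bound of two instances mirrors the "multiple VMs per role" recommendation, so a quiet period never shrinks a role to a single point of failure.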
The checklist provides a block-by-block approach to understanding what an HA-configured deployment of VMs in Azure looks like.
For website services delivered via Azure, developers are advised to take advantage of the Azure CDN and to stand up a caching provider such as a Redis cache. Databases running in production on Azure should use active geo-replication.
In making the announcement, Adam Glick, senior program manager - Azure resiliency, stated that even though Azure has provided offerings that support building with HA, many clients still don’t take advantage of those solutions. Microsoft notes that the solution approaches presented can be applied without significant changes to application design.
The checklist goes beyond making recommendations and provides insight into some of the risks that are reduced by following the specified practices. With the average cost of application downtime running between $1.25 billion and $2.5 billion annually, according to IDC, the risks are very real. They include:
- Increased latency due to bottlenecking
- Single points of failure across application tiers
- Reduced scalability
- Increased cost per resource used
The HA checklist is one of many resources that Azure provides on the topic. SQL Server gets its own list of considerations in the high-availability/disaster-recovery solutions document. For Azure web apps, Microsoft provides a reference architecture. Technical guidance on resiliency and deployment models for achieving HA with IoT is also available. Glick stated in the announcement that the steps in the checklist should be implementable without a significant application rewrite, whereas the guidance shared by teams such as the patterns and practices team represents insights that should be followed before solutions are developed.
High availability is a serious matter. Breaching service level agreements can result in significant legal action. When Sears had two outages in 2015, it claimed it lost $2M in profit and filed a lawsuit against the companies involved. Machine Zone, maker of the popular Game of War app, dropped Peak Hosting as a provider after the app went down for two hours in October 2015. This led to the provider filing for bankruptcy.
Beyond the checklists, Azure and other cloud providers supply templates that deploy cloud resources in pre-built HA configurations. Azure provides Azure Resource Manager templates that demonstrate load and clustering strategies. Amazon provides lists in this regard, too. Further demonstration of HA solutions can be found in IBM’s own deployment templates.