Google has recently published a paper providing architectural guidelines for building a scalable and resilient solution on its cloud platform. This article digests the paper, extracting its main ideas and advice. With minor changes, these guidelines can also be used to deploy web applications on other clouds.
The following diagram shows the main components of an application meant to be both scalable and resilient. Each component is described below:
Region – while the application can be deployed in multiple regions, the paper discusses the case of a single region with multiple zones.
Zone – a location within a region. Network connections between zones have low latency and high capacity to properly handle communications between nodes.
Load Balancer – represents the entry point for a client, distributing requests evenly between instances. A load balancer could spread the requests across multiple regions if necessary.
Instance – a particular virtual machine running on Google Compute Engine.
Instance Group – instances within a zone are grouped together and have a group manager which is responsible for creating or shutting down instances.
Autoscaler (not shown in the picture) – This component instructs the group manager to create or terminate instances based on the current load. To make this decision, the Autoscaler uses a policy based on CPU utilization, a Cloud Monitoring metric, or the number of requests per second received by instances.
Cloud SQL – This SQL storage solution is managed at the region level and is automatically replicated between zones.
Cloud Storage – This storage solution stores objects (usually files) shared by all instances. This is where state data and file uploads should be kept, rather than on the instances themselves.
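The Autoscaler's CPU-based decision can be illustrated with a simplified model. This is a minimal sketch and not Google's actual algorithm. It assumes the policy tries to keep average CPU utilization near a target value and clamps the result between minimum and maximum group sizes. `ScalingPolicy` and `recommended_size` are hypothetical names, not part of any Google Cloud API.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical policy parameters, for illustration only.
    target_cpu_utilization: float  # e.g. 0.6 means "aim for 60% average CPU"
    min_instances: int
    max_instances: int


def recommended_size(policy: ScalingPolicy, cpu_utilizations: list[float]) -> int:
    """Return how many instances the group manager should run.

    Total load is approximated as the sum of per-instance CPU utilization.
    Dividing it by the target gives the number of instances needed to bring
    the average utilization back to the target.
    """
    if not cpu_utilizations:
        return policy.min_instances
    total_load = sum(cpu_utilizations)
    # Round away floating-point noise so that, e.g., a load of 1.2 at a 0.6
    # target yields exactly 2 instances instead of being ceiled up to 3.
    ratio = round(total_load / policy.target_cpu_utilization, 9)
    needed = math.ceil(ratio)
    # Keep the group size within the configured bounds.
    return max(policy.min_instances, min(policy.max_instances, needed))
```

For example, three instances at 90% CPU with a 60% target lead to a recommendation of five instances. Clamping to `max_instances` puts a ceiling on how far a sudden traffic spike can drive costs.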
The components above are required for a scalable solution. To also be resilient, the solution must provide a way to start or restart and configure instances when some of them become unresponsive. This is done with the following components:
Startup scripts – These scripts are stored on the instance or in Cloud Storage, or fetched from a specified URL, and are executed when an instance boots or restarts. They set up the instance and make sure all local services are running. They can also install software or apply updates.
Health checks – This Compute Engine component periodically probes instances to check whether they are alive and healthy. Requests are sent only to healthy instances.
Backend services – This component ties health checks to instance groups and mediates the requests coming from the load balancer.
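The interplay between health checks and a backend service can be sketched as a small state machine. This is a simplified model with several assumptions. An instance is marked unhealthy after a number of consecutive failed probes and becomes healthy again after a number of consecutive successful ones (Compute Engine health checks expose similar healthy/unhealthy thresholds). Requests are then spread round-robin over the healthy instances only, which is just one possible balancing strategy. `BackendService`, `record_probe` and `route` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class InstanceHealth:
    name: str
    healthy: bool = True
    consecutive_successes: int = 0
    consecutive_failures: int = 0


class BackendService:
    """Hypothetical backend service: tracks health and routes only to healthy instances."""

    def __init__(self, instance_names, healthy_threshold=2, unhealthy_threshold=3):
        self.instances = {n: InstanceHealth(n) for n in instance_names}
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self._counter = count()  # drives round-robin selection

    def record_probe(self, name: str, ok: bool) -> None:
        """Update an instance's state after one health-check probe."""
        inst = self.instances[name]
        if ok:
            inst.consecutive_successes += 1
            inst.consecutive_failures = 0
            if not inst.healthy and inst.consecutive_successes >= self.healthy_threshold:
                inst.healthy = True
        else:
            inst.consecutive_failures += 1
            inst.consecutive_successes = 0
            if inst.healthy and inst.consecutive_failures >= self.unhealthy_threshold:
                inst.healthy = False

    def healthy_instances(self) -> list[str]:
        return sorted(n for n, i in self.instances.items() if i.healthy)

    def route(self) -> str:
        """Pick the next healthy instance in round-robin order."""
        healthy = self.healthy_instances()
        if not healthy:
            raise RuntimeError("no healthy instances available")
        return healthy[next(self._counter) % len(healthy)]
```

Requiring several consecutive results before changing state keeps a single slow response from pulling an instance out of rotation and prevents instances from flapping between states.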
The paper provides instructions for setting up and deploying Redmine, a Ruby on Rails project management application. A companion GitHub project is provided for this purpose.
The paper also includes advice on estimating the costs of running a web application based on the average number of page views, requests, page size and other metrics.
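A back-of-the-envelope estimate of this kind can be sketched as follows. The cost model and every input are hypothetical placeholders, not figures from the paper or current Google Cloud pricing. This includes page views, requests per page, average page size, instance count, and the per-hour, per-GB and per-million-request prices. The sketch also assumes an average month of 730 hours.

```python
from dataclasses import dataclass

# Average hours in a month (8760 hours per year / 12); an approximation.
HOURS_PER_MONTH = 730


@dataclass
class TrafficProfile:
    monthly_page_views: int
    requests_per_page: int    # HTML page plus the assets it loads
    avg_page_size_mb: float   # data transferred per page view, in MB


@dataclass
class PriceSheet:
    # Hypothetical unit prices; replace them with the provider's current price list.
    instance_per_hour: float
    egress_per_gb: float
    per_million_requests: float


def estimate_monthly_cost(traffic: TrafficProfile, instances: int,
                          prices: PriceSheet) -> dict[str, float]:
    """Return a per-item and total monthly cost estimate."""
    requests = traffic.monthly_page_views * traffic.requests_per_page
    egress_gb = traffic.monthly_page_views * traffic.avg_page_size_mb / 1024
    costs = {
        "instances": instances * HOURS_PER_MONTH * prices.instance_per_hour,
        "egress": egress_gb * prices.egress_per_gb,
        "requests": requests / 1_000_000 * prices.per_million_requests,
    }
    costs["total"] = sum(costs.values())
    return costs
```

With one million page views of 2 MB each, 20 requests per page and three instances, the made-up prices above yield roughly 109.5 for instances, 234.4 for egress and 15 for requests. Even with placeholder numbers, this shows which metric dominates the bill.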