While many mature software patterns exist for applications, the same cannot be said about clouds. Each vendor employs its own solution, and those solutions are still likely to change and improve. The technology is not yet mature enough for a clear set of patterns to have emerged, but the first working examples are already out there.
Amazon suggests using its cloud for the following tasks:
Processing Pipelines
- Document processing pipelines – convert hundreds of thousands of documents from Microsoft Word to PDF, OCR millions of pages/images into raw searchable text
- Image processing pipelines – create thumbnails or low-resolution variants of an image, resize millions of images (a minimal sketch follows this list)
- Video transcoding pipelines – transcode AVI to MPEG movies
- Indexing – create an index of web crawl data
- Data mining – perform search over millions of records
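To ground the pipeline pattern, here is a minimal local sketch of an image-resizing pipeline in Python: a pool of worker processes takes file names and writes thumbnails. The Pillow library, directory names, and thumbnail size are assumptions for illustration; on EC2 the pool would become a fleet of instances fed by a queue.

```python
# Minimal local sketch of an image-processing pipeline (assumes Pillow
# is installed: pip install Pillow). Paths and sizes are illustrative.
import os
from multiprocessing import Pool
from PIL import Image

SOURCE_DIR = "originals"      # hypothetical input directory
THUMB_DIR = "thumbnails"      # hypothetical output directory
THUMB_SIZE = (128, 128)

def make_thumbnail(filename):
    """Resize one image; each call is an independent unit of work."""
    src = os.path.join(SOURCE_DIR, filename)
    dst = os.path.join(THUMB_DIR, filename)
    with Image.open(src) as img:
        img.thumbnail(THUMB_SIZE)   # resizes in place, keeps aspect ratio
        img.save(dst)
    return dst

if __name__ == "__main__":
    os.makedirs(THUMB_DIR, exist_ok=True)
    files = [f for f in os.listdir(SOURCE_DIR)
             if f.lower().endswith((".jpg", ".png"))]
    # In the cloud, each worker would be an instance pulling from a
    # queue; locally, a process pool plays the same role.
    with Pool() as pool:
        for path in pool.imap_unordered(make_thumbnail, files):
            print("wrote", path)
```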
Batch Processing Systems
- Back-office applications (in financial, insurance or retail sectors)
- Log analysis – analyze logs and generate daily/weekly reports (a small example follows this list)
- Nightly builds – perform automated builds of the source code repository every night, in parallel
- Automated unit testing and deployment testing – deploy to different configurations and run automated tests (functional, load, quality) against each, every night
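As a concrete instance of the batch pattern, the following small Python sketch tallies HTTP status codes from a day's web server log and prints a report; the common-log-format assumption and the file name are illustrative.

```python
# Minimal batch log-analysis sketch. Assumes common-log-format lines,
# e.g.: 127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2326
import re
import sys
from collections import Counter

LINE_RE = re.compile(r'" (\d{3}) ')   # captures the HTTP status code

def report(log_path):
    statuses = Counter()
    with open(log_path) as log:
        for line in log:
            match = LINE_RE.search(line)
            if match:
                statuses[match.group(1)] += 1
    print(f"Report for {log_path}")
    for status, count in statuses.most_common():
        print(f"  {status}: {count}")

if __name__ == "__main__":
    # A nightly cron entry might run: python log_report.py access.log
    report(sys.argv[1] if len(sys.argv) > 1 else "access.log")
```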
Websites
- Websites that “sleep” at night and auto-scale during the day (a toy scaling policy is sketched after this list)
- Instant Websites – websites for conferences or events (Super Bowl, sports tournaments)
- Promotion websites
- Seasonal websites – websites that only run during the tax season or the holiday season (“Black Friday” or Christmas)
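A site that “sleeps” at night reduces to a scaling rule. The sketch below is a toy policy: a pure function mapping the current hour and request rate to a desired instance count. Every threshold in it is a made-up number for illustration.

```python
# Toy scaling policy for a website that sleeps at night and scales
# during the day. All thresholds are illustrative, not recommendations.
from datetime import datetime

MIN_INSTANCES = 1             # keep one instance alive even at night
MAX_INSTANCES = 20
REQUESTS_PER_INSTANCE = 500   # assumed capacity of one instance

def desired_instances(requests_per_minute: int, now: datetime) -> int:
    """Return how many instances the site should be running."""
    if 1 <= now.hour < 6:     # overnight: scale down to the floor
        return MIN_INSTANCES
    needed = -(-requests_per_minute // REQUESTS_PER_INSTANCE)  # ceil div
    return max(MIN_INSTANCES, min(MAX_INSTANCES, needed))

if __name__ == "__main__":
    print(desired_instances(4200, datetime.now()))
```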
An example of a cloud architecture is Amazon’s GrepTheWeb, a service that lets users run regular-expression queries against millions of web documents. Zooming in, the architecture combines Amazon SQS queues to pass requests between processing phases, Amazon SimpleDB to track the status of each job, Amazon S3 to store the input data set and the results, and a Hadoop cluster running on EC2 to perform the actual search in parallel.
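To make that flow concrete, here is a minimal sketch of a GrepTheWeb-style job submission and worker, written with today's boto3 SDK (which postdates the original service). The queue URL, bucket name, and the `run_grep` stub are assumptions for illustration; the real service ran the search as a Hadoop job on EC2.

```python
# Sketch of a GrepTheWeb-style flow with boto3 (pip install boto3).
# Queue URL and bucket name are placeholders; the Hadoop grep phase
# is reduced to a stub.
import json
import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/grep-jobs"
RESULT_BUCKET = "grep-results"          # hypothetical bucket

def submit_job(job_id: str, regex: str) -> None:
    """Launch phase: enqueue a regex job for the processing fleet."""
    sqs.send_message(QueueUrl=QUEUE_URL,
                     MessageBody=json.dumps({"id": job_id, "regex": regex}))

def worker_step() -> None:
    """Processing phase: pull one job, run it, store the result in S3."""
    resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                               MaxNumberOfMessages=1, WaitTimeSeconds=10)
    for msg in resp.get("Messages", []):
        job = json.loads(msg["Body"])
        matches = run_grep(job["regex"])    # stands in for Hadoop on EC2
        s3.put_object(Bucket=RESULT_BUCKET,
                      Key=f"{job['id']}.json",
                      Body=json.dumps(matches))
        sqs.delete_message(QueueUrl=QUEUE_URL,
                           ReceiptHandle=msg["ReceiptHandle"])

def run_grep(regex: str) -> list:
    """Stub: the real service ran a Hadoop job over the web crawl."""
    return []
```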
Jinesh Varia, a Web Services Evangelist at Amazon, explained GrepTheWeb in detail through a presentation published by InfoQ.
Todd Hoff compiled a list of basic components employed by SmugMug in their cloud architecture, which is also built on Amazon EC2:
- Work Initiators - Work comes in from your website and/or other software subsystems and is queued up for processing in the Queue Service. Work doesn't have to be a large request either; it can be a small, independent part of an overall pipeline. Don't keep state in the Workers: bundle what you need done into a work request and shoot it back into the Queuing Service for processing.
- Provisioning Service - This is Amazon's infrastructure that allows instances to be automatically scaled up and down in relation to the workload. This is the major difference from your VPS or typical datacenter setup. There's an API for starting and stopping AMIs and mechanisms for automatically configuring and running VMs.
- Workers - These are the guys that continually pull work off queues and do something interesting with it. For SmugMug the results are stored on S3 but the results could be put in your own database, SimpleDB or whatever.
- Queuing Service - This is where work is queued for consumption by the workers. SmugMug built their own queuing service, but you could just as easily use Amazon's own SQS. Creating a scalable, distributed, performant, highly available queue service is not easy, so you may want to take a look at a number of different queue product suggestions in "Flickr - Do the Essential Work Up-front and Queue the Rest".
- Controller - This component monitors many variables related to the workflow and decides how many EC2 instances are necessary based on optimizing a small set of goals. Instances are added and removed as needed (a toy sketch of this loop follows the list).
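Putting the Controller and Provisioning Service together, a toy version of the scaling loop might look like the following sketch. SQS stands in here for SmugMug's home-grown queue service, and the AMI id, queue URL, and throughput figure are placeholders.

```python
# Toy controller sketch with boto3: size the worker fleet from the
# queue backlog. AMI id, queue URL, and thresholds are placeholders.
import boto3

sqs = boto3.client("sqs")
ec2 = boto3.client("ec2")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/work"
WORKER_AMI = "ami-0123456789abcdef0"   # prepared worker image (placeholder)
JOBS_PER_INSTANCE = 100                # assumed throughput of one worker

def controller_step(running: list) -> None:
    """Compare queue backlog with fleet size; add or remove instances."""
    attrs = sqs.get_queue_attributes(
        QueueUrl=QUEUE_URL,
        AttributeNames=["ApproximateNumberOfMessages"])
    backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])
    desired = max(1, -(-backlog // JOBS_PER_INSTANCE))   # ceiling division
    if desired > len(running):
        # Provisioning: launch more workers from the prepared AMI.
        ec2.run_instances(ImageId=WORKER_AMI, MinCount=1,
                          MaxCount=desired - len(running))
    elif desired < len(running):
        # Terminate surplus workers (ids tracked by the controller).
        ec2.terminate_instances(InstanceIds=running[desired:])
```

The key design point, echoed in the list above, is that workers stay stateless: everything the controller needs is observable from the queue, so the fleet can grow or shrink without coordination.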
Each vendor currently has its own solution, and new ones are expected to emerge. Clouds have not been fully explored yet, but slowly and steadily their architectural solutions are being worked out.