Christopher Moyer has written a new book, “Building Applications in the Cloud: Concepts, Patterns, and Projects”. This book revolves around fundamental differences between the on-premise and cloud infrastructures, and architecture and design patterns that can be used to build and host scalable, reliable applications in the cloud.
InfoQ recently got in touch with the author about the content of the book.
InfoQ: Can you give a bit of a background about your experience with building cloud scale applications?
Chris Moyer: While working at RIT, I was approached by a colleague who was looking for someone to work for a friend at a newly created start-up company dealing with Amazon Web Services. Always eager to learn and explore new things, I accepted the offer and was introduced to Mitch Garnaat, creator of boto and advocate of AWS.
My first few weeks working for Mitch were spent building tools around boto to help manage Amazon Web Services such as Elastic Compute Cloud (EC2). We were focused on making sure that we could launch instances quickly and easily, bootstrapping and updating instances, keeping true to the idea that servers were "disposable" and should be easily created and destroyed automatically.
I soon moved on to consulting, where I created a high-scale application that used cloud computing services to accept video uploads, encode them into multiple formats, and even support other services such as creating transcripts.
Eventually I was hired by Newstex, LLC, where I now work as the Vice President of Technology. I'm now in charge of several cloud-based systems that span multiple regions. These systems range from a simple Salesforce-like CRM to pulling in RSS feeds and providing categories and tickers based on AI systems.
The patterns and tips I've discovered and added to my book are all based on first-hand experience with simple solutions to complicated problems.
InfoQ: You have worked very closely with the boto project which you refer to for most of the examples in the book - can you explain in a few words the purpose of that library and how it is meant to help developers?
Chris Moyer: Amazon Web Services provides REST-like APIs to access all of its cloud services. These APIs have lots of complicated rules around them, so the boto project was created to make communicating with those APIs simple and easy in Python.
Boto also works with a few other cloud providers such as Eucalyptus and Google's Storage Service.
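To give a feel for the kind of rule a library like boto hides from the developer, here is a minimal sketch of query-string request signing. It is only loosely modeled on the HMAC-based schemes such APIs commonly use and is not the exact algorithm of any AWS service or of boto itself; the function name `sign_query`, the host, and the parameter names are hypothetical.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_query(method, host, path, params, secret_key):
    """Return a base64 HMAC-SHA256 signature for a query-style API request.

    Illustrative only: a simplified stand-in for the signing rules a
    client library handles on the developer's behalf.
    """
    # Sort parameters by name and percent-encode both names and values so
    # that client and server build exactly the same string.
    canonical_query = "&".join(
        f"{quote(str(name), safe='-_.~')}={quote(str(value), safe='-_.~')}"
        for name, value in sorted(params.items())
    )
    # The string to sign covers the method, host, path, and parameters.
    string_to_sign = "\n".join(
        [method.upper(), host.lower(), path, canonical_query]
    )
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.b64encode(digest).decode("ascii")
```

Getting any detail wrong (parameter order, encoding, letter case) produces a signature the server rejects, which is the sort of bookkeeping a client library is meant to take off the developer's plate.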
InfoQ: The book refers to several architectural patterns along with guidance on when to use them - is it possible for you to explain in short how and why these are very different compared to what we have in pre-cloud, in-house hosted applications?
Chris Moyer: The problem with traditional architectural patterns is that they rely on everything being in one place. Your storage, compute power, and databases are almost always right on the same machine. Even if they're not, you probably have a dedicated set of everything just for your application, and those resources most likely took you a long time to acquire.
Additionally, if a server goes down in your own data center, you're going to call someone up to fix it. In Cloud-based applications, you simply kill that server and launch a new one. Being able to just magically add new systems into your application adds so much power into your hands as an application developer, making you rely less and less on IT support staff.
The architectural patterns used in cloud-based environments are based on services, instead of individual machines. That's really the biggest difference.
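The "kill that server and launch a new one" idea can be sketched as a small reconciliation loop. This is a toy in-memory model, not a real cloud API: the `Fleet` class, its method names, and the server IDs are all hypothetical, and a real system would call the provider's launch and terminate operations instead.

```python
import itertools


class Fleet:
    """Toy in-memory stand-in for a group of disposable cloud servers."""

    def __init__(self, desired_count):
        self.desired_count = desired_count
        self._ids = itertools.count(1)
        self.servers = {}  # server id -> True if healthy, False if not
        self.reconcile()   # launch the initial set of servers

    def launch(self):
        """Pretend to launch a fresh server and return its id."""
        server_id = f"server-{next(self._ids)}"
        self.servers[server_id] = True
        return server_id

    def reconcile(self):
        """Discard unhealthy servers and launch replacements.

        Returns the ids of the servers launched during this pass.
        """
        # Unhealthy servers are thrown away, not repaired.
        for server_id in [s for s, ok in self.servers.items() if not ok]:
            del self.servers[server_id]
        # Launch new servers until the fleet is back at its desired size.
        launched = []
        while len(self.servers) < self.desired_count:
            launched.append(self.launch())
        return launched
```

For example, after `fleet = Fleet(3)` and `fleet.servers["server-2"] = False`, a call to `fleet.reconcile()` drops `server-2` and launches `server-4` in its place.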
InfoQ: The book mainly relies on AWS for discussing concepts and patterns. Even though many of these could apply to other clouds like Windows Azure or Google AppEngine, they would probably require some changes - what would be your advice to someone reading this book, looking to use one of the non-Amazon clouds?
Chris Moyer: The first few chapters in the "Concepts" section are very important no matter what cloud provider you're dealing with. After that I suggest reading through the patterns and working through the examples building the same applications and pieces created in the book, but using your preferred cloud provider. All cloud providers have lots of documentation on how to use their specific APIs and services, so keep those handy while going through the book and reading the patterns.
I also suggest even if you don't plan to use AWS for final deployment of your application, at least try some of the services while working through the patterns in the book. You can learn a lot about cloud development for under $1 as long as you remember to clean up your services once you're finished.
InfoQ: Are there any features that you think the cloud providers, especially Amazon but even others, are missing today, that you would really like to see?
Chris Moyer: Amazon has done very well with keeping up with features as they're requested. Their strong following in the open-source community and how well they listen to clients is why they're still number one in the cloud-computing space. Every time I think "hey it would be a great feature to have X", Amazon is quick to release that feature. There's really not much that I've wanted that they haven't created or are working on creating.
The only thing that's really missing is providers agreeing on standards for APIs for cloud services. Eucalyptus is very nice because they offer AWS-style APIs for everything, but this is something that other providers are really lacking. Rackspace has started offering up AWS-like APIs as well, but they also tried to develop all of their libraries themselves and don't really rely on open-source communities as much. The big drawback of launching applications "in the cloud" is that people are afraid of "Vendor Lock-In". If we could all agree on standards for APIs and services, that wouldn't be any concern at all.
InfoQ: You have an entire chapter dedicated to creating Instance Images and various strategies around this. Could you explain why this is so important?
Chris Moyer: Machine Images are the core of every system built in the cloud. They include the most basic fundamentals such as an Operating System and standard programs that your application requires. This essentially is the core of your entire application, and if it's not done right you'll waste more time fixing it and building new images than actually progressing on your application.
You can think of an Image like a Chassis on a car. If you don't put enough support and thought into it, you won't be able to add in all those nifty features you want later. Yes, you can expand your Chassis after you've already started building the car, but it's much more complicated and often involves tearing out parts that you've already put together.
Creating a solid foundation is always the first step in building a great application. There are many different ways to build this foundation, and Chapter 4 goes over the most common ones. It's very different from just building a single server and installing what you need on it.
InfoQ: Why are HTTP and REST important for a cloud developer?
Chris Moyer: HTTP and REST are both very well-known standards that are easy to implement as a developer on both the backend (which runs on the cloud) and frontend (which runs on the user's systems). Utilizing these two well-known and simple technologies, you can separate your services completely, opening you up to a wider range of clients. When you build your system to utilize this client-server interaction, you simplify adding support for new platforms.
Understanding these technologies is also very important since most Cloud Providers utilize these as well. Since you're already supporting using them for your cloud services, why not use the same technology in your own systems?
InfoQ: We have all read about how the Chaos Monkey helps organizations like Netflix test their systems, which are built such that even if one service goes down, the other services continue to work. Could you briefly explain how we could put together some of the patterns explained in the book to design software systems to work like that?
Chris Moyer: Each pattern in this book is really designed to support this level of failure recovery by itself.
The best example is using the Queue pattern. Using this pattern, if you submit a process to be completed and your server working on the process dies for any reason (say, a chaos monkey throws a wrench in the system), the message will eventually just be picked up by a different server.
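A minimal sketch of how a queue can hand a message to a different server after a worker dies. The mechanism shown, a visibility timeout, is an assumption here (the interview does not spell out how the queue does it), and the `VisibilityQueue` class is a hypothetical in-memory model, not a real queue service client. Time is passed in explicitly so the behavior is easy to follow.

```python
class VisibilityQueue:
    """Toy queue in which a received message is hidden, not removed.

    If the worker never deletes the message (for example because it
    crashed), the message becomes visible again after the timeout and
    another worker can pick it up.
    """

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> [body, invisible_until]
        self._next_id = 0

    def send(self, body):
        """Add a message that is immediately visible; return its id."""
        self._next_id += 1
        self._messages[self._next_id] = [body, 0.0]
        return self._next_id

    def receive(self, now):
        """Return (id, body) of a visible message, or None if none is visible."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                # Hide the message while a worker is processing it.
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        """Called by a worker only after the work finished successfully."""
        self._messages.pop(msg_id, None)
```

The design choice that matters is that `receive` never removes anything: only a worker that finishes its job calls `delete`, so a crash mid-job costs a delay rather than a lost message.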
Another good example is using the Proxy/Balancer pattern. If you're hosting a website, for example, and you lose most of a fleet of servers, end users will still be directed to the correct place as long as one of the servers is up and running.
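The Proxy/Balancer behavior can be sketched as a round-robin picker that skips backends marked as down. This is an illustrative in-memory model under assumed names (`RoundRobinBalancer`, `mark_down`, `pick`), not the book's implementation or any provider's load balancer API; a real balancer would also run health checks to decide which backends are up.

```python
class RoundRobinBalancer:
    """Toy balancer that routes requests only to healthy backends."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cursor = 0

    def mark_down(self, backend):
        """Record that a backend failed and should receive no traffic."""
        self.healthy.discard(backend)

    def mark_up(self, backend):
        """Record that a known backend has recovered."""
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        """Return the next healthy backend in round-robin order."""
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            backend = self.backends[self._cursor % len(self.backends)]
            self._cursor += 1
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends available")
```

With three backends and two of them marked down, every call to `pick()` returns the one survivor; only when all of them are down does the balancer have nowhere to send traffic.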
InfoQ: In the Projects section, you have walked through building a weblog for the cloud - how is this different from a standard blogging software like WordPress?
Chris Moyer: Again, this is just a simple example that people should be able to relate to, and really quite overkill for most blogs. Standard services like WordPress rely on a single computer or "box" to host everything for them. If you move this to the cloud, you could mimic this, but a better solution is to split everything out so you can scale out by adding more machines to the mix. You can also host WordPress in a cloud-like fashion, but that wouldn't make for a very in-depth example.
InfoQ: You have also used Marajo in your second walkthrough - can you explain what this library does?
Chris Moyer: Marajo is designed to be a simple cloud-based application development framework. It's based off of Google's AppEngine framework and Django. Anyone who's familiar with a framework such as Ruby on Rails, Django, or Pylons should be able to use this framework to quickly whip up their own web-based application that utilizes cloud-based solutions such as SimpleDB.
InfoQ: One drawback seen in cloud computing providers is the bandwidth cost, which makes them somewhat more expensive than traditional web hosts. Are there any scenarios where you think it may not make sense to move to the cloud just yet?
Chris Moyer: Sure, if you're only storing your own movies in S3, that will be a lot more expensive than just buying an extra hard drive, but if you need to scale your system to serve more than just your local network, that's where you'll get advantages by moving to the cloud.
The problems really come in when you hit enough usage that you need to provide support for your hardware. You may spend $1,000 on data transfer costs per month to store things in Amazon S3, but if you stored them locally you may have to spend 4 times that amount just to have on-call support staff. Netflix recently announced that it costs them roughly 5¢ per DVD to stream it over the internet, versus about 35¢ per DVD to be shipped.
Keep in mind that Cloud providers such as Amazon are always working on making things cheaper for their clients. For instance, Amazon recently dropped all inbound data transfer charges, which means you can now upload as much as you want to Amazon without incurring any charges there. You'll still pay for outbound costs, and storage, but they no longer charge for inbound data transfer.
About the Book Author
As Vice President of Technology for Newstex LLC, Chris Moyer helps developers migrate to the cloud or create cloud applications. He has been actively involved with developing boto, the main connection library for Python and Amazon Web Services. He has also created botoweb and Marajo, two leading open source cloud development frameworks. He blogs here.
He is author of the book, ‘Building Applications in the Cloud: Concepts, Patterns, and Projects’, published by Pearson/Addison-Wesley Professional, April 2011, ISBN 0321720202, Copyright 2011 Pearson Education, Inc. For more info please visit the publisher site.