Yesterday, Marcel Molina Jr. of 37signals (and a member of the Rails core team) announced the initial release of AWS::S3, a Ruby library for the REST API of Amazon's Simple Storage Service (S3). Marcel was kind enough to share with InfoQ the motivations and history behind his promising new library. We think his answers shed light on how Amazon's web services are transforming the industry.
Even though AWS::S3 is currently just an initial release, you informed the community that "we are using it in production at 37signals." Why did 37signals decide to go the S3 route?
The main motivation is that the file servers in our cluster are maxed out on the number of disks they can hold, and we are running out of space on those disks. So it was time to either add more file servers (which starts to get really expensive) or come up with something else. The technical barriers to switching parts of our infrastructure over to S3 were low enough that it made sense.
How are you using the S3 service now?
As for how we are using it, all uploads for Campfire are currently served from S3, and we are working on doing something similar with the other products. The approach is basically asynchronous migration, which isn't very disruptive to introduce into the system. Files are uploaded as they always were, but a service runs every so often, migrates them over to S3, and then marks them as migrated. When you request a file, we determine whether it should be served off S3 yet. If so, you get it from S3; if not, it comes from the local filesystem. The same service that migrates the data to S3 also purges migrated files from our file servers. The file storage we need then comes down to the amount of data that might be waiting to be migrated between runs of the service (and it runs very frequently). So the cluster can't go diskless, but our data storage needs are almost nil.
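To make the migration flow concrete, here is a minimal Ruby sketch of the pattern. It is not 37signals' code: the class name `MigratingFileStore` and its methods are hypothetical, and plain in-memory hashes stand in for the local file servers and the S3 bucket, so no AWS::S3 calls are made.

```ruby
require 'set'

# Hypothetical sketch of asynchronous migration to S3.
# In-memory hashes stand in for the local file servers and the S3 bucket;
# no real storage or AWS::S3 calls are involved.
class MigratingFileStore
  attr_reader :local, :remote

  def initialize
    @local = {}          # files on the local file servers
    @remote = {}         # stand-in for an S3 bucket
    @migrated = Set.new  # keys marked as migrated
  end

  # Uploads are written locally, exactly as before migration existed.
  def upload(key, data)
    @local[key] = data
  end

  # Periodic job: copy unmigrated files to "S3", mark them as migrated,
  # then purge the local copies that are now stored remotely.
  def run_once
    @local.each do |key, data|
      next if @migrated.include?(key)
      @remote[key] = data
      @migrated << key
    end
    @local.delete_if { |key, _| @migrated.include?(key) }
  end

  # On each request, decide where the file should be served from.
  # Returns [source, data] so the caller can see which store answered.
  def fetch(key)
    if @migrated.include?(key)
      [:s3, @remote.fetch(key)]
    else
      [:local, @local.fetch(key)]
    end
  end
end

store = MigratingFileStore.new
store.upload("transcript.txt", "hello")
p store.fetch("transcript.txt")  # => [:local, "hello"]
store.run_once
p store.fetch("transcript.txt")  # => [:s3, "hello"]
p store.local.empty?             # => true
```

Because local copies are deleted as soon as they have been migrated, local disk only has to hold whatever arrives between runs of the job, which is the storage argument made in the answer above.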
Did Jeff Bezos' involvement have anything to do with 37signals' adoption of S3, or was it just a smart decision?
More than anything it was just a smart decision. Sure, Jeff Bezos thinks S3 is a good idea, and the fact that he's building a whole business around services like S3 is a testament to where he thinks the industry is going, but he doesn't dictate our technology or business decisions. We decided, from a technical as well as a business perspective, that it was the solution to our file storage issues. Frankly, I don't think Jeff even knows we are using S3 at this point.
What are some of the challenges you have faced? Is performance an issue?
As for performance, the bottleneck (based on profiling the code) is XML parsing. By default the library uses XmlSimple, which wraps REXML. That is nice from a portability and ease-of-installation point of view, but not ideal from a performance point of view. However, if you have libxml installed (the Ruby bindings to the GNOME XML library), aws/s3 will automatically use libxml instead of REXML for parsing, which makes parsing an order of magnitude faster and more efficient. Having said that, most people won't be using S3 in a way that requires parsing huge amounts of XML. For everyday use, even the REXML version is fast enough. You only start running into performance problems when you, for example, ask for the list of all the objects in a bucket that contains hundreds of objects. You likely wouldn't do that very often, especially since you can limit the number of objects returned from a bucket with various filtering criteria. Aside from the XML parsing issue, I find performance to be fine for my needs. Now that the library has been released, I'll likely hear from the people using it if there are parts that really must be made more performant.
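The automatic backend selection Marcel describes can be implemented with a common Ruby idiom: try to `require` the optional native library and fall back when that raises `LoadError`. The sketch below illustrates only that idiom; `xml_backend` is a hypothetical method name, and this is not the code AWS::S3 actually uses.

```ruby
# Hypothetical sketch: prefer libxml when it is installed, otherwise fall
# back to REXML. This only illustrates the selection idiom; it is not the
# code AWS::S3 actually uses.
def xml_backend
  require 'libxml' # libxml-ruby gem: Ruby bindings to the GNOME libxml2 library
  :libxml
rescue LoadError
  :rexml # pure-Ruby parser; no native extension needed
end

puts "Parsing S3 responses with #{xml_backend}"
```

Checking for the optional gem at load time keeps installation simple (nothing native is required) while still letting users who do install libxml get the faster parser, which is the portability versus performance trade-off described above.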
If you were asking how S3 (on the server side) performs, then my answer would be "just fine", though I'm not pushing it to its limits.
What's special about your S3 library compared to other options available in Ruby? Is your library easier to work with?
As for "easy to work with", that's more of a priority for me than almost anything else. I've paid very close attention to the API, incrementally molding it to be as close as possible to how I'd like an interface to S3 to look. There is of course still work to be done on this (there always is), but I'm pleased with where things are now as far as ease of use goes. I want the library to be a joy to use. That's why I use Ruby.