Using your own public API can be a challenge, Phil Calcado, Director of Engineering at Soundcloud, declared when sharing his experiences managing and rebuilding a large Rails application at the recent GOTO Berlin Conference.
Soundcloud is growing fast, and in his talk Phil focused on some of the problems the team experienced when building a new website that was released about a year ago.
Soundcloud started as a Ruby on Rails application that was extended over six years and became very cluttered. Many of the problems were due to the infrastructure: a large Rails monolith using MySQL and Memcached.
In 2010 they started to think about a new platform. One of the ideas came from Twitter: when Twitter released its redesigned architecture, it convinced the Soundcloud team to build a similar application, a single-page JavaScript application talking to the backend through a public API. Eventually they started to build a new website, and over 6-7 months a team of very skilled frontend JavaScript developers, though lacking some backend experience, came up with a new website built on top of the existing public API.
They had a rather stable application, but just before release they had a talk with Twitter and mentioned that they were releasing a new website inspired by the new Twitter architecture, using the exact same ideas. The people from Twitter replied that they had found the new design wasn’t such a good idea. In fact, Twitter later decided to move back to server-side rendering for most parts.
This put the team in an interesting situation. It was an important integration, so what should they do: ship or cancel? In the end they decided to ship. Knowing Rails, they were convinced that it would be the first thing to break, so they provisioned a lot of nodes. But with the new design they went from 3 requests to over 100 for one page, and the first thing that broke was their high availability proxy. With that fixed, Memcached broke, and finally Rails and MySQL. They now realized they had a fundamental architecture problem.
One thing they realized early on was that they could not rewrite the whole application. Staying with Rails, what they needed was a fast API, one that could handle a high number of requests as quickly as possible. They divided the large Rails application into smaller parts and introduced service thinking. To their surprise, overall performance stayed the same; the bottleneck had simply moved from the database to HTTP. The conclusion was that they needed a faster Rails.
Looking into the code, they could see a lot of room for concurrency. Rails doesn't like parallelism or concurrency, so they tried to go asynchronous using tools like Finagle, and managed to get both parallelism and concurrency. That reduced the load significantly and returned results much faster.
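To illustrate why issuing independent backend calls concurrently returns results faster, here is a minimal sketch. It uses Python's asyncio rather than Finagle, and it is not Soundcloud's code: the `fetch` helper, the resource names and the 50 ms delay are all assumptions made for the example, with `asyncio.sleep` standing in for network latency.

```python
import asyncio
import time

# Illustrative resource names only; not taken from the talk.
RESOURCES = ["track", "user", "comments", "stats"]


async def fetch(resource: str, delay: float = 0.05) -> dict:
    """Hypothetical backend call; the sleep simulates network latency."""
    await asyncio.sleep(delay)
    return {"resource": resource}


async def load_sequential() -> list:
    # Each call waits for the previous one, so the latencies add up.
    return [await fetch(name) for name in RESOURCES]


async def load_concurrent() -> list:
    # Independent calls are started together; the total time is roughly
    # that of the slowest call. Results keep the order of RESOURCES.
    return await asyncio.gather(*(fetch(name) for name in RESOURCES))


def timed(coro_fn):
    """Run an async function to completion and return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start
```

Calling `timed(load_sequential)` and `timed(load_concurrent)` returns the same data, but the concurrent version should finish in about the time of a single call instead of four. Note that asyncio gives concurrency on one thread, not parallelism across cores.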
Since they could now serve requests much faster, they looked into the network. Every page still made a lot of requests, and in looking for ways to reduce this number they decided to go for a custom API where one request may return data for several pages. Doing this, they ended up with three dedicated APIs: for mobile, desktop and partners.
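The idea of replacing many small requests with one larger one can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Soundcloud's actual API: the in-memory dictionaries stand in for separate backend resources, and `get_track`, `get_user`, `get_comments` and `track_page` are hypothetical names.

```python
# Hypothetical in-memory stand-ins for separate backend resources.
TRACKS = {1: {"id": 1, "title": "Demo track", "user_id": 7}}
USERS = {7: {"id": 7, "name": "demo-user"}}
COMMENTS = {1: [{"body": "Nice!"}, {"body": "Great mix"}]}


# Fine-grained endpoints: a client calling these directly needs
# one round trip per resource.
def get_track(track_id: int) -> dict:
    return TRACKS[track_id]


def get_user(user_id: int) -> dict:
    return USERS[user_id]


def get_comments(track_id: int) -> list:
    return COMMENTS.get(track_id, [])


def track_page(track_id: int) -> dict:
    """Coarse-grained endpoint: a single request returns everything
    the page needs, with the fan-out happening on the server side."""
    track = get_track(track_id)
    return {
        "track": track,
        "user": get_user(track["user_id"]),
        "comments": get_comments(track_id),
    }
```

With the fine-grained endpoints a client makes three round trips to render this page; with `track_page(1)` it makes one. The trade-off is that each aggregated endpoint is shaped for a particular client, which is one reason dedicated APIs per client type can make sense.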
The most interesting design challenge they now face is how to model their API. Right now the developers prefer a more coarse-grained API for mobile and a more experience-based API for desktop, currently with two separate backends.
The GOTO Berlin Conference 2013 is the first GOTO conference in Berlin, with about 420 attendees and 80 speakers.