SoundCloud recently announced that it has completed an eight-year migration, carried out with the Strangler pattern, from a monolithic codebase to a fully-fledged Backend For Frontend (BFF), an architecture pattern pioneered at SoundCloud.
The announcement describes the steps the SoundCloud team took to migrate successfully, the lessons learned along the way, and the benefits and risks of the Strangler pattern.
SoundCloud's motivation for adopting the Strangler pattern dates back to 2014, when the team noticed that their Rails application did not perform well when interacting with multiple microservices to serve user traffic. To address this, they introduced the public API Strangler BFF, which intercepted the public API responses and augmented them by calling additional services when necessary.
BFF Strangler pattern architecture diagram - source: BFF @ SoundCloud
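The intercept-and-augment idea can be illustrated with a minimal Python sketch. This is not SoundCloud's code: the names `make_strangler`, `call_monolith`, and `enrichers`, and the dictionary-shaped responses, are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict

Response = Dict[str, Any]


def make_strangler(
    call_monolith: Callable[[str], Response],
    enrichers: Dict[str, Callable[[str], Response]],
) -> Callable[[str], Response]:
    """Build a handler that sits in front of the monolith (hypothetical design).

    call_monolith: forwards a request path to the legacy public API.
    enrichers: maps a path prefix to a function that fetches extra fields
               from another service for paths under that prefix.
    """

    def handle(path: str) -> Response:
        # Always start from the monolith's response.
        response = dict(call_monolith(path))
        # Augment it only when an enricher is registered for this path.
        for prefix, enrich in enrichers.items():
            if path.startswith(prefix):
                response.update(enrich(path))
        return response

    return handle
```

In this sketch, a path with no registered enricher passes through unchanged, so callers of the public API see the monolith's response unless the Strangler has something to add.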
This adoption of the Strangler pattern was motivated by an immediate need rather than by a plan for a future free of the public API monolith. The SoundCloud team therefore continued their microservice API development while leaving both the Strangler and the original monolith largely unmaintained. Over time, this led to a number of problems: code duplication, inconsistent API behaviors, and security risks. To address the situation, the SoundCloud team began migrating the Strangler to a fully-fledged BFF in January 2020.
Evolution of BFF Public API - source: SoundCloud Developer Blog
To understand the scope of the migration work, the SoundCloud team first added telemetry to find out which endpoints were still in use. They then explicitly declared all known public API routes in the Strangler codebase and added a fallback that called the public API for any undeclared route. Once the team was confident that they had identified all the routes, they removed the fallback. Routes that were not in use and were not documented on the developer portal were removed. Finally, with the full list of endpoints to port in hand, the team migrated each one to call the existing microservices instead of the public API.
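The route inventory step can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not SoundCloud's implementation: the `StranglerRouter` class, its method names, and the in-memory counters standing in for telemetry are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Optional

Handler = Callable[[str], str]


class StranglerRouter:
    """Hypothetical router: declared routes, a fallback, and usage counters."""

    def __init__(self, proxy_to_public_api: Handler, fallback_enabled: bool = True):
        self._routes: Dict[str, Handler] = {}
        self._proxy = proxy_to_public_api
        self.fallback_enabled = fallback_enabled
        self.hits: Counter = Counter()             # stand-in for telemetry
        self.undeclared_hits: Counter = Counter()  # routes still to be declared

    def declare(self, route: str, handler: Optional[Handler] = None) -> None:
        # A declared route proxies to the public API until it has been ported.
        self._routes[route] = handler or self._proxy

    def handle(self, route: str) -> str:
        if route in self._routes:
            self.hits[route] += 1
            return self._routes[route](route)
        # Undeclared route: record it so it can be declared or retired later.
        self.undeclared_hits[route] += 1
        if self.fallback_enabled:
            return self._proxy(route)
        raise LookupError(f"undeclared route: {route}")
```

In this sketch, `fallback_enabled` would be switched off only after `undeclared_hits` has stayed empty for long enough, and declared routes that never appear in `hits` become candidates for removal.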
To avoid unwanted breaking changes to the public API, the SoundCloud team deploys each ported implementation alongside the old code that proxies to the public API. Incoming requests execute both code paths: the old one (using the proxy) and the new one. The response of the proxy's call to the public API is returned to the caller. At the same time, the responses of the proxy and the new code are compared for consistency. If they don't match, a telemetry event is triggered and the difference is logged for inspection by the developer. The developer may then need to change the ported implementation until they are confident that the new code matches the original functionality. At that point, the proxy can be removed, and the ported response is returned instead.
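The parallel-run comparison described above might look roughly like this. It is a minimal Python sketch, not SoundCloud's code: `parallel_run`, `old_path`, `new_path`, and `on_mismatch` are hypothetical names, and treating an exception in the new code as a mismatch is an assumption that the announcement does not cover.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("parallel_run")


def parallel_run(
    request: Any,
    old_path: Callable[[Any], Any],
    new_path: Callable[[Any], Any],
    on_mismatch: Callable[[Any, Any, Any], None],
) -> Any:
    """Run both implementations; always return the old (proxied) response."""
    old_response = old_path(request)
    try:
        new_response = new_path(request)
    except Exception as exc:
        # Assumption: a failure in the new code must never affect the caller.
        on_mismatch(request, old_response, exc)
        logger.warning("new code failed for %r: %r", request, exc)
        return old_response
    if new_response != old_response:
        # Emit a telemetry event and log the difference for the developer.
        on_mismatch(request, old_response, new_response)
        logger.warning(
            "response mismatch for %r: old=%r new=%r",
            request, old_response, new_response,
        )
    return old_response
```

Because the old response is returned in every branch, callers cannot observe the new code until the proxy is removed. A real service would likely run the two paths concurrently and compare responses after normalizing fields such as timestamps; this sketch runs them sequentially for simplicity.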
The Strangler is now a fully-fledged BFF, and the public API's entire codebase has been deleted. As a result, SoundCloud now has a codebase that most engineers can contribute to, that does not negatively impact project scope, that fits with their microservice architecture, and that helps ensure data consistency and security.
The Strangler pattern did come with significant risks. The SoundCloud team went through a long fallow period in which the public API received very little maintenance or planning, which left them with an unhealthy codebase, increased security risks, and added complexity for feature development. Ultimately, when deciding whether to adopt the Strangler pattern, one should consider whether such disruptions to the business outweigh the ultimate benefits of the work.