Dan Luu published an article presenting Wave as a case study for a business model where a simple and boring architecture fits best. Instead of a state-of-the-art service-based asynchronous architecture, they employ a synchronous monolith backed by a database and serving a unified API.
(...) for most kinds of applications, even at top-100 site levels of traffic, computers are fast enough that high-traffic apps can be served with simple architectures, which can generally be created more cheaply and easily than complex architectures.
The author states that Wave's architecture is based on a Python monolith serving CRUD requests on top of a Postgres database.
Its server processes block while waiting for I/O operations, including network requests. The company experimented with asynchronous frameworks such as Eventlet but found that their immaturity caused significant operational overhead, eventually deciding against using them.
The cost of having CPU resources doing nothing but waiting is not negligible. However, at Wave's request volumes, the cost of the engineering team is much higher. In their business model and with their current traffic levels, the relatively low computational load per unit of revenue does not justify investing engineering time optimising the hardware usage costs.
The one exception to the synchronous model is long-running work: tasks whose initiating requests do not need to wait for a result are sent to a queue backed by RabbitMQ.
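The pattern of handing slow work to a queue while the request handler returns immediately can be sketched as follows. This is a minimal illustration, not Wave's code: the standard-library `queue.Queue` stands in for RabbitMQ, and the names `handle_request`, `worker`, `run_demo`, and the task names are hypothetical.

```python
import queue
import threading

# Assumption: an in-process queue stands in for a RabbitMQ queue.
# A real deployment would publish to a broker from the web process
# and consume from it in separate worker processes.
task_queue = queue.Queue()
completed = []


def worker():
    """Consume tasks one at a time until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        try:
            if task is None:
                return
            # Placeholder for the slow work (e.g. calling an external system).
            completed.append(task["name"])
        finally:
            task_queue.task_done()


def handle_request(payload):
    """Synchronous handler: enqueue the slow part and respond right away."""
    task_queue.put({"name": payload["task"]})
    return {"status": "accepted"}


def run_demo():
    """Start one worker, submit two hypothetical tasks, wait for completion."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    handle_request({"task": "send_receipt"})
    handle_request({"task": "export_report"})
    task_queue.put(None)  # sentinel: tell the worker to stop
    task_queue.join()     # block until every queued item is processed
    thread.join()
    return list(completed)
```

The handler itself stays blocking and simple; only work that the caller does not need to wait for leaves the request path.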
They chose Kubernetes for the platform level. The reasoning was that, as the business grew, the company would expand into additional countries with different regulations, eventually requiring them to maintain their systems or databases locally. A monolith-based architecture makes it easier to split their backend as necessary to comply with local laws and regulations than a complex service-based architecture would.
Wave adopted GraphQL for the API layer. The possibility of documenting and generating code with the exact return types led to safer clients. Composition capabilities in the query language allow all Wave applications to mostly share a single API, thus reducing complexity and allowing clients to avoid unnecessary network roundtrips by fetching only necessary data.
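The idea of clients fetching only the data they need from a single shared API can be illustrated with a toy field-selection function. This sketch does not use a GraphQL library and is not Wave's schema: the `select` helper, the `USER` record, and all field names are hypothetical, and the nested dictionary only mimics the shape of a GraphQL selection set.

```python
# Hypothetical server-side record; field names are illustrative only.
USER = {
    "id": "u1",
    "name": "Example User",
    "balance": 1500,
    "transactions": [
        {"id": "t1", "amount": 200, "memo": "groceries"},
        {"id": "t2", "amount": 50, "memo": "airtime"},
    ],
}

# Roughly what a client would express in GraphQL as:
#   { name transactions { amount } }
# A value of None marks a leaf field; a dict marks a nested selection.
SELECTION = {"name": None, "transactions": {"amount": None}}


def select(data, selection):
    """Return only the fields named in `selection`, recursing into nested ones."""
    result = {}
    for field, sub_selection in selection.items():
        value = data[field]
        if sub_selection is None:
            result[field] = value
        elif isinstance(value, list):
            result[field] = [select(item, sub_selection) for item in value]
        else:
            result[field] = select(value, sub_selection)
    return result


response = select(USER, SELECTION)
```

One request returns the user's name and transaction amounts together, while fields the client did not ask for (`id`, `balance`, `memo`) are never sent over the network.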
Application data is transported over HTTP/3. Its underlying QUIC protocol is a better match for operational constraints in the field, such as unreliable mobile data service and low bandwidth. Wave now maintains a custom transport protocol on top of USSD only for emergencies.
The author points out that choosing Kubernetes or GraphQL did bring additional complexity, but their advantages outweigh the disadvantages.
By keeping our application architecture as simple as possible, we can spend our complexity (and headcount) budget in places where there’s a complexity that benefits our business to take on.
When starting up the company, their initial preference was for buying over building software to save the then small engineering team's time. When vendors are unable to fix specific problems or provide a solution that fits their needs, taking on the extra complexity in-house may make financial and operational sense.
An example is integrating with telecom providers. Wave needs SMS services, but the major SMS SaaS provider doesn't operate in all their target countries, and the service cost would be prohibitive. In this situation, the author states that "the team that provides telecom integrations pays for itself many times over".
In hindsight, if they were building a similar system today, they would not be as quick to make some of the choices from the initial design and build phases (e.g., using RabbitMQ or Python). However, the current operational downsides are not significant enough to justify migrating to a different technology.