Airbnb has successfully migrated much of its API to GraphQL, resulting in improved page load times and a more intuitive user experience. In a presentation at GraphQL Summit, Brie Bunge described the multi-stage migration process that has been used across many teams at Airbnb.
With every stage completed, Airbnb ended up with an Apollo and GraphQL app that is 100% type-safe, with no over-fetching, and the site stayed up and running through each stage of the migration journey. Adopting Apollo and GraphQL has created a foundation that allows Airbnb to experiment with new performance improvements that would not have been possible with a traditional REST-based architecture.
Before beginning the migration process, two prerequisites must be addressed. First, GraphQL must be set up on the backend. Adam Neary has written about how Airbnb got started with GraphQL.
The second prerequisite is adopting TypeScript on the front-end. At Airbnb, TypeScript and type safety resulted in faster development and gave teams more confidence in what they're building. TypeScript types can be generated directly from the schema (using apollo client:codegen --target=typescript), and these types create a single source of truth shared by the backend and the front-end.
TypeScript is now the official language for front-end development at Airbnb, and half of their 3-million-line codebase has been migrated to TypeScript. Bunge previously gave a talk at JSConf on Airbnb's adoption of TypeScript.
Bunge said there are three major options for a conversion to GraphQL. First, and most tempting, is a complete rewrite from scratch. While this has happened in some cases as part of other major projects, it is not usually a viable option. A second option, to stop and refactor, works well on small or solo teams, but is difficult with many developers working together.
The migration option Airbnb recommends is incremental adoption, which has been the safest and most feasible approach, especially with a large team and a large, pre-existing codebase. The process Bunge described consists of five stages, with a goal at each stage to have a shippable, fully-functional, regression-free version of the app.
The first stage changes where data comes from, not how it is used. With a GraphQL endpoint and TypeScript in place, a REST request can be swapped for a GraphQL request. The goal of this first stage is to verify that the front-end-to-backend integration and the TypeScript type generation are both working properly. No changes are made to React components or the shape of the API response. This requires a GraphQL query and mutation that match the REST endpoint.
Two techniques Airbnb relied upon during this early stage were aliasing and adapters. Aliasing, a built-in GraphQL feature, allowed mapping between camel-case properties returned from GraphQL and snake-case properties of the old REST endpoint. Adapters were used to convert a GraphQL response so that it could be recursively diffed with a REST response, to ensure GraphQL was returning the same data as before. These adapters would later be removed, but they were critical for meeting the parity goals of the first stage.
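The parity check described above can be sketched roughly as follows. This is an illustration only, not Airbnb's code: the listing query, its field names, and the helpers adaptGraphQLResponse and diffPayloads are all hypothetical, and the adapter assumes the only difference left after aliasing is the __typename field that Apollo Client adds to results.

```typescript
// Hypothetical query: aliases expose camel-case schema fields under the
// snake-case names the legacy REST payload used.
const LISTING_PARITY_QUERY = `
  query Listing($id: ID!) {
    listing(id: $id) {
      id
      star_rating: starRating
      review_count: reviewCount
    }
  }
`;

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isContainer(value: Json): value is Json[] | { [key: string]: Json } {
  return value !== null && typeof value === "object";
}

// Adapter (hypothetical): recursively drop `__typename` so the GraphQL
// result has the same shape as the REST payload.
function adaptGraphQLResponse(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map(adaptGraphQLResponse);
  }
  if (isContainer(value)) {
    const source = value as { [key: string]: Json };
    const adapted: { [key: string]: Json } = {};
    for (const key of Object.keys(source)) {
      if (key !== "__typename") {
        adapted[key] = adaptGraphQLResponse(source[key]);
      }
    }
    return adapted;
  }
  return value;
}

// Recursive diff: returns the paths at which the two payloads disagree.
function diffPayloads(rest: Json, graphql: Json, path: string = "$"): string[] {
  if (!isContainer(rest) || !isContainer(graphql)) {
    return rest === graphql ? [] : [path];
  }
  if (Array.isArray(rest) !== Array.isArray(graphql)) {
    return [path];
  }
  const a = rest as { [key: string]: Json };
  const b = graphql as { [key: string]: Json };
  // Keys of `a`, followed by keys that appear only in `b`.
  const keys = Object.keys(a).concat(Object.keys(b).filter((k) => !(k in a)));
  const mismatches: string[] = [];
  for (const key of keys) {
    const childPath = path + "." + key;
    if (!(key in a) || !(key in b)) {
      mismatches.push(childPath);
    } else {
      mismatches.push(...diffPayloads(a[key], b[key], childPath));
    }
  }
  return mismatches;
}
```

An empty array from diffPayloads(restPayload, adaptGraphQLResponse(graphqlData)) would indicate that the GraphQL request returns the same data as the REST endpoint it replaces.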
Stage two focuses on propagating types throughout the code, which increases confidence during later stages. At this point, no runtime behavior should be affected.
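As a rough illustration of what propagating types can look like, the sketch below replaces a loosely typed helper with one that accepts a generated type. The interface ListingQuery_listing is a hand-written stand-in for codegen output, and both it and formatReviewCount are hypothetical names, not taken from Airbnb's codebase.

```typescript
// Stand-in for a type that schema code generation might emit
// (hypothetical shape; real generated types depend on the schema and query).
interface ListingQuery_listing {
  id: string;
  title: string;
  reviewCount: number | null;
}

// Before stage two, a helper like this might have accepted `any`, so a
// misspelled field such as `listing.reviewsCount` would go unnoticed until
// runtime. Typing the parameter changes no runtime behavior, but the
// compiler now rejects unknown fields and forces the null case to be handled.
function formatReviewCount(listing: ListingQuery_listing): string {
  const count = listing.reviewCount === null ? 0 : listing.reviewCount;
  return count === 1 ? "1 review" : count + " reviews";
}
```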
The third stage improves the use of Apollo. Earlier stages directly used the Apollo Client, which fired Redux Actions, and components used the Redux store. Refactoring the app using React Hooks (@apollo/react-hooks) allows use of the Apollo cache instead of the Redux store.
A major benefit of GraphQL is reducing over-fetching. The first stage, with the mega query, retained all the over-fetching behavior of the old REST endpoints, but the fourth stage is able to address this by introducing more granular query fragments.
At first, only the root of the app knows about GraphQL, and it must fetch all the data that could possibly be needed by any component in the tree. The improvement process begins at the lowest leaf node of the component tree, by creating a GraphQL fragment for only the data needed by that sub-component. TypeScript is helpful because it reports a compile-time error if a needed field does not exist. The parent component is then modified to fetch the data based on the fragments of its child components. A "rinse-and-repeat" process goes across all the leaf nodes, then up the tree to the app root, where the old mega query has been replaced by all the combined fragments. Because each fragment requests only the data its component needs, over-fetching is eliminated.
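The bottom-up composition can be sketched as follows. All component, fragment, and field names here are hypothetical, and plain template strings are used to keep the example dependency-free; an Apollo app would typically wrap these documents in the gql tag and generate the interfaces rather than writing them by hand.

```typescript
// Leaf component: declares only the fields it renders.
const HOST_AVATAR_FRAGMENT = `
  fragment HostAvatar_host on Host {
    name
    pictureUrl
  }
`;

// Hand-written stand-in for the type codegen would derive from the fragment.
interface HostAvatar_host {
  name: string;
  pictureUrl: string;
}

function renderHostAvatar(host: HostAvatar_host): string {
  return host.name + " (" + host.pictureUrl + ")";
}

// Parent component: adds its own fields and spreads the child's fragment
// instead of listing the child's fields itself.
const LISTING_CARD_FRAGMENT = `
  fragment ListingCard_listing on Listing {
    title
    host {
      ...HostAvatar_host
    }
  }
  ${HOST_AVATAR_FRAGMENT}
`;

interface ListingCard_listing {
  title: string;
  host: HostAvatar_host;
}

function renderListingCard(listing: ListingCard_listing): string {
  // Reading a field that the fragment does not select (for example
  // `listing.price`) would be a compile-time error here.
  return listing.title + ", hosted by " + renderHostAvatar(listing.host);
}

// App root: the former mega query is now just the combined fragments.
const LISTING_PAGE_QUERY = `
  query ListingPage($id: ID!) {
    listing(id: $id) {
      ...ListingCard_listing
    }
  }
  ${LISTING_CARD_FRAGMENT}
`;
```

Because each parent interpolates its children's fragment definitions, adding a field to a leaf component changes the root query automatically, and no other component needs to know about it.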
The fifth and final stage addresses state management. Once all components have been migrated to Apollo, Airbnb can leverage Apollo for API data and React local state or context for client data. This provides a consistent mental model for handling client data, and improves on the fragmented model that combined elements of React, Apollo, and Redux. It also eliminates a lot of the boilerplate code that Redux required, and handles caching more effectively than hand-rolled solutions.
With GraphQL in place, Airbnb can now experiment with new opportunities. The first situation Bunge described was service worker query pre-fetching, to kick off the GraphQL query as early as possible so users see the page rendered with data sooner.
Without service workers, the page is rendered server-side, and there is a long gap while the user waits for that rendering to complete and the full page to be delivered. Service workers allow an application shell to show up right away, with a loading state where the page starts to fill in as data is returned. The page also loads faster because much of the JavaScript is cached. Service workers can then provide further improvements by switching from component-level queries to route-level queries. Depending on device limitations, this could lead to an additional 23-50% reduction in total time before the page is fully interactive.
A data-centric, unified schema is another project currently being pursued. The current schema is aligned to Airbnb's service-oriented architecture. Because services can combine the same underlying data in different ways, data is often duplicated. By switching to a data-centric schema, and adding a data hydration layer to the architecture, duplicate data requests can be avoided, duplicate code can be removed, responses are more efficient, and caching can be improved. This work is currently in beta, but looks promising, and more will be announced in the coming year.