DoorDash created a configuration management platform to help its logistics team maintain the growing number of business preferences and configuration values. The company used CockroachDB for persistence and simplified the architecture compared with the previous solution. The new platform enables experimentation, improves lifecycle management for configuration values, and provides flexibility and extensibility.
The logistics team at DoorDash is responsible for a wide range of functional areas and manages many configurations for tuning various business aspects of the platform across different verticals and retail locations. Historically, updating configuration values required committing these changes into source code repositories and relied on a complex architecture to propagate changes to many services within DoorDash’s vast microservices platform.
The Previous Architecture of the Configuration Management Solution (Source: DoorDash Engineering Blog)
The existing solution had many challenges and limitations, which mainly originated from the need to update huge files, often with single-line changes. This frequently resulted in unintentional updates, difficulties with auditing and versioning, and a time-consuming process to add new preferences. Additionally, due to a complex flow for propagating configuration values, the team struggled to definitively assert the time and the sources of configuration changes during the assignment process.
Saurabh Gupta and Reid Arwood, software engineers at DoorDash, explain why the company had to replace the existing configuration management solution:
Although that process started with a limited set of configurations, the old system struggled to keep up with DoorDash’s growth across new verticals. [...] It doesn’t help that these preferences aren’t kept in standard formats; some are JSONs, others are CSVs, and still others have no format at all. Maintaining these files and updating them at the speed of DoorDash’s growth has been both challenging and risky, with multiple outages occurring because of incorrect configurations.
The engineers created a new architecture with a more flexible data model, with the data stored in CockroachDB. It supports versioning and auditing configuration values at many levels of granularity (store, business, etc.). Configuration values are updated and approved using an admin UI application. Among other features, the new solution supports expiration-based configurations that auto-revert, time-based preferences applicable at specific times of day, ephemeral config types to aid experimentation, automated validation, and automatic approval or rejection.
The New Architecture of the Configuration Management Solution (Source: DoorDash Engineering Blog)
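The expiration-based and time-based behaviors described above can be illustrated with a minimal sketch. It is not DoorDash's implementation: the `ConfigValue` fields, the scope strings such as `store:42`, the key `prep_time_minutes`, and the `resolve` function are all hypothetical names chosen for illustration, and the rule of preferring the highest active version at the most specific scope is an assumption about how such a lookup might work.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class ConfigValue:
    """One versioned value for a key at a given scope (hypothetical model)."""
    key: str
    scope: str                               # e.g. "store:42" or "business:7"
    value: str
    version: int
    expires_at: Optional[datetime] = None    # expiration-based: inactive from this instant on
    active_from: Optional[time] = None       # time-based: daily window start (inclusive)
    active_until: Optional[time] = None      # time-based: daily window end (exclusive)

    def is_active(self, now: datetime) -> bool:
        # An expired value is skipped, so lookups fall back to an older version.
        if self.expires_at is not None and now >= self.expires_at:
            return False
        # A time-based value applies only inside its daily window.
        # Windows that wrap past midnight are not handled in this sketch.
        if self.active_from is not None and self.active_until is not None:
            return self.active_from <= now.time() < self.active_until
        return True


def resolve(values: Iterable[ConfigValue], key: str,
            scopes: Sequence[str], now: datetime) -> Optional[ConfigValue]:
    """Return the highest-version active value at the most specific scope.

    `scopes` is ordered from most specific to least specific.
    """
    values = list(values)
    for scope in scopes:
        candidates = [v for v in values
                      if v.key == key and v.scope == scope and v.is_active(now)]
        if candidates:
            return max(candidates, key=lambda v: v.version)
    return None


if __name__ == "__main__":
    store = [
        ConfigValue("prep_time_minutes", "business:7", "15", version=1),
        ConfigValue("prep_time_minutes", "store:42", "20", version=1),
        # Temporary override that auto-reverts at noon on 2024-01-01.
        ConfigValue("prep_time_minutes", "store:42", "30", version=2,
                    expires_at=datetime(2024, 1, 1, 12, 0)),
    ]
    lookup_order = ["store:42", "business:7"]
    before = resolve(store, "prep_time_minutes", lookup_order, datetime(2024, 1, 1, 10, 0))
    after = resolve(store, "prep_time_minutes", lookup_order, datetime(2024, 1, 1, 13, 0))
    print(before.value, after.value)
```

In this sketch, auto-revert needs no cleanup job: once the override expires, the lookup falls back to the previous version at the same scope, and a store with no value of its own inherits the business-level one.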
The team used an incremental approach toward delivering the new architecture. It optimized the latency of retrieving config values to around ten milliseconds (99th percentile) by upgrading the instance type used for CockroachDB cluster nodes and improving the application code, including query and thread-pool optimization.
As more use cases migrate to the new architecture, developers plan to scale the solution to handle 10,000 requests per second, improve its reliability and processing of large files, refine the data model to support more flexibility, and make it more developer-friendly.
InfoQ recently reported on other DoorDash initiatives to improve its microservices platform. These include using a service mesh and a cell-based architecture to reduce infrastructure costs and rearchitecting the caching layer to improve performance and scalability.