Uber has made significant improvements to its MySQL database fleet by upgrading from version 5.7 to 8.0. The team wanted to take advantage of the performance and concurrency improvements in newer versions of MySQL, and MySQL 5.7 was due to reach end-of-life in October 2023. The work took over a year and involved upgrading more than 2,100 clusters and 16,000 nodes across 19 production zones in three regions.
In Uber's blog post, Siddharth Singh et al. explained that Uber's MySQL infrastructure holds multiple petabytes of data and processes over 3 million queries per second, so minimising disruption during the upgrades was essential. To help with this, Uber's engineering team built an automated system that guides each cluster through a multi-stage upgrade process without manual intervention. The authors also explained that they opted for a side-by-side upgrade strategy rather than an in-place upgrade, a decision driven by the need to minimise downtime, reduce risk, and allow for better testing.
The side-by-side upgrade process involved several stages. First, Uber added a corresponding MySQL 8.0 replica node in the same region or zone for each MySQL 5.7 node in a cluster. A week-long monitoring period followed to observe system performance and detect any issues. When this "soak period" was finished, engineers diverted traffic from the 5.7 replica nodes and promoted a MySQL 8.0 node to primary status for each cluster. Finally, all the MySQL 5.7 nodes were removed from the cluster, completing the transition to version 8.0.
System stability and data integrity were the prime concerns during the upgrades. Until the team was confident enough to promote a MySQL 8.0 node to primary status, the plan was to roll back immediately to MySQL 5.7 if any service degradation was detected. After promotion, a rollback to 5.7 would not be supported, so the need for one had to be avoided.
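The staged process and its rollback rule can be sketched as a small state machine. This is an illustrative sketch only, not Uber's automation: the stage names, the `plan` function, and the "fix forward" outcome after promotion are assumptions made for the example, since the blog post only states that rollback is unsupported once an 8.0 node is primary.

```python
# Hypothetical stage names, in the order described in the article.
STAGES = [
    "add_80_replicas",                  # one 8.0 replica per 5.7 node
    "soak_one_week",                    # observe performance, detect issues
    "divert_traffic_from_57_replicas",
    "promote_80_primary",
    "remove_57_nodes",
]

# Once this stage has completed, rollback to 5.7 is no longer supported.
POINT_OF_NO_RETURN = STAGES.index("promote_80_primary")


def can_roll_back(completed: int) -> bool:
    """Return True while no 8.0 node has been promoted to primary.

    `completed` is the number of stages already finished for the cluster.
    """
    return completed <= POINT_OF_NO_RETURN


def plan(completed: int, degraded: bool) -> str:
    """Decide the next action for a cluster (assumed decision logic)."""
    if degraded:
        # Before promotion: roll back immediately. After promotion the
        # sketch assumes the only option is to fix the issue on 8.0.
        return "roll_back_to_57" if can_roll_back(completed) else "fix_forward_on_80"
    if completed >= len(STAGES):
        return "done"
    return STAGES[completed]
```

For example, `plan(3, degraded=True)` returns `"roll_back_to_57"` because the promotion has not yet happened, while `plan(4, degraded=True)` returns `"fix_forward_on_80"`.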
The upgrade had its challenges. After the upgrade to MySQL 8.0, some clusters' query execution plans changed, causing higher latency and resource consumption. Collaborating with database software company Percona, Uber's team identified and implemented fixes for affected clusters. Other issues included:
- Some incompatible queries and configurations
- Changes in the default character set and collation settings
- A need to upgrade client libraries for compatibility with MySQL 8.0
Uber is not alone in sharing how best to carry out this significant upgrade at a huge scale. As covered by InfoQ earlier this year, GitHub performed a similar upgrade from 5.7 to 8.0 and shared some of its learnings. Among its many findings, GitHub reported that backwards replication from 8.0 to 5.7 broke when used by clients built with different frameworks and languages, which shortened the potential rollback window. The team also triggered replication bugs and encountered crashes when running queries with large WHERE IN values.
Percona's Przemysław Malkowski also has a detailed post on how to avoid disaster with a MySQL upgrade to 8.0, highlighting that it is tough to test write-related workloads before upgrading and emphasising the possibilities for data loss, slow queries, potential downtime, and client application incompatibilities. Malkowski also explains how to downgrade should it be necessary.
Other organisations have benefitted from upgrading to MySQL 8.0. Writing in a Medium article, Jyoti Ranjan Parida from Arzooo explained some of the benefits of upgrading:
- New features such as window functions, common table expressions and JSON enhancements
- Better performance, with improved index and query execution
- Better password expiration policies
- Improved resource management, supporting more efficient queries
- Group multi-source replication
Parida also explains some of the things to watch out for when upgrading, with a comprehensive list of prerequisite actions such as:
- Ensure data types and functions aren't obsolete
- Remove orphan files and invalid triggers
- Avoid unsupported partitioned tables
- Resolve keyword violations
- Eliminate naming conflicts with the new data dictionary
- Update obsolete SQL modes
- Check ENUM/SET column lengths
- Relocate table partitions from shared tablespaces
- Revise older GROUP BY clauses using ASC or DESC
- Shorten long foreign key names
- Consider converting utf8mb3 to utf8mb4 for better Unicode support
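Two of the items above lend themselves to a quick text scan of application SQL and schema dumps before an upgrade. The following is a rough sketch under stated assumptions, not a tool from Parida's article: the function names are hypothetical, and the regular expressions are naive heuristics that do not parse SQL (they can be fooled by subqueries, comments, or string literals).

```python
import re

# Captures the text of a GROUP BY clause up to the next clause keyword.
_GROUP_BY = re.compile(
    r"\bGROUP\s+BY\b(.*?)(?=\bHAVING\b|\bORDER\s+BY\b|\bLIMIT\b|;|$)",
    re.IGNORECASE | re.DOTALL,
)
_ASC_DESC = re.compile(r"\b(?:ASC|DESC)\b", re.IGNORECASE)

# Matches CHARSET=utf8 / CHARACTER SET utf8 / utf8mb3, but not utf8mb4.
_UTF8MB3 = re.compile(
    r"\b(?:CHARSET|CHARACTER\s+SET)\s*=?\s*(?:utf8mb3|utf8)\b",
    re.IGNORECASE,
)


def has_group_by_direction(sql: str) -> bool:
    """Flag GROUP BY clauses that carry ASC or DESC qualifiers."""
    return any(_ASC_DESC.search(m.group(1)) for m in _GROUP_BY.finditer(sql))


def uses_utf8mb3(ddl: str) -> bool:
    """Flag DDL that declares the 3-byte utf8 character set."""
    return _UTF8MB3.search(ddl) is not None
```

A statement such as `SELECT a, COUNT(*) FROM t GROUP BY a DESC` would be flagged, whereas the same query with the direction moved into an `ORDER BY a DESC` clause would not. Anything flagged by a scan like this would still need manual review.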
Despite the obstacles they encountered, Uber's upgrade yielded significant performance improvements. Server-side benchmarks showed a 29% latency improvement for inserts, a 33% improvement for reads, and a 47% improvement for updates, all at high concurrency levels. On the client side, some queries were 78% faster, and overall database lock time was reduced by 94%.
"Through careful consideration of the benefits and challenges, we successfully navigated the transition, mitigating risks and minimizing disruptions to our services."
- Siddharth Singh, Sriram Rao Udupi, Raja Sriram Ganesan, and Debadarsini Nayak (Uber)