Uber uses Presto, an open-source distributed SQL query engine, to provide analytics across several data sources, including Apache Hive, Apache Pinot, MySQL, and Apache Kafka. To improve its performance, Uber engineers explored the benefits of handling quick queries, a.k.a. express queries, separately from the rest, and found they could improve both Presto utilization and response times.
Express queries, i.e. queries that execute in under two minutes, amount to about half of all queries Uber's analytics platform processes. Uber engineers found that handling them the same way as every other query caused cluster underutilization and high latency, because express queries were throttled along with slower ones to avoid overloading the system.
To prevent express queries from being throttled, the first step was to predict whether an incoming query would be an express query. Uber engineers based this prediction on historical data, assigning each query two fingerprints: an exact fingerprint, a unique hash computed after removing comments and whitespace, and an abstract fingerprint, which additionally replaces any literal values.
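As an illustration only, a minimal sketch of such fingerprinting could look like the following; the regex-based normalization and the function names are assumptions for the sake of the example, not Uber's actual implementation:

```python
import hashlib
import re

def _strip_comments(sql: str) -> str:
    # Remove -- line comments and /* ... */ block comments.
    return re.sub(r"--[^\n]*|/\*.*?\*/", " ", sql, flags=re.DOTALL)

def exact_fingerprint(sql: str) -> str:
    """Hash of the query after stripping comments and normalizing whitespace."""
    normalized = re.sub(r"\s+", " ", _strip_comments(sql)).strip().lower()
    return hashlib.sha256(normalized.encode()).hexdigest()

def abstract_fingerprint(sql: str) -> str:
    """Like the exact fingerprint, but literal values are also replaced."""
    text = _strip_comments(sql)
    text = re.sub(r"'[^']*'", "?", text)            # string literals
    text = re.sub(r"\b\d+(\.\d+)?\b", "?", text)    # numeric literals
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode()).hexdigest()
```

With this scheme, two queries that differ only in literal values (e.g. a date filter) share the same abstract fingerprint while having different exact fingerprints, which is what makes the abstract fingerprint useful for grouping recurring queries.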
We tested the P90 and P95 query execution times using the exact fingerprint and abstract fingerprint of the query with lookback windows of 2 days, 5 days, and 7 days. [...] We used this candidate definition to predict if a query was express: if the X runtime of the query in the last Y days based on Z fingerprint was less than 2 minutes.
Using this definition, Uber engineers found that the P90 value of the abstract fingerprint with a 5-day lookback window provided the best accuracy and coverage for predicting which queries would complete in less than two minutes. They also kept enough historical data in the lookup table used to classify incoming queries to be able to switch to a different percentile and/or a larger or smaller lookback window later.
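A rough sketch of that prediction, assuming a lookup structure keyed by abstract fingerprint that stores recent per-query runtimes (the data layout and names below are illustrative, not Uber's schema):

```python
from datetime import datetime, timedelta
from statistics import quantiles

EXPRESS_THRESHOLD_SECONDS = 120   # "express" means under two minutes
LOOKBACK = timedelta(days=5)      # 5-day window chosen by Uber
PERCENTILE = 90                   # P90 gave the best accuracy and coverage

def is_express(fingerprint: str,
               history: dict[str, list[tuple[datetime, float]]]) -> bool:
    """Predict whether a query will be express based on the P90 runtime
    of its abstract fingerprint over the lookback window."""
    now = datetime.utcnow()
    runtimes = [
        seconds
        for finished_at, seconds in history.get(fingerprint, [])
        if now - finished_at <= LOOKBACK
    ]
    if not runtimes:
        return False  # no recent history: treat as non-express to stay safe
    if len(runtimes) == 1:
        p90 = runtimes[0]
    else:
        p90 = quantiles(runtimes, n=100)[PERCENTILE - 1]
    return p90 < EXPRESS_THRESHOLD_SECONDS
```

Keeping the raw runtimes (rather than a precomputed P90) is what allows changing the percentile or the window size without rebuilding the table.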
While this idea seemed straightforward, its design and implementation went through a couple of iterations, moving from the idea of introducing express-query handling as late as possible in the system pipeline to handling them as early as possible.
In their first design, Uber engineers queued express and non-express queries together based on their user priority. That is, every query, express or not, went to the queue corresponding to its user priority, e.g., batch or interactive. Each queue had its own designated Presto cluster, with a subset of that cluster dedicated to handling express queries once all user/source and cluster limit checks had been carried out.
This design left the express Presto clusters underutilized, essentially because slow and express queries were mixed in the same queues, so slow queries still dictated the pace at which express queries reached the express clusters.
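A simplified sketch of that first flow, with hypothetical queue, limit-check, and cluster-selection helpers standing in for the real components:

```python
def enqueue_first_design(query, user_priority, priority_queues):
    """First design: express and non-express queries share per-priority queues."""
    priority_queues[user_priority].put(query)   # express queries wait behind slow ones

def dispatch_first_design(queue, limit_checker, express_predictor, cluster_for):
    """Dequeue in order; the express decision only matters after limit checks."""
    query = queue.get()                          # pace set by whatever is at the head
    limit_checker.enforce(query)                 # user/source and cluster limits
    cluster = cluster_for(query, express=express_predictor(query))
    cluster.submit(query)
```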
In a second design iteration, Uber engineers introduced a dedicated queue for express queries, to which they are routed as soon as they enter the system, right after the validation step. Express queries thus get a direct, faster path to the designated express clusters, which brought an order-of-magnitude improvement in end-to-end SLA for over 75% of scheduled queries.
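The change can be sketched as moving the express decision to the front of the pipeline, again using illustrative names rather than Uber's actual components:

```python
def enqueue_second_design(query, user_priority, priority_queues,
                          express_queue, validator, express_predictor):
    """Second design: express queries are identified right after validation and
    sent to a dedicated queue feeding the express clusters, bypassing the
    per-priority queues entirely."""
    validator.validate(query)
    if express_predictor(query):
        express_queue.put(query)                 # direct, faster path to express clusters
    else:
        priority_queues[user_priority].put(query)  # slower queries keep the original path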
While this approach improved cluster utilization and running times for express queries, Uber engineers are considering how to improve it further. Running this design in production showed that express-query handling is so fast that routing express queries across different queues based on their priority is unnecessary.
This led Uber engineers to start working on a further evolution of their design where express queries have a completely dedicated subsystem, which promises to further increase cluster utilization and simplify the routing logic.