At the recent QCon London conference, Hien Luu, senior engineering manager for the Machine Learning Platform at DoorDash, delivered a talk on "Strategies and Principles to Scale and Evolve MLOps at DoorDash." Luu shared insights on a challenge many companies face: ML systems that never reach production and therefore never deliver value. He outlined three key principles that have proven effective in addressing this challenge at DoorDash.
Luu opened his talk by noting that, according to Gartner, 85% of machine learning (ML) projects fail, largely because they never reach production. He emphasized that an ML model delivers zero return on investment (ROI) until it is in production. MLOps should therefore be treated as an engineering discipline, and adopting the right strategies is crucial when building an MLOps infrastructure from scratch.
The AI Infrastructure Alliance provides a helpful blueprint for successful MLOps based on four key factors: use cases, culture, technology, and people.
Focusing on the use case, or "the game you are playing," is essential for understanding the critical aspects of an ML project, and this requires collaborating with stakeholders and decision-makers. It may also involve addressing issues such as fairness, bias, and the explainability of predictions. Understanding the company culture matters as well. Companies differ in how they approach projects: some are innovative, while others are collaborative, results-driven, traditional, customer-focused, or inclusive. Understanding what pace of progress the organization expects, and adapting to its culture, makes MLOps strategies easier to implement.
Naturally, technology plays a significant role in MLOps, and it is essential to identify any hidden tech debt surrounding the systems. Assessing the maturity of every dependency helps teams make informed decisions. Technology should be considered together with the fourth factor, people: ideally, the two are aligned to achieve the desired impact. Involving stakeholders in infrastructure planning and maintaining effective communication ensures that the MLOps strategy reflects the organization's needs and goals.
Luu then presented three core principles that have been instrumental in scaling and evolving MLOps at DoorDash and in making ML systems succeed in production: "Dream Big, Start Small," "1% Better Every Day," and "Customer Obsession." Each principle captures a specific approach behind DoorDash's MLOps success.
1. Dream Big, Start Small: This principle emphasizes having a clear vision and ambitious goals while delivering progress and impact through incremental improvements. By starting small, companies can make steady progress toward their larger vision over time.
2. 1% Better Every Day: This principle holds that small, continuous improvements compound into significant gains across the MLOps infrastructure. Luu shared a real-world example of how DoorDash has applied it. The team adopted Redis for feature storage and moved from storing each attribute as a separate entry to storing all of an entity's information together as a single JSON string that forms a complete profile. They also developed a method to minimize the number of bits needed to encode keys and values, which reduced both CPU time and memory usage. The result was a 3x cost reduction and a 38% reduction in latency. Their experience is documented in a blog post titled "Building a Gigascale ML Feature Store with Redis".
3. Customer Obsession: This principle stresses not only listening to customers but also inventing on their behalf. DoorDash aims to delight customers with "French fry moments," which surprise them with something they did not expect. By being genuinely obsessed with customer satisfaction, companies can build MLOps strategies and systems that meet their users' needs and improve their overall experience.
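To make the Redis layout change described under "1% Better Every Day" more concrete, here is a minimal Python sketch comparing two ways of storing the same features: one key per feature, versus one key per entity holding a compact JSON profile. It is not DoorDash's implementation. The feature names, the entity ID `store_42`, and the helpers `per_attribute_layout`, `profile_layout`, `short_field`, and `total_bytes` are all hypothetical. A plain dict stands in for Redis, and CRC32 from Python's standard library stands in for whatever compact key encoding a production system would actually use.

```python
import json
import zlib

# Hypothetical features for one entity; names and values are illustrative only.
FEATURES = {
    "store_avg_prep_time_minutes_7d": 12.5,
    "store_order_count_30d": 1840,
    "store_cancellation_rate_30d": 0.021,
}


def per_attribute_layout(entity_id: str, features: dict) -> dict:
    """Layout A: one key-value pair per feature; the full feature name is repeated in every key."""
    return {f"{name}:{entity_id}": str(value) for name, value in features.items()}


def short_field(name: str) -> str:
    """Shorten a feature name to an 8-hex-digit CRC32 digest (a stand-in for a production hash)."""
    return format(zlib.crc32(name.encode("utf-8")), "08x")


def profile_layout(entity_id: str, features: dict) -> dict:
    """Layout B: one key per entity whose value is a compact JSON profile of all its features."""
    profile = {short_field(name): value for name, value in features.items()}
    # separators=(",", ":") drops the default spaces to keep the JSON string small.
    return {entity_id: json.dumps(profile, separators=(",", ":"))}


def total_bytes(layout: dict) -> int:
    """Rough payload size: UTF-8 bytes of all keys plus all values (ignores Redis per-key overhead)."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8")) for k, v in layout.items())


if __name__ == "__main__":
    a = per_attribute_layout("store_42", FEATURES)
    b = profile_layout("store_42", FEATURES)
    print(f"per-attribute: {len(a)} keys, {total_bytes(a)} bytes")
    print(f"profile:       {len(b)} keys, {total_bytes(b)} bytes")
```

With these example features, the profile layout uses one key instead of three and fewer total bytes, because long feature names are no longer repeated in every key. A real system would also need to guard against collisions between shortened field names, and would likely save further memory because fewer keys means less per-key overhead in Redis, which this simple byte count does not measure.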
During the Q&A, attendees asked about using existing tooling and aligning stakeholders. Luu recommended evaluating existing solutions, such as Triton or BentoML, before building a custom tool. On aligning stakeholders, he emphasized understanding their goals and the impact they want to have on the company.
In summary, Luu's talk at QCon London offered three principles for scaling and evolving MLOps: "Dream Big, Start Small," "1% Better Every Day," and "Customer Obsession." Combined with reusing existing tools where possible and aligning stakeholders around shared goals, these principles can help companies get more of their ML systems into production, where they can deliver value.