The recent QCon Plus online conference featured a panel discussion titled "ML in Production - What's Next?" Some key takeaways were that many ML projects fail in production because of poor engineering infrastructure and a lack of cross-disciplinary communication, and that both model explainability and ML for edge computing are important but still immature technologies.
The panel was moderated by Frank Greco, senior technology consultant, chairman of NYJavaSIG, and QCon Plus November 2021 committee member. Panelists included Chip Huyen, a machine-learning startup founder and lecturer at Stanford; Shijing Fang, principal data scientist at Microsoft; and Vernon Germano, senior manager of machine learning engineering at Zillow. The panelists discussed questions posed by Greco and fielded a few more from the audience at the end of the session.
Greco began by asking the panelists why, in their opinion, many ML projects fail in production. Germano said that in his experience many organizations focus on the data-science aspect of the problem, developing models that achieve good results on test data and evaluation metrics; however, getting those models into production presents many engineering problems that require expertise data scientists often lack. Fang added that the problem is often cultural, with a lack of a "holistic view" of the business objective and a failure to communicate across departments or disciplines within the company. Huyen noted a lack of systematic postmortems for these failed projects, making it hard to know what share of failures is due to problems such as missing tooling.
Greco then asked about continuous delivery of models, comparing the current approach to the Waterfall model of software delivery. Huyen suggested that delivering models should be treated as part of the same process as delivering the rest of the software stack, and that companies need to focus on tools and processes. Germano noted that producing ML models requires a slightly different approach and that the off-the-shelf infrastructure needed for the long-term "care and feeding" of models in production is lacking; instead, companies must invest in building it. He also stressed that continuously monitoring the performance of models in production is "one of the most critical things you can do." Fang added that for reinforcement learning in particular, the problem is even more complicated because of the need for real-time collection and distribution of data.
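The panel did not discuss implementations, but a common building block for the kind of monitoring Germano described is a statistical drift check that compares live feature distributions against a training-time reference. The minimal Python sketch below computes the Population Stability Index (PSI), a widely used drift metric; the synthetic data and the 0.2 alert threshold are illustrative assumptions, not recommendations from the panel.

```python
import numpy as np

def population_stability_index(reference, live, bins=10):
    """Compare a live feature sample against a training-time reference;
    larger values indicate stronger drift."""
    # Bin edges come from the reference data so both samples are
    # histogrammed on the same grid (out-of-range live values are
    # simply ignored in this sketch).
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    live_pct = np.histogram(live, bins=edges)[0] / len(live)
    # Avoid log(0) and division by zero in sparse bins.
    ref_pct = np.clip(ref_pct, 1e-6, None)
    live_pct = np.clip(live_pct, 1e-6, None)
    return np.sum((live_pct - ref_pct) * np.log(live_pct / ref_pct))

# Illustrative data: a reference sample from training time and a
# shifted sample representing recent production traffic.
reference = np.random.normal(0.0, 1.0, 10_000)
live = np.random.normal(0.3, 1.0, 1_000)

# PSI > 0.2 is a common rule-of-thumb alert threshold.
if population_stability_index(reference, live) > 0.2:
    print("Feature drift detected - investigate or consider retraining")
```

A check like this would typically run on a schedule for each monitored feature, feeding an alerting system rather than printing to the console.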
Huyen posed a follow-up question to the panel about tooling: given the many commercial and open-source products dedicated to ML tooling and to solving the problems of deploying and monitoring models, why are companies still struggling? Fang suggested that the problem is not a failure of tooling but rather one of integrating many systems while dealing with complex business problems.
The panelists then answered several questions from audience members. When asked about feature engineering, Fang described her team's use of a "feature bank," a centralized server that documents datasets and features so they can be shared across multiple projects. When asked whether using Jupyter notebooks in production is a good practice, Germano replied that it depends on the context: he had seen it work in some cases, such as overnight batch processes, but it would probably not scale to support a website with millions of users.
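Fang did not go into her team's implementation, but the idea of a feature bank can be pictured as a registry that maps a documented feature definition to the code that computes it, so other projects can reuse it rather than re-derive it. The sketch below is a purely hypothetical in-memory version; production feature stores additionally persist computed values and serve them at low latency.

```python
import datetime
from dataclasses import dataclass, field

@dataclass
class FeatureBank:
    """Hypothetical in-memory stand-in for a centralized feature registry:
    a team registers a feature once, with documentation and ownership,
    and other projects look it up instead of reimplementing it."""
    _features: dict = field(default_factory=dict)

    def register(self, name, compute_fn, description, owner):
        self._features[name] = {
            "compute": compute_fn,
            "description": description,
            "owner": owner,
        }

    def get(self, name, raw_record):
        return self._features[name]["compute"](raw_record)

    def describe(self, name):
        entry = self._features[name]
        return f"{name}: {entry['description']} (owner: {entry['owner']})"

# One team registers a documented feature...
bank = FeatureBank()
bank.register(
    "days_since_last_login",
    compute_fn=lambda user: (user["today"] - user["last_login"]).days,
    description="Whole days elapsed since the user's last login",
    owner="growth-analytics",
)

# ...and another project reuses it without duplicating the logic.
user = {"today": datetime.date(2021, 11, 30),
        "last_login": datetime.date(2021, 11, 20)}
print(bank.describe("days_since_last_login"))  # documentation lookup
print(bank.get("days_since_last_login", user))  # -> 10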
Another audience member asked about model interpretability and explainability. Germano suggested that with very complex deep-learning models, explainability is difficult and teams may have to trust the evaluation metrics. Huyen pointed out that explainability and interpretability serve many different use cases: one is helping ensure that models are unbiased and fair, while another is helping developers troubleshoot model performance. She noted that when a model's performance in production drifts over time, there are many possible causes, and without understanding how the model arrives at an answer it can be impossible to determine the root cause.
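One concrete form of the troubleshooting workflow Huyen alluded to is comparing per-feature attributions between training data and recent production data to see which input is driving a change in behavior. The sketch below uses the open-source shap library with a tree-based scikit-learn model; the synthetic data and the induced distribution shift are assumptions for illustration only.

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative training data: the target depends mostly on feature 0.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 3))
y_train = 2 * X_train[:, 0] + X_train[:, 1] + rng.normal(scale=0.1, size=1000)
model = GradientBoostingRegressor().fit(X_train, y_train)

# Recent production inputs whose predictions look anomalous:
# here, feature 0 has drifted upward.
X_live = rng.normal(loc=[0.5, 0.0, 0.0], size=(200, 3))

# SHAP attributions quantify each feature's contribution per prediction.
explainer = shap.TreeExplainer(model)
train_attr = np.abs(explainer.shap_values(X_train)).mean(axis=0)
live_attr = np.abs(explainer.shap_values(X_live)).mean(axis=0)

# A feature whose mean attribution changed markedly between training
# and live data is a candidate root cause of the drift.
print("training:", train_attr)
print("live:    ", live_attr)
```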
Greco concluded the session by asking the panelists for their thoughts on ML on edge devices. Huyen called this the "Holy Grail," noting that the more computation businesses can push to consumer devices, the less they must pay for their own computing costs. However, she said many problems remain to be solved: under-powered edge hardware, managing multiple model versions, monitoring model performance, and pushing updates. Germano agreed and discussed the idea of end users running models in their web browsers, which could save businesses on compute costs and give users a better experience.
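Neither panelist named specific tools, but a common first step toward running models on edge devices or in the browser is exporting them to a portable format such as ONNX, which client-side runtimes (for example, ONNX Runtime Web) can then execute. The sketch below exports a tiny PyTorch model; the model architecture and file name are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A tiny illustrative model; a real one would be trained first.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
model.eval()

# Export traces the model with a sample input and writes a portable
# ONNX graph that edge and in-browser runtimes can load.
example_input = torch.randn(1, 4)
torch.onnx.export(
    model,
    example_input,
    "model.onnx",
    input_names=["features"],
    output_names=["score"],
    dynamic_axes={"features": {0: "batch"}},  # allow variable batch size
)
```

The versioning, monitoring, and update-distribution problems Huyen listed begin after a step like this, once copies of the exported model are running on devices outside the company's control.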