Boxkite is an open-source instrumentation library designed to track concept drift in highly available model servers. It integrates with DevOps tools such as Grafana, Prometheus, Fluentd and Kubeflow, scaling horizontally to multiple replicas without requiring changes to code or infrastructure. The project claims to be fast, correct and simple.
The project aims to be used in production, unlike tools such as Evidently that run inside Jupyter notebooks, and to be lighter weight than other production monitoring tools such as Seldon.
InfoQ got together with Boxkite contributor Luke Marsden to talk about the project:
InfoQ: Why create another ML monitoring tool?
Luke Marsden: We wanted to create a tool that works in a lightweight way with the tools DevOps teams are already using in their cloud-native stacks. In particular, existing ML monitoring tools either don't run "in production" (i.e. they are libraries you can run in a notebook, but they're not "online") or require heavyweight setup such as a message queue system and a centralized service for performing drift detection. Boxkite, in comparison, simply pushes the work of online drift detection directly into Prometheus by encoding functions like KL divergence as PromQL queries. It's therefore "online", because it runs in production using the existing Prometheus and Grafana stack your team likely already has, and lightweight, because you just add our Python library to your training code and your inference servers; no other infrastructure software is required.
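As a rough illustration of the idea (not Boxkite's actual API), the sketch below shows the two ingredients this approach relies on: exporting a feature's live distribution as a Prometheus histogram at inference time, and comparing it against a baseline captured at training time using KL divergence. The names `feature_age_years`, `BASELINE_COUNTS` and `record_inference` are hypothetical; in Boxkite itself the baseline comes from the library's training-time instrumentation and the divergence is expressed as a PromQL query, so Prometheus does the computation online.

```python
import math

from prometheus_client import Histogram, start_http_server

# Hypothetical bucket edges and baseline bucket counts captured at training time.
BUCKETS = [18, 25, 35, 45, 55, 65]
BASELINE_COUNTS = [120, 340, 280, 150, 70, 30, 10]  # one count per bucket, plus overflow

# Inference-time histogram scraped by Prometheus; Grafana/PromQL can then compare
# its bucket counts against the training-time baseline.
feature_age_years = Histogram(
    "feature_age_years",
    "Distribution of the 'age' feature observed at inference time",
    buckets=BUCKETS,
)


def record_inference(age: float) -> None:
    """Call this for every prediction so Prometheus sees the live distribution."""
    feature_age_years.observe(age)


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) over normalised bucket counts, with epsilon smoothing.

    This is the kind of drift metric that Boxkite encodes as a PromQL query
    so that no separate drift-detection service is needed.
    """
    p_total, q_total = sum(p_counts), sum(q_counts)
    kl = 0.0
    for p_c, q_c in zip(p_counts, q_counts):
        p = p_c / p_total + eps
        q = q_c / q_total + eps
        kl += p * math.log(p / q)
    return kl


if __name__ == "__main__":
    start_http_server(8000)  # expose /metrics for Prometheus to scrape
    # Example: compare live counts scraped from Prometheus against the baseline.
    live_counts = [90, 310, 300, 180, 80, 40, 20]
    print("drift:", kl_divergence(live_counts, BASELINE_COUNTS))
```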
InfoQ: Where is ML maturity compared to software engineering?
Marsden: Decades behind. Looking at common practice in ML today is like looking at software engineering in the 90s. There's often no version control of all the components, no continuous delivery, and no monitoring of the behavior of models in production. Solving monitoring in a lightweight way is one part of that, in order to build MLOps stacks that give machine learning the same ergonomics and governance that software teams expect from DevOps today.
InfoQ: How important is model observability compared to other problems in MLOps?
Marsden: In our experience, the first problem new ML teams face when trying to get business value out of an ML model is simply getting that model into production at all. Oftentimes, that involves a confusing conversation between a researcher and a DevOps person, who find there's an impedance mismatch between a Jupyter notebook and what DevOps people are familiar with deploying into production. After getting models deployed, understanding the behavior of a model in production is often problem #2. We heard one company say that they lost an immeasurable amount of money by having an unmonitored model go haywire in production. Only once deployment and model monitoring are solved do teams tend to find that scaling problems, such as model management and provenance tracking across more teams and more models, become pinch points.
InfoQ: Why did you choose to open source the tool?
Marsden: Because the future of MLOps is open source. We believe it will follow a similar trajectory to the DevOps stack, where Kubernetes and the CNCF ecosystem with Prometheus, Grafana and the like are widely becoming canonical.