Key Takeaways
- Invest in metrics - they are your customer
- The unknowns in data projects differ from those in traditional software engineering projects, so customers and sponsors need to learn how to interpret progress and set expectations
- Plan for mitigation - the only certain thing about data is that it will contain errors, so design for data mitigation from the start
- A data engineer or scientist needs to be an expert in communicating with data - invest in this skill
- Wallow in data - present results and discuss insights with your peers to make informed and balanced decisions for the team
The book Agile Machine Learning by Eric Carter and Matthew Hurst describes how the guiding principles of the Agile Manifesto have been used by machine learning teams in data projects. It explores how to apply agile practices for dealing with the unknowns of data and inferencing systems, using metrics as the customer.
InfoQ readers can download an extract of Agile Machine Learning - a chapter reprinted with permission from Apress (2020), an imprint of Springer Nature.
InfoQ interviewed Matthew Hurst about using agile for a data engineering team, rebuilding the data catalog every day, continuous integration and deployment of data changes, the benefits of rewriting software, doing sprint demo meetings, what can be done to break the pace for teams that are working at a sustainable pace, and technical excellence in data projects.
InfoQ: Why did you write this book?
Matthew Hurst: Eric and I have worked on a number of projects involving large amounts of data combined with the production and maintenance of machine learned models and other statistical inference systems. We observed that the application of agile processes, including Scrum, requires some specific approaches when dealing with the unknowns that data and inferencing systems bring. Our idea was to share real examples of applying agile to data projects and reflect on the benefits and challenges. With data projects, there is far less control over unknowns, and so new behaviours and attitudes are needed.
InfoQ: For whom is this book intended?
Hurst: This book should provide useful information for both managers of data engineering teams as well as individuals working on those teams. It should also provide insights to anyone sponsoring projects in this area so that they have a good understanding of how expectations are set and how progress is made.
InfoQ: What are the main differences when it comes to using agile for a data engineering team?
Hurst: I think that the key difference is the idea of the customer. We developed the idea that the metric is the customer. In traditional applications, the agile and Scrum approaches advocate the presence of a customer to help evaluate completed work and determine priorities for next steps. With data projects - especially those which continuously output a data product - having metrics in place to fill this role is essential. The range of metrics helps elaborate on the general idea of the customer wanting to "improve" the quality of the data.
With this in mind, projects of a certain scale should consider investing in a metrics team. This team needs to handle the collection of data, the (human) annotation of that data, and the regular computation of metrics. We found that while it is useful to design a single metric that summarizes the quality and progress of the product, this is often of less value within the team, and so we design a number of metrics as required by the components of the product and the dimensions we wish to optimize for.
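As an illustration (not from the book), the kind of per-component metric computation such a team might run regularly could look like the following minimal Python sketch; the component names and the annotation format are hypothetical.

```python
# Minimal sketch: per-component precision from a human-annotated sample.
# Component names and the annotation format are hypothetical.
from collections import defaultdict

# Each annotation: (component, predicted_value, judged_correct)
annotations = [
    ("address", "1 Main St, Springfield", True),
    ("address", "22 Oak Ave", False),
    ("phone", "+1 425 555 0100", True),
    ("hours", "Mon-Fri 9-5", True),
    ("hours", "Open 24h", False),
]

totals = defaultdict(int)
correct = defaultdict(int)
for component, _value, judged_correct in annotations:
    totals[component] += 1
    if judged_correct:
        correct[component] += 1

for component in sorted(totals):
    precision = correct[component] / totals[component]
    print(f"{component}: precision {precision:.2f} over {totals[component]} judged items")
```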
We also innovated around the idea of a data wallow. This is a semi-structured meeting in which an engineer presents some data and the team as a whole provides feedback and insights into the characteristics of the data and the performance of any inference being made. In a sense, this is similar to the more traditional reviews of code or user experience and design.
For example, when we wanted to replace an existing address extractor with a new one, we would prepare and present an analysis of the differences between the two implementations. A difference analysis allows the team to focus on the outputs that differ to see where the tradeoff is. A difference in output might be between an incorrect prediction in the original component and a correct prediction in the new component - this is an improvement. However, it might also contain examples where the difference is from a correct prediction to an incorrect prediction. A simple result is the net gain - was it positive or negative? However, only in reviewing the analysis with the team can we ensure that there are no overlooked, high-priority changes that may lead to a decision not to ship the improvement.
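A difference analysis of this kind can be tallied very simply. The sketch below is an illustration rather than the book's implementation: the extractor outputs and the judgements of each differing value are invented stand-ins.

```python
# Minimal sketch of a difference analysis between two extractor versions.
# `old_output`, `new_output`, and `judgements` are hypothetical stand-ins;
# judgements would come from human review of the differing outputs.

old_output = {"biz-1": "10 Elm St", "biz-2": "5 Pine Rd", "biz-3": "77 Lake Dr"}
new_output = {"biz-1": "10 Elm Street", "biz-2": "5 Pine Rd", "biz-3": "9 Hill Ct"}
judgements = {  # (entity_id, value) -> correct?
    ("biz-1", "10 Elm St"): False, ("biz-1", "10 Elm Street"): True,
    ("biz-3", "77 Lake Dr"): True, ("biz-3", "9 Hill Ct"): False,
}

wins, losses = [], []
for entity_id, old_value in old_output.items():
    new_value = new_output.get(entity_id)
    if new_value == old_value:
        continue  # unchanged outputs don't affect the tradeoff
    old_ok = judgements.get((entity_id, old_value), False)
    new_ok = judgements.get((entity_id, new_value), False)
    if new_ok and not old_ok:
        wins.append(entity_id)      # wrong -> right: an improvement
    elif old_ok and not new_ok:
        losses.append(entity_id)    # right -> wrong: a regression

print(f"net gain: {len(wins) - len(losses)} (wins={len(wins)}, losses={len(losses)})")
print("regressions to review with the team:", losses)
```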
InfoQ: In the book, you mentioned that the data catalog is created from scratch every day. What are the advantages and disadvantages of this approach?
Hurst: By rebuilding the data catalogue every day, the system and the consumer are required to ensure continuity between versions of the data. This means that the inferences involved in producing the data are deterministic. This is in contrast to systems that incrementally curate data. Such systems can be influenced by the arbitrary sequence in which data is applied and end up in a state which could not be recreated if the system were rerun from scratch. In addition, in general, the less state a system has, the better.
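The contrast can be made concrete with a small sketch: a from-scratch build is a pure function of the source data, so two runs over the same inputs yield the same catalogue, whereas incremental curation depends on the order in which updates arrive. The source records and merge rule below are hypothetical illustrations, not the book's pipeline.

```python
# Sketch: a deterministic from-scratch catalogue build as a pure function
# of its sources. Records and the merge rule are illustrative only.

def build_catalog(sources: list[dict]) -> dict:
    """Rerunning over the same sources always reproduces the same catalogue."""
    catalog = {}
    # Sort so the result does not depend on the order sources arrive in.
    for record in sorted(sources, key=lambda r: (r["id"], r["source_rank"])):
        # The highest-ranked source (lowest rank number) wins for each entity.
        if record["id"] not in catalog:
            catalog[record["id"]] = record["value"]
    return catalog

sources = [
    {"id": "biz-1", "source_rank": 2, "value": "10 Elm St"},
    {"id": "biz-1", "source_rank": 1, "value": "10 Elm Street"},
]
# Input order does not matter: the build is deterministic.
assert build_catalog(sources) == build_catalog(list(reversed(sources)))
```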
InfoQ: What techniques do you use for continuous integration and deployment of data changes?
Hurst: One of the key considerations for data-producing projects is the set of mechanisms and workflows for handling mitigation. Given that catalogue production can be a lengthy process, it is important to be able to mitigate data errors on the live site in a way that is immediate, but which can be worked back into the core catalogue production system. Updating the data in the production data set is relatively straightforward - it is effectively a patch on the index. Ensuring that this patch is maintained correctly in catalogue creation is far more challenging. You want the correction to persist beyond the immediate mitigation, but you also don't want to keep it indefinitely: it will eventually become incorrect, and the underlying source data will likely have caught up with the error and be capable of fixing it organically.
For example, in a system that takes several hours to compute the data catalogue of local businesses, we don’t want to wait for the next run before addressing an error reported by a user, or worse, the owner of the business. Consequently, we implemented a mechanism that would allow us to override specific attributes of the business (e.g. the phone number or address).
Design for mitigation became one of our team principles.
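A minimal sketch of such an override mechanism is shown below. The attribute names and the rule for retiring an override (dropping it once the rebuilt catalogue agrees with the correction) are assumptions made for illustration, not the production design described in the book.

```python
# Sketch: attribute-level overrides applied as a patch on top of the rebuilt
# catalogue, retired once the source-derived data has caught up. Illustrative only.

catalog = {
    "biz-1": {"phone": "+1 425 555 0100", "address": "10 Elm St"},
}
overrides = {
    ("biz-1", "phone"): "+1 425 555 0199",  # user-reported correction
}

def apply_overrides(catalog, overrides):
    patched = {entity: dict(attrs) for entity, attrs in catalog.items()}
    still_needed = {}
    for (entity_id, attribute), corrected_value in overrides.items():
        current = patched.get(entity_id, {}).get(attribute)
        if current == corrected_value:
            continue  # the catalogue has caught up; retire the override
        patched.setdefault(entity_id, {})[attribute] = corrected_value
        still_needed[(entity_id, attribute)] = corrected_value
    return patched, still_needed

patched, overrides = apply_overrides(catalog, overrides)
print(patched["biz-1"]["phone"])  # corrected value served until the sources are fixed
```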
InfoQ: Why do you rewrite software frequently? What benefits does this bring?
Hurst: Software is a far more fluid engineering discipline than physical and mechanical engineering. The unknowns involved in the solution require that compromises be made when meeting deadlines. This is the steady state. Consequently, teams are always planning for minimally viable systems and aspiring for additional functionality above and beyond that. In many cases, the information about how to solve a problem, or what problem to solve, can only be gathered through the analysis of users’ interactions. That means putting the system out there and ensuring that you are capable of learning and updating in relatively small increments.
At a larger scale, no architecture can possibly be optimal for more than a few years due to the rapid developments in infrastructure platforms as well as advances in algorithms. Therefore, it is important to be able to re-architect with reasonable frequency as well.
To develop a team that is capable of rewriting, it is important to ensure that you have a codebase that is capable of such changes. This means paying great attention to modularity. Ideally, the units of computation will be well contained, with clear abstractions. This will allow them to be re-deployed in different contexts - for example, going from a database architecture to a map-reduce architecture to a micro-services architecture.
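One way to keep units of computation re-deployable is to express them as pure functions behind a narrow, host-agnostic interface, so the same logic can be driven by a batch job, a map-reduce stage, or a microservice. The interface below is a hypothetical illustration, not code from the book.

```python
# Sketch: a unit of inference behind a narrow, host-agnostic interface.
# The Protocol and the toy extractor are hypothetical illustrations.
from typing import Optional, Protocol

class AddressExtractor(Protocol):
    def extract(self, text: str) -> Optional[str]: ...

class SimpleAddressExtractor:
    def extract(self, text: str) -> Optional[str]:
        # Toy logic; the point is the contract, not the implementation.
        return text.split(";")[0].strip() or None

# The same object can be called from a batch loop, a map-reduce mapper,
# or an HTTP handler without changes to the core logic.
extractor = SimpleAddressExtractor()
print(extractor.extract("10 Elm Street; Springfield"))
```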
The frequency of re-design is often dictated by a mixture of technical needs and business needs. Teams will develop a dynamic between those protecting developer productivity (engineer managers) and those pushing on the product’s capabilities and ability to address user needs (product managers).
InfoQ: How do you do sprint demo meetings?
Hurst: With a larger organization of five or so teams, sprint demos need to be well-structured. We found that allocating 10-15 minutes per team provided a forcing function for each team to think about what they would present and ensure that it was presented in a clear and efficient manner to their peer teams. These meetings were continuous learning opportunities for all and a great opportunity for individuals to gain visibility in the organization.
Demoing progress in data projects can be quite different from the functional demos found in traditional project teams. In the latter case, the demo is of a user experience where new capabilities are shown. With data, it is often progress on a new metric goal that is demonstrated. This requires that those sharing the work be capable of presenting statistics and data graphics in an efficient and meaningful manner.
InfoQ: What can be done to break the pace for teams that are working at a sustainable pace?
Hurst: There is something a little sinister about the term "sprint". Metaphorically it is gruelling to be continuously sprinting! We found that inserting a week or two here and there allowing teams to regroup, experiment, hack and reduce the pressure was essential to the health of the team. In the course of a year, there are always natural points for this to happen - holiday seasons, after major milestones, etc.
InfoQ: What does technical excellence look like in data projects?
Hurst: Technical excellence is certainly required at the engineering level. Practices used to ensure that production code is in good condition (such as code reviews, unit testing, integration testing, etc.) must also be considered for code that is delivering any sort of data or metrics used for planning and decision-making. Bugs in metrics code, for example, will lead to incorrect statistics being reported and incorrect decisions being made.
In addition to this, it is important that teams have a good level of data literacy so that experiments used to motivate priorities and investments are meaningful. I would add that excellence in data science covers a commitment to excelling in communicating with data. There are many tools out there that encourage bad practices in data graphics and data visualization, and great data engineers are passionate enough about their area to learn how to communicate well with data. Some of the basics include: ensuring that all graphs are properly labeled, understanding the difference between trends (behaviour over time) and individual dips and rises, and understanding whether a difference in a metric is meaningful in terms of the error of the metric itself.
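That last point can be checked with a standard two-proportion test: whether a difference between two measured precisions is larger than the noise implied by the size of the judged samples. The sketch below is illustrative; the counts are invented.

```python
# Sketch: is a difference between two measured precisions meaningful,
# given the size of the judged samples? The counts are invented.
from math import sqrt

correct_a, total_a = 412, 500   # old run: precision 0.824
correct_b, total_b = 431, 500   # new run: precision 0.862

p_a, p_b = correct_a / total_a, correct_b / total_b
# Pooled standard error of the difference between two proportions.
p_pool = (correct_a + correct_b) / (total_a + total_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
z = (p_b - p_a) / se

print(f"precision moved from {p_a:.3f} to {p_b:.3f}, z = {z:.2f}")
print("meaningful at ~95% confidence" if abs(z) > 1.96 else "within metric noise")
```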
About the Book Authors
Eric Carter has worked as a partner group engineering manager on the Bing and Cortana teams at Microsoft. In these roles he worked on search features around products and reviews, business listings, email, and calendar. He currently works on the Microsoft Whiteboard product.
Matthew Hurst is a principal engineering manager and applied scientist currently working in the Machine Teaching group at Microsoft. He has worked in a number of teams in Microsoft including Bing Document Understanding, Local Search and in various innovation teams.