Day Two of the 18th annual QCon San Francisco conference was held on November 19th, 2024, at the Hyatt Regency in San Francisco, California. Key takeaways included: the hidden social drivers behind high-performing engineering teams; challenges in refactoring stubborn legacy systems; improved incremental processing with Netflix Maestro and Apache Iceberg; and lessons learned in building LLM platforms.
What follows is a summary of the keynote address and highlighted presentations.
Keynote Address: The (Not So) Hidden Social Drivers behind the Highest Performing Engineering Teams
Lizzie Matusov, co-founder and CEO at Quotient and official member of the Forbes Technology Council, presented her keynote address entitled The (Not So) Hidden Social Drivers behind the Highest Performing Engineering Teams. Matusov kicked off her presentation with a quick journey through her experiences with software teams at Red Hat, where she discovered that social drivers are key to successful teams. Matusov maintained:
Without knowing the social drivers that impact a team, we don't know how our team will perform under different circumstances.
Data has shown how high-performing teams demonstrating speed and quality can have a low rate of psychological safety and a high rate of burnout.
Matusov showed a cartoon of a development team where one of the team members discovered a bug right before product launch.
The TAPPs Framework includes: trust; autonomy; purpose; and psychological safety. The team member who discovered the bug may experience low or high degrees of these attributes.
Trust is the belief among team members that each person will reliably contribute, communicate openly, and support the shared objectives. Trust unlocks open communication, faster problem solving and less rework.
Autonomy is the ability of software engineers and teams to make decisions independently regarding their work. Autonomy empowers engineers to make decisions and solve problems faster.
Purpose is a clear, shared understanding of why the team's work matters and how it aligns with broader organizational goals. According to the 2024 DORA Report, when user-centricity is high, delivery throughput no longer correlates with product performance. The report explains the reasoning as:
Suddenly, work has meaning...there is no longer a disconnect between the software that is developed and the world in which it lives.
Psychological Safety is a shared belief among teammates that they can take interpersonal risks - such as speaking up, asking questions, or admitting mistakes - without the fear of negative consequences. She was also keen to ensure the audience was aware of what psychological safety is not, as described by Dr. Amy Edmondson, Novartis Professor of Leadership and Management at the Harvard Business School:
The term implies to people a sense of coziness...that we're all going to be nice to each other. That's not what it's really about. What it's about is candor; what it's about is being direct, taking risks, being willing to say, I screwed up.
The team member who discovered the bug may experience either low or high psychological safety in this scenario.
Psychological safety enables risk-taking, honest communication, and greater innovation.
Measurement (the who and how) concerns who is involved with the data: software engineers generally contribute the data, while managers and executives view and analyze it. Matusov maintains that software engineers should also view and analyze the data, and that to contribute to the data, teams must trust the data. In short, building alignment builds results.
Matusov closed with these key takeaways:
- To capture the top social drivers behind engineering team performance, use the TAPPs framework.
- Autonomy empowers engineers to make decisions and solve problems faster.
- Purpose aligns engineers' work to the customers they serve.
- Psychological Safety enables risk-taking, honest communication, and greater innovation.
- For best results, engineers should have visibility into data collection and analysis.
- The best way to measure social drivers is through anonymous, aggregated surveys.
- To get started today, teams should: build a consistent process; review data regularly; and drive actions and improvements.
When we know the top social drivers that impact a team, we can create a happier, higher-performing engineering culture.
Highlighted Presentations: Refactoring Legacy Code | Efficient Incremental Processing | A Framework for Building Micro Metrics for LLM System Evaluation
Refactoring Stubborn, Legacy Codebases was presented by Jake Zimmerman, technical lead of Sorbet at Stripe, and Getty Ritter, Ruby infrastructure engineer at Stripe. Zimmerman kicked off the presentation with common complaints about stubborn codebases, but maintained that we can refactor our way to a happy state. The best way to run a refactoring is to centralize it: having one team drive it concentrates expertise, incentivizes automation, and raises the probability of finishing.
A centralized migration needs two things: leverage over the codebase; and a method to "ratchet" incremental progress. With this in mind, Zimmerman introduced Sorbet, a type checker for Ruby. He maintained:
To refactor a large, stubborn codebase, you need to have a point of leverage and to pick good ratchets.
According to a 2017 company survey, developers at Stripe were unhappy and logged many complaints. Building Sorbet was key to addressing these complaints because it introduced points of leverage.
Ratcheting is achieved with the # typed: comment located at the top of each file. It acts like a ratchet because it's easy to "go up a little." Valid uses of this construct are:
- # typed: false is used for syntax and constants
- # typed: true is used for inference in methods
- # typed: strict is used for every method that needs a signature
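A ratchet like this only works if it cannot slip backwards. As a minimal sketch of that idea (a hypothetical CI script, not Stripe's actual tooling), a check can record each file's sigil level in a baseline and fail whenever a file is downgraded:

```python
# Hypothetical ratchet check, not Stripe's actual tooling: fail CI when
# any file's `# typed:` sigil is downgraded relative to a committed baseline.
import json
import pathlib
import re
import sys

# Sorbet sigil levels, in ascending order of strictness.
LEVELS = {"ignore": 0, "false": 1, "true": 2, "strict": 3, "strong": 4}
SIGIL = re.compile(r"^#\s*typed:\s*(\w+)")

def current_levels(root: str) -> dict:
    """Map each Ruby file to the numeric level of its sigil (0 if absent)."""
    levels = {}
    for path in pathlib.Path(root).rglob("*.rb"):
        lines = path.read_text().splitlines()
        match = SIGIL.match(lines[0]) if lines else None
        levels[str(path)] = LEVELS.get(match.group(1), 0) if match else 0
    return levels

def main() -> int:
    # The baseline is committed to the repo and updated only when levels go up.
    baseline = json.loads(pathlib.Path("sigil_baseline.json").read_text())
    exit_code = 0
    for path, new in current_levels("app").items():
        old = baseline.get(path, 0)
        if new < old:
            print(f"ratchet slipped: {path} went from level {old} to {new}")
            exit_code = 1
    return exit_code

if __name__ == "__main__":
    sys.exit(main())
```

Run alongside the test suite, a check like this lets teams upgrade files at their own pace while guaranteeing that overall progress never regresses.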
As a result, developer satisfaction improved because a significant amount of the large, stubborn codebase was refactored, using Sorbet as the point of leverage and selecting good ratchets.
Ritter discussed why modularity is important with an example of a simple logger application containing personally identifiable information. The solution, however, can devolve into tangled code despite the developers' best intentions.
Ritter discussed two points of leverage: packaging, which is inherent in Sorbet but isn't enough to address modularity; and layering, whose "essential principle is that any element of a layer depends only on other elements in the same layer or on elements of the layer 'beneath' it. Communication upward must pass through some indirect mechanism," as defined by Eric Evans.
Ritter provided many code examples of layering, using the ratchets that Zimmerman had described earlier, and covered the attributes of what makes a good ratchet.
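To make the layering principle concrete, here is an illustrative sketch (with hypothetical module and layer names, not Stripe's code) of a check that flags any dependency pointing at a higher layer:

```python
# Illustrative layering check, not Stripe's implementation: given each
# module's layer and the observed dependency edges, flag any edge that
# points "upward" to a higher layer.
LAYER = {"domain": 0, "service": 1, "api": 2}  # hypothetical layer names

MODULE_LAYER = {
    "billing.invoice": "domain",
    "billing.charger": "service",
    "billing.endpoints": "api",
}

# (from_module, to_module) dependency edges, e.g. extracted from imports.
DEPENDENCIES = [
    ("billing.charger", "billing.invoice"),    # service -> domain: fine
    ("billing.endpoints", "billing.charger"),  # api -> service: fine
    ("billing.invoice", "billing.endpoints"),  # domain -> api: violation
]

def upward_dependencies(edges):
    """Return edges whose target sits in a higher layer than the source."""
    return [
        (src, dst)
        for src, dst in edges
        if LAYER[MODULE_LAYER[src]] < LAYER[MODULE_LAYER[dst]]
    ]

for src, dst in upward_dependencies(DEPENDENCIES):
    print(f"layering violation: {src} depends on {dst}")
```

The count of upward dependencies makes a natural ratchet in the sense Zimmerman described: it is a single number that is easy to check in CI and should only ever go down.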
How can this all fall apart? Walter J. Savitch, relaying a quote overheard at a computer science conference, stated:
In theory, there is no difference between theory and practice. In practice, there is.
Tools aren't always perfect at first, Ritter said, and he recommended not rushing the launch of a project. One team can refactor a large, stubborn codebase.
Efficient Incremental Processing with Netflix Maestro and Apache Iceberg was presented by Jun He, staff software engineer at Netflix. Jun kicked off his presentation with common problems at Netflix: data accuracy; data freshness; and cost efficiency.
Incremental Processing is an approach to processing data in batch that operates only on new or changed data. The intent is to capture incremental data changes and track their states regardless of how a change is processed.
Jun introduced Apache Iceberg, a high-performance format for huge analytic tables, whose key concepts include: catalog, table, snapshot, data file, and partition. He also introduced Netflix Maestro, a horizontally scalable workflow orchestrator.
Incremental Processing Support (IPS) provides a clean and easy-to-adopt solution for efficient incremental processing that addresses the three aforementioned common problems.
Incremental Change Capture defines how to efficiently capture data changes using Apache Iceberg metadata related to new and updated files.
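As a conceptual sketch of that idea (the data structures below are illustrative and not the actual Apache Iceberg or IPS APIs), change capture can walk a table's snapshot lineage back to a stored watermark and collect only the files added since the last run:

```python
# Conceptual sketch of incremental change capture over Iceberg-style
# snapshot metadata. Names and structures are illustrative, not the
# actual Netflix IPS or Apache Iceberg APIs.
from dataclasses import dataclass, field

@dataclass
class Snapshot:
    snapshot_id: int
    parent_id: int | None
    added_files: list = field(default_factory=list)

def changes_since(snapshots: dict, current_id: int, watermark_id: int) -> list:
    """Walk parent pointers from the current snapshot back to the
    watermark, collecting files added by each intervening snapshot."""
    added = []
    cursor = snapshots.get(current_id)
    while cursor is not None and cursor.snapshot_id != watermark_id:
        added.extend(cursor.added_files)
        cursor = snapshots.get(cursor.parent_id) if cursor.parent_id else None
    return added

# Example: three snapshots; the previous run processed up to snapshot 1.
table = {
    1: Snapshot(1, None, ["s3://warehouse/a.parquet"]),
    2: Snapshot(2, 1, ["s3://warehouse/b.parquet"]),
    3: Snapshot(3, 2, ["s3://warehouse/c.parquet"]),
}
print(changes_since(table, current_id=3, watermark_id=1))
# ['s3://warehouse/c.parquet', 's3://warehouse/b.parquet']
```

Because only table metadata is consulted, the new and updated files can be identified without scanning the underlying data, which is what makes the capture step cheap.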
Key takeaways on rethinking the batch Extract, Transform, Load (ETL) pipeline included:
- Efficient Capturing: the Apache Iceberg metadata enables incremental processing without accessing the data
- Simplified Adoption: decoupling change capture reduces complexity
- Improved Experience: clean interfaces improve usability
- New Patterns: IPS patterns apply to many scenarios
Jun concluded his presentation by describing planned, future improvements to the platform.
A Framework for Building Micro Metrics for LLM System Evaluation was presented by Denys Linkov, head of machine learning at Voiceflow and LinkedIn Learning instructor. Linkov kicked off his presentation by asking the question: "So you're thinking about changing a system prompt?" Building LLM platforms is challenging: the process may be moving along nicely until a seemingly small change has unexpected consequences.
Linkov introduced his five lessons and described each one in detail:
- The flaws of one metric (single metrics are flawed)
- Models as systems (recommended)
- Build metrics that alert of user issues (recommended)
- Focus on business metrics (recommended)
- Crawl, walk, run (don't overcomplicate things)
What makes a good LLM response can be a philosophical question: LLMs generate plausible responses, but people often don't agree on what is good.
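In the spirit of micro metrics (the specific checks below are illustrative assumptions, not Voiceflow's actual metrics), a response can be scored by several small, targeted checks whose individual failures point directly at user-facing issues, rather than by one aggregate quality score:

```python
# Illustrative micro metrics for LLM responses, in the spirit of the talk;
# the specific checks and thresholds here are assumptions, not Voiceflow's
# actual metrics.
import re

def right_language(response: str, expected: str = "en") -> bool:
    # Placeholder heuristic; a real system would use a language detector.
    return expected != "en" or response.isascii()

def no_email_leak(response: str) -> bool:
    return re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", response) is None

def reasonable_length(response: str, lo: int = 1, hi: int = 500) -> bool:
    return lo <= len(response.split()) <= hi

MICRO_METRICS = {
    "right_language": right_language,
    "no_email_leak": no_email_leak,
    "reasonable_length": reasonable_length,
}

def evaluate(response: str) -> dict:
    """Run every micro metric; each failure pinpoints a user-facing issue."""
    return {name: check(response) for name, check in MICRO_METRICS.items()}

print(evaluate("Sure! I emailed the details to jane.doe@example.com."))
# {'right_language': True, 'no_email_leak': False, 'reasonable_length': True}
```

A single aggregate score would have hidden the email leak above; a named micro metric both alerts on the user issue and says exactly what went wrong.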
Conclusion
QCon San Francisco, a five-day event, consisting of three days of presentations and two days of workshops, is organized by C4Media, a software media company focused on unbiased content and information in the enterprise development community and creators of InfoQ and QCon. For details on some of the conference tracks, check out these Software Architecture and Artificial Intelligence and Machine Learning news items.