In this episode, Thomas Betts and Adi Polak talk about the need for context engineering when interacting with LLMs and designing agentic systems. Prompt engineering techniques work with a stateless approach, while context engineering allows AI systems to be stateful.
Key Takeaways
- Prompt engineering is evolving quickly, and many once‑standard techniques like role assignment are becoming less effective as models and tooling mature.
- Effective prompting increasingly depends on strong domain knowledge so engineers can specify the right steps, constraints, and desired outcomes.
- Saving successful workflows as reusable skills helps teams scale their use of AI and avoid re‑deriving processes in every new session.
- Managing context carefully—loading only what’s needed and keeping long‑term knowledge separate from short‑term session memory—improves accuracy and cost.
- Agentic, stateful workflows built on event‑driven patterns are becoming essential for automating engineering tasks, enriching data, and coordinating multi‑step processes.
Subscribe on:
Transcript
Introduction [00:19]
Thomas Betts: Hello, and welcome to the InfoQ Podcast. I'm Thomas Betts. Today I'm talking with Adi Polak. She's a director at Confluent and the author of several books, including Scaling Machine Learning with Spark and High Performance Spark Second Edition. She recently spoke at QConAI about context engineering and what it takes to move beyond prompt engineering when building AI systems of scale. So Adi, welcome to the InfoQ Podcast.
Adi Polak: Thank you so much for having me, Thomas.
Defining Prompt Engineering vs. Context Engineering [00:43]
Thomas Betts: So to start the conversation, let's get a baseline. What's your definition of what prompt engineering is and how does that differ from context engineering?
Adi Polak: That's a really great question. It's something that a lot of people are asking themselves. It's essentially, how do we instruct an existing model, or sometimes multiple models, so that it gives us what we want at the end? How do we translate? What is the language that we use? Do we give it code? Do we give it pure English language that we work with?
And then it of course dives into best practices inside the world of prompt engineering. But this is the high level. Some of the practices become outdated really fast, so this is another thing to watch as we learn and change the workflows of how we're working with models. For example, role assignment. For a very, very long time, role assignment was one of the key patterns for working with the models: "You're an experienced backend software engineer specialized in Apache Spark", for example.
And then the model assumes it needs to focus more on those specific technologies, and the model acts as a software engineer. Now that role assignment is slowly going away, and we have more environments that are specialized for that particular thing. Another pattern is few-shot examples. That's kind of like how we think as humans. So instead of telling the model what to do, we give it examples that we know are considered good examples, maybe bad examples. We classify them and we tell it, "Hey, this is a good example. This is a bad example". And the model learns the patterns.
At the end of the day, behind the scenes, it's machine learning, statistics. It learns the patterns and it understands what we want, kind of like how we as humans learn from patterns that sometimes are not 100% well-defined. So that's another great pattern. And then there's chain of thought, where the model kind of prompts itself, gives itself feedback, and thinks through the process that it goes through.
Behind the scenes, that creates multiple calls to the model. And today the models know how to do it on their own, which is kind of fascinating by itself. You can imagine an ML/AI pipeline with a loop where the model takes its own feedback, and that is chain of thought. And another pattern is constrained settings. We give it specifications. We tell the model, "Here's my spec. Here's how I want my software to be built. Here are the languages". Take it, digest it, and this will be part of the prompt that we work with. Of course, take it with a grain of salt. These are things that are always evolving.
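As a rough illustration of the few-shot pattern described above, a prompt can be assembled as plain text. The task, example sentences, and labels here are invented for illustration, and no particular model API is assumed:

```python
# Sketch: building a few-shot classification prompt as plain text.
# The examples and labels are hypothetical illustrations.

EXAMPLES = [
    ("The checkout flow finally works on mobile.", "positive"),
    ("The app crashes every time I open settings.", "negative"),
]

def build_few_shot_prompt(task: str, text: str) -> str:
    lines = [f"Task: {task}", "", "Examples:"]
    for sample, label in EXAMPLES:
        lines.append(f'Input: "{sample}" -> Label: {label}')
    lines += ["", f'Now classify: "{text}"', "Label:"]
    return "\n".join(lines)

prompt = build_few_shot_prompt("Classify review sentiment",
                               "Great docs, terrible defaults.")
print(prompt)
```

The string would then be sent to whatever model the team uses; the point is that the good and bad examples, not explicit instructions, carry the pattern.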
Prompt Engineering as a Moving Target [03:36]
Thomas Betts: Yes. Everything you mentioned... as we learned the idea of prompt engineering, it started as almost a buzzword, like that's not really a thing. And then you realize, no, we actually have to learn how to use this tool, because LLMs are just a product. They're just a tool that we have. And how to use them properly and improperly, we kind of figured that out as we were going. And because those LLMs keep evolving, how we interact with them keeps evolving. So you mentioned, instead of just saying "you're a role", now you're doing few-shot and saying, these are good and bad examples. And you said that gives it patterns to follow. We're still talking about not changing the LLM at all. We're not reprogramming the model and giving it new training data. We're just saying, in the context of this session, behave this way.
Now they have larger sessions, so those contexts can grow, but you're still saying, just follow this example until I start a new thing and tell you to behave something differently, right?
Adi Polak: Right. And I want to add some interesting edge cases. We know models are really good at spitting out the next word or line of code and so on. In the past, maybe a year ago, there was a rush to create really good mathematical models as well. And the idea was that a model could potentially come up with the right equation and the right way of taking a spreadsheet and doing all the statistics and calculation behind the scenes. And what we're learning now is that even though some of the more general models should have been good at these things, this is not the case.
So as software engineers, we're going back to prompting the model of the things that we need to build. So instead of giving it a spreadsheet and hoping for the best that it will do what it needs to do, we're taking it back to, "Hey, write me this code that does these things to that spreadsheet or database or anything that we need to operate on and here is the math equation or here's what I want you to implement like sentiment analysis or take this state-of-the-art algorithm and implement it in here". So it becomes more specific.
And so prompt engineering really shines when we have the domain expertise. So this is an important bit that we're learning now: we need to know what we want to achieve, and we need to have a rough idea of what the steps are to get there.
From Prompts to Reusable Tools & Skills [06:05]
Thomas Betts: Yes. And I think a lot of my coworkers, at least, have started turning to: I'm going to do this one thing, and then I'm going to create a tool or a skill. I'm going to save a file that says, here's how to do that. It started with just our instructions files, but now it's turned into a lot of different things. Like you said, create a Python script or a PowerShell script or a Bash script that does the thing I needed to do. Because if I go back and ask it the question, it might every time have to go back and say, "Well, maybe today I'm going to solve that with a Python script". And then tomorrow you ask it the same question, and it's like, "I'm just going to call an MCP server and hope for the best". Once you figure out what you're actually trying to do, which is where we need the domain knowledge to write good prompts, we can then use that to figure out the best possible tools.
Save that so I can repeat that process again. And I don't need to use up my tokens coming up with a process every time.
Adi Polak: Right. And this is how we scale as an engineering team if you think about it.
Human-Centric vs. Agentic Workflows [07:01]
Thomas Betts: Yes, the scalability is definitely a factor. And I think that gets to one of the things we wanted to talk about: an agentic workflow versus a human-centric workflow. A lot of this prompt engineering started when ChatGPT, Claude, and all the other tools came out. I would just type in a prompt and get what I want. And whether that's code or text or an image, I just ask for something and I get it. And now I've moved to those tools having their own built-in agentic workflow: "Please accomplish this task", and it'll plan it out: "Let me come up with a plan, show you what the plan is going to be, and check it off as I go through it". How is that evolving? Is that what you're talking about with context engineering, one way that we're watching these things happen in real time in front of us?
Adi Polak: Yes, a little bit. It's kind of amazing. So we contribute a lot to open source, and we developed an internal system for that. Every time we do a git push to an external repository, we have a dedicated system that makes sure there's no IP being exposed. This is something that you want, especially with an engineering org that contributes all the time to open source and also builds the platform. So we want to eliminate mistakes of IP being exposed. And so a couple of days ago I'd been coding a little bit, something small, and some of my very old commits from six years ago got picked up. In GitHub, you have a PR commit log that captures all the SHAs and all the information there. Those very, very old commits got picked up because I used a library that was open source, but the way I imported that library was the same way we imported the same library for other things.
So it wasn't real IP, it was just importing a library and using it inline. And our system picked it up. Now, what I would do as an engineer was like, "I know there's a PR commit log. I know how Git operates. I know I need to pick up the SHA, go do the whole surgical operation around it, go figure out what files were there. Do we still need them? Can I delete that? Can I rebase to that state?" So delete that line, and now GitHub behaves as if it doesn't exist. I took it kind of like the old-fashioned way. I had my notes: "Here are all the steps that I need to do". And it became messy really quick, because you have a lot of files, you have a lot of changes. And I'm talking about a commit from six years ago.
I don't really remember what happened back then, what exactly I did. I just knew I needed to delete that one line. It's kind of like a needle in a haystack. Four hours into that process, I completely messed it up. I had meetings and Slack and so on, so I lost my train of thought and completely messed it up. So I was like, "Okay, revert. Let's go back". So I reverted and then I was like, "Hey, wait, I have Claude Code. I know what needs to get done. Can I explain it to Claude and give it the right context, everything that needs to be there, and the tool access, and what is happening with Git? Can you do that for me?" And so I did. And within five minutes, I was able to push that code to the open source repo, because Claude went in, did the surgery, fixed it up for me, and deleted what needed to be deleted.
And that's a game changer, in my opinion, when you think about these things, because it was only one example, but this is where it would take me really long time just because I don't do it on a daily basis and I think no engineer does it on a daily basis.
Context Switching, Distraction, and Agent Assistance [10:38]
Thomas Betts: You mentioned that you got distracted, you were trying to do it, but then you had to go to meetings and you had to answer other Slack conversations. And so when you said it was four hours later, you weren't sitting there typing code for four hours. It was four human hours of your day and you were context switching. You didn't just get to think about that one problem. And I know there's a little bit of the yak shaving, if you will, of I went down here and then I had to do this other step and then I had to do another step before I could do that. And so you've had to remember, well, where was I in the stack of things I had to do before I could get to the one thing I actually wanted to accomplish? We sometimes struggle as people to remember all that stuff.
And sometimes I'll open up Notepad and I'll just write down notes: here's what I'm doing, here's the stuff that I've done. It seems like we're seeing that in the tools. And this isn't the low-level LLM; this is, like you said, Claude Code, a product that has been adding that capability in. And I think as we build more LLM products, or more products that we say have AI in them and are really LLMs under the covers, that's what we as engineers and architects need to think about: how do we manage that context within our application so that the user gets a positive experience? They say, "I want to accomplish this task". Our software figures out, here's how I can accomplish this task, but I'm going to write down what I'm going to do. I might ask the human, "Does this sound like a good step, or should I just go off and do it?"
So how do we accomplish that? Is it really just like, let me write a markdown file and keep referencing it and updating as I go through? What are some good techniques for keeping that context managed?
Capturing Workflows as Skills [12:13]
Adi Polak: Yes. So one of my approaches that I really like is actually going through the process and then asking Claude to save it as a skill. This way, I don't need to start from a blank sheet of a skill MD, and I can make sure it's actually how I want things to happen. So we're having a conversation. Sometimes Claude will give me options: "Did you mean this or did you mean that? Could you clarify?" And it's similar to speaking with a very thoughtful human, if you think about it for a second; it really thinks about edge cases and so on.
Not that it doesn't make mistakes, and sometimes it's like, no, we lost track of where we were and what I really wanted to happen. But after we do that, we have a session, and then I can say, "Hey, save everything you just learned into a skill", and let's edit that skill to make sure it actually captures what's expected. And this has proven itself, because now we're able to stack skills and create a repository of skills across the team and across the company, where people can take advantage of one another's knowledge, essentially. And it helps us create better software, because then we can use multiple skills.
Loading the Right Context [13:29]
Adi Polak: So when I'm entering a session, a coding session, for example, I can say, "Here's my skills repository, but I'm not loading everything into my context yet". And this is a very important point here. I don't want to load everything into my context, but I do want to have the knowledge of what exists there. So it should be searchable. I want to be able to track it. I want to know what level of quality I have for that skill as well. So if we can maintain that, that's really good. And then I can decide, for a specific session, for a specific task that we want to operate on, what's the right context to bring. Because with an LLM, if we overwhelm it with too large of a context, it's going to make more mistakes and it's going to cost more. Because at the end of the day, what happens, the mechanics of things, is it's just concatenating everything for every command, right?
So it's string concatenation behind the scenes for the most part, right? Maybe there are more sophisticated ways that the big companies are implementing, but at the end of the day, that's it. And so we want to be smart about what context we're bringing. We want to be smart about how we're managing that and not overwhelm the system if we can.
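A minimal sketch of that concatenation mechanic, loading only the skills a task needs under a rough token budget. The skill names, contents, and the four-characters-per-token heuristic are all illustrative assumptions, not a real tokenizer or a real skills format:

```python
# Sketch: context as concatenation, with selective skill loading
# under a naive token budget. All names and the 4-chars-per-token
# heuristic are illustrative, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic

def assemble_context(system: str, skills: dict[str, str],
                     needed: list[str], budget: int) -> str:
    parts = [system]
    used = estimate_tokens(system)
    for name in needed:            # load only what this task needs
        chunk = skills[name]
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break                  # stop before overwhelming the model
        parts.append(chunk)
        used += cost
    return "\n\n".join(parts)

skills = {"git-surgery": "Steps for rewriting old commits...",
          "ip-scan": "Checklist before pushing to public repos..."}
ctx = assemble_context("You are assisting with a release.", skills,
                       ["ip-scan"], budget=200)
print(ctx)
```

The full skills repository stays searchable outside the prompt; only the selected entries are concatenated into the context for this session.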
Developer Experience → Agent Experience [14:48]
Thomas Betts: Yes. I like the idea of having the skills library. It's not that I have to load up all of these things I know how to do, but I know I can go and look at this list that says, "Here are the things that I know how to do", and look through those skills and realize, okay, I need that one. Now go look it up. I'm old enough that I used to have books on a shelf that were very useful references before you could just find everything on Stack Overflow or ask whatever your LLM tool is. You'd turn to the index and say, "I'm looking for this", and you'd find some sample bit of code and you're like, "Okay, how do I take this generic example and use it in my specific case? I can apply the idea that they're showing in this book, or I find an article online and I apply it".
Now we've got software that does that. We almost need to start thinking in those same ways. How would I as a human store this reference material that I know how to use? Where would I go to look it up? It gets back to the idea that when you use these with just a one-shot prompt, I can ask it anything and it's going to give me something, but the quality is not going to be that good. If you start using the product better, using the tools better, and using those techniques... No one was expected to know everything that's on the internet. I didn't know everything that was in those books; I would go look things up. So set up the tools to be successful by saying, "Here, go look these things up", and build that into our systems. That just seems like a different way of thinking about software than we've really had to address before.
Adi Polak: Yes. If you think about it for a second, we're moving from the world of developer experience, the DevEx world, into agent experience. Like, how do I expose things? Me as a developer, I love books, I wrote books, I want to go there and read and learn and use them as reference. But I know that today, with the tools that I have, there are better options for me out there, and my agents now need access to that knowledge. So now I want it to know about this great tool that just came out, or I want it to know about a utility tool that we developed in-house that I want to use.
We still care a lot about developer experience and flow state and giving people the right tools to be successful and productive. And we're also adding the agentic experience: how do we build the workflows and the systems, and how do we build those sometimes multi-agent systems that go across our development life cycle, the SDLC and CI/CD and what's happening in production and so on, to really be successful with that.
And those tools need context, and they're often stateful. So there's a lot that goes into it, but the first step is we need context, and we need to move from a stateless, chatbot-era approach to a stateful and agentic approach.
Short‑Term vs. Long‑Term Memory in Agent Systems [17:44]
Thomas Betts: And that's talking about sort of long-term memory. I'm going to start this process, and it's not that I finish immediately and I'm done and the process dies and I start up a new one. That's the stateless thing you're talking about; you're talking about stateful. I might have an agent that's doing a business process over days, and it's going to pick up different pieces of that. Or going to your multi-agent scenario, I like the idea of the orchestrator agent having all these different subagents. How do you manage the memory between all of those things? Do they all need to know everything that's going on, or are there some techniques you know of to break up the context and say, solve this problem with what you need to know, and here's the bigger picture so you give a better answer?
Adi Polak: It very much depends on the workflow that I'm building. So we always think about short-term memory and long-term memory. These are always things that we want to know. It's like, what do I need to know that I need to pull in? It's very low latency, I need it right now, and I can forget about it later. It's only for this specific task, or only for this specific subtask even, so it's not even a full task. And then there are different techniques where we can maintain that context throughout a longer session, like summarization and thinking about the hierarchy of that context.
I think sometimes the model will tell me, "Hey, I'm working with Python", but then I'll kick it off again, and now it's like, okay, now we're switching to MCP servers, and maybe you're a JavaScript developer suddenly, and you find yourself with scripts in different programming languages. So we don't want that, and we want this context to stay with the session. Now, depending on what we're building, depending on what our needs are, it could also be that we need long-term memory. Long-term memory is more about overall what exists in my system, what you'd call durable knowledge of the space we're operating in.
And this is where things like skills also become really, really important. And also maybe we want to start looking at things like RAG, kind of like bringing information from databases or other data systems that exist, retrieving that information and augmenting it where we need. Now this is more of the long-term context. Long-term memory is also an important pillar. So I have my short-term, what exists in my session, and then I have my long-term, which I need to always keep up to date, maybe with a data pipeline, maybe an ML pipeline. Names are changing a little bit these days, but I do know that I need that to exist.
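The short-term/long-term split with summarization can be sketched like this. The class name and the truncating summarizer are illustrative; in a real system the summary would itself come from a model call:

```python
# Sketch: separating short-term session turns from long-term,
# summarized knowledge. _summarize is a stand-in for an LLM call.

class AgentMemory:
    def __init__(self, short_term_limit: int = 3):
        self.short_term: list[str] = []   # current session turns
        self.long_term: list[str] = []    # durable, compacted knowledge
        self.limit = short_term_limit

    def _summarize(self, turns: list[str]) -> str:
        # Placeholder for a model-generated summary.
        return "Summary of: " + "; ".join(t[:20] for t in turns)

    def add_turn(self, turn: str) -> None:
        self.short_term.append(turn)
        if len(self.short_term) > self.limit:
            # Compact older turns into long-term memory, keep the latest.
            old, self.short_term = self.short_term[:-1], self.short_term[-1:]
            self.long_term.append(self._summarize(old))

mem = AgentMemory(short_term_limit=2)
for t in ["set up Kafka topic", "added Flink job", "wrote tests"]:
    mem.add_turn(t)
print(mem.short_term, mem.long_term)
```

Short-term memory stays small and immediately available; everything older is kept as durable summaries that a later session can search and reload.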
RAG as One Tool Among Many for Multi-Agent Workflows [20:30]
Thomas Betts: Well, and I'm glad you brought up RAG, because a year ago that was the thing everyone was latching onto. You've got to have RAG, you've got to have RAG. It's the fastest, easiest way to just get a better result. And now RAG is just one more tool in the toolbox. But it seems like if you're working on the agentic workflow, now it's not, oh, always go look up this one source that I have a RAG model for. It's, when I need it, go and find the relevant answers and bring them into, again, the context of this task I'm trying to solve. We had that idea, but it was still, again (a year ago is the ancient past), almost the stateless model. It was like, go call RAG, come back. But now I want to have this more stateful solution, where one step of the process went out, looked, and found that information, brought that in, and then the results of that get passed off to another smaller agent that does something else with it, right?
Is that kind of how you're envisioning these things fitting together into a multi-agent model?
Adi Polak: Yes. And I think it very much depends on what workflows we're building. If we're building workflows for engineering, sometimes that will look different, because now we're looking at pure enablement based on skills and writing software and our knowledge and experiences and so on. And approaching the model with that thought in mind usually means that we have new skills for how we're going to do things. We have best practices, we have design patterns that we want the model to follow, we have a testing suite. I have my doctor agent that tests things for me, so I wake up in the morning and I run it and make sure the tickets and everything are operating as I expect them to operate. Now, this is 100% a productivity tool. Then there's a whole other world of BI, the business intelligence part, where we want to enrich existing data and we want to create better SQL based on new tables that we have in there.
Now, it could be that we work at a large company. Usually when we have large companies, we have silos of data in places and I want to be able to retrieve and I want to be able to search what happened there, even if I don't necessarily know the exact name of the tables.
So this is where things like RAG can help enrich my queries and enrich my SQL and expand beyond what I know right now, because this is what it was built for. There is a good indexed database sitting somewhere where we can do that retrieval based on the semantics that we have. Now, the semantics is really important, because the idea is we don't know exactly what we're searching for, and when we don't know exactly what we're searching for, we want to be able to use whatever language we're using and the model will try to give us a couple of ideas: "Oh, this table exists over there. I just pulled it out of my catalog. It's a new table. They just created it yesterday, but it's already populated with the latest data. Would you be interested in looking into it? Would that make sense for the business question that you're asking?"
Now this is a whole different workflow than using AI for improving productivity as engineers.
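The catalog-discovery idea above can be sketched in a few lines. Real systems use embedding models over indexed descriptions; a shared-word count stands in for semantic similarity here, and the table names and descriptions are made up:

```python
# Sketch: finding a table by meaning when you don't know its name.
# A bag-of-words overlap score stands in for embedding similarity;
# the catalog entries are invented for illustration.

from collections import Counter

CATALOG = {
    "cust_orders_v3": "customer purchase orders with totals and dates",
    "web_clicks_raw": "raw clickstream events from the website",
    "nps_responses": "customer satisfaction survey scores",
}

def score(query: str, description: str) -> int:
    q = Counter(query.lower().split())
    d = Counter(description.lower().split())
    return sum((q & d).values())  # shared-word count as crude similarity

def search_catalog(query: str) -> str:
    return max(CATALOG, key=lambda name: score(query, CATALOG[name]))

print(search_catalog("which customer orders shipped last month"))
```

The question never names a table, yet the description-level match surfaces a candidate; a RAG pipeline does the same retrieval with real embeddings and then hands the match to the model as context.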
Event‑Driven Architectures with Kafka & Flink [23:56]
Thomas Betts: Yes. That discoverability problem is always out there. If you don't go out there and look for new things, how do you know the new things are coming out? And so we can have different tools that are constantly watching for that and letting us know. Since you work at Confluent, you're familiar with Apache Kafka and Apache Flink. Where do those products fit into an agentic architecture? I know a lot of people are out there using them and wondering what benefits they provide if I'm building some of my workflows on top of those tools, and what are some of the things you're hoping teams will start adopting if they aren't familiar with them?
Adi Polak: Yes. So there's one thing that companies are already adopting, and they've shared it on some of the largest stages, at Kafka Summit and the conferences that we have around the world around data streaming. One of them, for example, is OpenAI. They have a very large Kafka cluster as well as Flink. And for them, everything that they do with the models that people interact with in real time, they build through event-driven architecture. So behind the scenes, you have Kafka transferring those events in real time at very, very low latency, and then they have Flink for enrichment, summarization, and real-time analytics on top of those, in order to improve and give back to the model more context. I talked before about how the models today are already taking steps on their own. As we talked about prompt engineering, we touched on the fact that we have chain of thought, and that chain of thought is essentially something you can think about as an event-driven pattern.
It's probably something we can define as a new pattern, and it becomes the infrastructure for when we want to build these types of solutions. So even in-house, when we're thinking about how we build workflows to support some of the engineering work, maybe we don't need a huge Flink cluster, but it definitely helps to think event-driven about these things, like what triggers what. Especially for processes and workflows related to maybe our ticketing system, maybe related to code quality, where we have agents running over code and suggesting new tickets for how we can do better. Maybe agents that are picking up tickets and already giving us ideas for solutions with code, just waiting for a developer to actually approve those. And because we started to see more companies curious about that, we actually developed what we call the Flink Streaming Agents API in open source.
And for anyone who wants to use it, try it, contribute to it, it's available as part of the Flink repository.
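The "what triggers what" shape can be sketched in memory. A real deployment would use Kafka topics and Flink jobs rather than this toy dispatcher; the topic names, handlers, and payloads are illustrative, not part of any Kafka or Flink API:

```python
# Sketch: an in-memory publish/subscribe loop showing the
# event-driven "what triggers what" shape. Topic names, handlers,
# and payloads are invented; real systems would use Kafka topics.

from collections import defaultdict

handlers = defaultdict(list)

def subscribe(topic):
    def register(fn):
        handlers[topic].append(fn)
        return fn
    return register

def publish(topic, event):
    for fn in handlers[topic]:
        fn(event)

log = []

@subscribe("git.push")
def scan_for_ip(event):
    # An agent reacts to the push, then emits its own event.
    log.append(f"IP scan on {event['repo']}")
    publish("scan.done", event)

@subscribe("scan.done")
def open_review(event):
    log.append(f"review opened for {event['repo']}")

publish("git.push", {"repo": "flink-agents"})
print(log)
```

One event fans out into a chain of reactions, with each agent subscribing to what it cares about rather than being called directly; that is the pattern Kafka and Flink provide durably and at scale.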
Automating Backlogs and Engineering Workflows [26:42]
Thomas Betts: I like the idea of those event-driven workflows becoming more of the agentic processing, as opposed to, I'm going to start a process, then these things happen. I can have stuff working in the background, and then those agents are responding to it. And I can see a lot of... Again, it's a different paradigm you have to think about. As opposed to "do this work", it's "when this happens, start that". You mentioned the SDLC stuff, CI/CD pipelines. I check in a commit, kick off a pull request, or run this process and check, did I have any IP going out? All those different things can happen, and that loops into just the bigger context model. Is that what you were saying Flink is for, enriching the data? So Kafka is, this event happened, then Flink comes in and says, "Because that happened, and I also saw these other things happen, here's the bigger picture". Am I understanding that correctly, or is there a better way to explain that?
Adi Polak: This is one example. The example I had in mind is more that we have an endless backlog of Jira tickets, or not Jira, whatever your favorite ticketing system is. And this endless backlog needs to be triaged, sorted, organized, and enriched with more information from our internal architecture docs, and with prioritization that comes from product, from customers, and so on. These are different systems. So imagine that you have a daily routine that runs agents that go there and summarize that for you and prioritize that for you. And for the simple things, they even suggest a coding solution, even create a pull request and open it for a developer to take action on. This is where we have the multi-agent process kicking in and improving some of the engineering processes, especially given that many companies have a huge backlog. We never get to the bottom of the features backlog.
And there are always small things that we need to update, related to maintenance or related to migrations especially, that we know we need to do. Now, they're not a priority because the customer didn't ask for them specifically, but maybe if an LLM could pick them up and suggest a solution, we can execute faster on reviewing rather than taking the tickets and doing it ourselves. So these are some of the things that we're building, helping engineers do more meaningful work as well.
Looking Ahead: Creativity as the Core Skill [29:13]
Thomas Betts: Well, I always like to wrap up our conversations with a look towards the future. And the thing is with AI, I feel like I can't ask, what do you see one to three years down the road? So maybe it's just six months from now or within the next year. How do you think people will be designing, architecting and building software systems differently, specifically around talking about context engineering?
Adi Polak: Yes. I don't know how things are going to change and what the best solutions for us will be. The only thing I can say is, for the people listening, if you have the opportunity, try new things, bring new ideas, use your creativity. We always say if you can dream it, you can do it, you can build it. I truly believe that with the tools we have today, if you can dream it, you can do it, and you can execute it much faster than we could have years back. I don't know how things will change. I don't know where we'll be. But if you're curious, if you want to stay up to date in this industry, if you want to continue building, your creativity is the most important skill right now. Use it, do it, go for it.
Thomas Betts: Yes. Just keep trying it and don't think that, "Oh, I haven't used it in six months". It's probably changed quite a bit since the last time you tried it out. So try something again. You'll probably get a much different result. Well, I think that's about time. Adi Polak, thanks again for joining me today.
Adi Polak: Thomas, thank you so much for having me. It's always a pleasure catching up.
Thomas Betts: And listeners, we hope you'll join us again soon for another episode of the InfoQ Podcast.