Transcript
Friedrich: I want to dive into efficient DevSecOps workflows with a little help from AI, and share some of our learnings. I'm Michael. I've been with GitLab for 4 years now. It's not about me, it's about some learnings. I also want to highlight that this is not a product talk. This is learning together. If you're interested in what GitLab is doing with AI, here are some links.
DevSecOps (Cloud Native)
Let's start with DevSecOps, and how it relates to cloud native. What are the things we want to consider? If you're looking at the DevSecOps lifecycle, or the journey, where are you in there? Is it maybe at the deploy stage? Is it earlier, like automating tests, having a staging environment? Maybe even starting from scratch and planning things? Throughout the talk, I want to invite you to think about what the most inefficient task you're currently doing is. Is it creating issues, writing code, testing, security scanning, deployments, troubleshooting, root cause analysis, whatnot?
The question I'm asking is, what if AI can help us get more efficient? There are certain things to keep in mind with that. We have different workflows which we want to solve. We'll dive into some examples in a bit. It's also important to establish guardrails for AI. You want to ensure that the data is secure, that nothing gets leaked from your environment. The other thing I would encourage you to do is measure the impact of AI. It doesn't make sense to have AI just because everyone has it. Make a case for it and prove that it's valuable. We will also dive into that throughout the talk.
Workflows: Dev
Let's start with the workflows. I try to split it up into dev, ops, and sec. Development typically starts with planning, managing, code, testing, documentation, review, something like that. At the very beginning of infusing AI into our workflows, it can be: I want to start a new project. I have different build tools like CMake for C++, cargo for Rust, different other things. I might need CI/CD configuration. There are code errors, I have follow-up questions. The great thing is, maybe there's a way to use an AI-based or AI-powered chat prompt. Having a UI, being able to ask a question. The following screenshot is from the Anthropic Claude Workbench, where I'm using a system prompt saying, you're an expert in everything.
Specifically, you can help with programming language projects. Then asking a question, how to get started with a Golang project, CLI commands, CI/CD configuration for GitLab, specifically. Then adding some OpenTelemetry instrumentation, because the application should specifically be optimized for that. The great thing is, it provided me with so many responses, I couldn't even fit them on the slide here. It's a great way to get started and not, like me 10 years ago, have 100 Google tabs open while figuring things out, but just ensure that I'm starting efficiently. It's also helpful for team onboarding.
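To make that chat workflow concrete, here is a minimal sketch of asking the same kind of question programmatically through the Anthropic Python SDK, similar to what the Workbench does. It assumes the anthropic package is installed and an ANTHROPIC_API_KEY is set; the model name and exact prompt wording are illustrative, not the ones from the talk.

```python
import anthropic

# Minimal sketch: ask an LLM for project scaffolding advice, similar to the
# Workbench example in the talk. Assumes the anthropic package is installed
# and ANTHROPIC_API_KEY is exported in the environment.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-sonnet-20240229",  # illustrative; pick any available Claude model
    max_tokens=1024,
    system="You are an expert in everything. Specifically, you can help with programming language projects.",
    messages=[
        {
            "role": "user",
            "content": (
                "How do I get started with a Golang CLI project? "
                "Please include the CLI commands, a GitLab CI/CD configuration, "
                "and basic OpenTelemetry instrumentation."
            ),
        }
    ],
)

print(response.content[0].text)
```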
The other thing is starting an issue: what should we do? Maybe the proposal takes 30 minutes to write, maybe 2 hours. AI can be a way to generate issue descriptions from a shorter description, turning it into a longer, more impactful issue description. In this example, it's again using observability and OpenTelemetry, and specifically focusing on research, figuring out whether we should instrument the source code using the SDKs, or use some auto-instrumentation, which could be doable, for example, with eBPF. We don't know yet. We can generate the issue description with the help of AI.
The other thing where I found AI helpful is plans and discussions. Maybe there's an issue already. It has a long discussion. It has a long description, for example, for bringing CI/CD observability into GitLab. I wrote that and it's 10 pages long. It can be helpful to use AI to summarize that, focus on what is important, and get up to speed quickly, rather than reading the entire issue for many hours. It needs the context and a JSON representation of all the comments if you want to automate that and use it with a specific chat prompt. It can be a helpful way to get started. I also did the same using the Anthropic Claude 3 Workbench: pasted in the issue description, and it was able to summarize it. You can do the same with ChatGPT, Anthropic, or whatever LLM provider or SaaS is out there.
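If you want to automate that context gathering, a minimal sketch could pull the issue and its discussions from the GitLab REST API and assemble them into a summarization prompt. The project ID, issue IID, and token handling below are placeholders; the resulting prompt can then be sent to whichever LLM you use.

```python
import json
import os

import requests

# Sketch: pull an issue's description and discussion threads from the GitLab API
# and assemble a summarization prompt. Project ID, issue IID, and token are
# placeholders for your own environment.
GITLAB_URL = "https://gitlab.com/api/v4"
PROJECT_ID = 12345           # placeholder: numeric project ID
ISSUE_IID = 1                # placeholder: issue IID within that project
HEADERS = {"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]}

issue = requests.get(
    f"{GITLAB_URL}/projects/{PROJECT_ID}/issues/{ISSUE_IID}", headers=HEADERS
).json()
discussions = requests.get(
    f"{GITLAB_URL}/projects/{PROJECT_ID}/issues/{ISSUE_IID}/discussions",
    headers=HEADERS,
    params={"per_page": 100},
).json()

prompt = (
    "Summarize the following GitLab issue and its discussion. "
    "Focus on the decisions made and the open questions.\n\n"
    f"Description:\n{issue['description']}\n\n"
    f"Comments (JSON):\n{json.dumps(discussions, indent=2)}"
)

# Send `prompt` to the chat LLM of your choice (ChatGPT, Claude, a local model, ...).
print(prompt[:500])
```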
The next thing when we're thinking of development, or dev workflows, is code creation. I want to highlight two workflows, or two thoughts around this. Maybe as a senior developer, learning the latest programming language features is time consuming. As an example, I learned C++03 and maybe C++11 a little bit. I haven't learned C++14, C++17, or C++23 yet. It might be a good way to actually get more modern code into my habits by generating code, and also to avoid creating boilerplate code all the time. As an example, writing a RegEx for parsing IPv6 is fun, but actually not. Maybe use AI to just generate that, because it's boring to work on things that are not efficient.
On the other side, as a beginner, it might be hard to understand specific algorithms. You might be pressured to learn fast in a new environment, in a new team. It might lead to human errors and even burnout. It's a great way to infuse AI in that regard, to get AI-assisted suggestions as you type, and maybe even learn a new programming language. I used AI to learn Rust in the last year, which can also be a great way to test and understand how AI can be helpful. From my perspective with code, I want to separate it into code suggestions and code generation. Code suggestions provide specific suggestions as you type, as autocompletion, or even as a code block. It can be continuing an algorithm. It can be adding missing dependencies, because it knows about the context. In the screenshot on the right-hand side, you can see the Go example with importing fmt and strings, but it can also be class methods.
I was writing a Java implementation for Spring Boot or testing something, and it provided me with the autocompletion for converting the function to a string. I had no idea, I would have probably needed to Google that for a while, or use the browser search of my preference. This is a great way to use that. How does it work? I'm typing in the IDE. I might be having an IDE extension from whatever vendor. It detects the intent, takes the file context and sends it over to the LLM or to the AI provider. Then it receives the suggested text. It's a grayed out action or grayed out text.
I think it's similar in the various vendor IDEs. You can accept that, or you ignore it and continue writing your own code. The other way to improve efficiency is, you can generate code, which means you have the context for generating a specific larger code block, which can be functions, classes, libraries, frameworks, new feature exploration. In the screenshot, I wanted to write a Kubernetes observability CLI, just a simple example, to show how the library inclusion works, implement it in Go, and print pods, services, and some other resources.
I was just writing the source code comments as a sort of prompt engineering, where the IDE recognizes it, parses the intent, generates code blocks, and then streams the response as multi-line code. Which is great, because it worked, it just needed a little bit of adaptation and so on. Then I was able to print the pods and services on my Kubernetes cluster. There's a video link, which shows how it's doable in 20 minutes. It's a great way to get more efficient.
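As a rough illustration of that comment-driven flow outside of an IDE, here is a sketch that sends a source comment prompt to a locally running code model via Ollama and streams the multi-line completion back. It assumes the ollama Python client is installed and a code-capable model such as codellama has been pulled; the prompt and model are illustrative.

```python
import ollama

# Sketch: comment-driven code generation against a local model served by Ollama.
# Assumes the `ollama` Python client is installed and a code model such as
# "codellama" has been pulled locally; any code-capable model works.
prompt = """\
// Write a small Kubernetes observability CLI in Go.
// - use the official client-go library
// - print all pods and services in the current namespace
package main
"""

# Stream the multi-line completion, similar to how an IDE extension would
# show generated code as it arrives.
for chunk in ollama.generate(model="codellama", prompt=prompt, stream=True):
    print(chunk["response"], end="", flush=True)
print()
```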
The thing is, what does the code actually do? I might be generating code. I might be joining a new team, maintaining new code. Maybe I need to fix a bug fast, even when there is no documentation, no comments, no unit tests, nothing. I can use AI to explain what the source code is doing. Code explanation is one way to leverage AI in the DevSecOps lifecycle, and say, please summarize what's going on in that source code. On the right-hand side, it took the code, broke it down into the specific sections, and explained them. Since generative AI is not predictable, the result might differ every time you try that.
The great thing is, you can refine the prompt. Either it's a /command, or it's a specific text instruction, and you say, please focus on the algorithm, or explain it with a focus on the algorithm. Also, explain it for performance improvements, or explain why this specific function would throw a segfault, which I had a little fun with a while ago in C++. It can be a really helpful way to understand, to learn, to improve the source code, and also improve your knowledge on specific things, rather than, again, copy-pasting the error message into a search and figuring out how things work; have it autogenerated instead.
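As a sketch of that prompt refinement, the snippet below asks a locally running model (via Ollama, which comes up again later in the talk) to explain the same file with three different focuses. The file name and model are placeholders; any chat-capable LLM provider works the same way.

```python
import ollama

# Sketch of the "refine the prompt" idea: request the same explanation with
# different focuses. Uses a local model via Ollama; the file is a placeholder.
source = open("parser.cpp").read()  # placeholder: the code you want explained

focuses = (
    "the algorithm",
    "performance improvements",
    "why this function could throw a segfault",
)

for focus in focuses:
    reply = ollama.chat(
        model="mistral",
        messages=[
            {"role": "system", "content": "You are a senior engineer who explains source code clearly."},
            {"role": "user", "content": f"Explain the following code with a focus on {focus}:\n\n{source}"},
        ],
    )
    print(f"--- Focus: {focus} ---")
    print(reply["message"]["content"])
```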
It also works not just in the example I just showed from GitLab itself. You could use any LLM provider, define a system prompt, and then ask questions about the source code and have it summarized. This is exactly one of the things I'm really using in my personal workflows as well. The other thing I'm super keen on is, I'm not sure who loves writing tests here? I don't like it. I've been in development for nearly 20 years now. I think of AI as a way to generate tests, and not just specific boring tests, but also refining the prompt again to say, please focus on extreme cases, or please do some regression testing.
We even tried to break the code with specific unit tests to ensure that there are no regressions. It again requires some way to interact with the LLM. Either copy the source code into that context, or some providers also have IDE extensions, where you select the source code, right click, and then say, please generate tests. You get it integrated, you might even get a merge request or pull request generated from that. You can, again, also do it in a different way, with a different provider, and ask it to generate tests for the source code. The only problem is, on the left-hand side, it's not well formatted. You get the idea that it's possible to also use Claude 3, for example, from Anthropic, to generate tests.
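One hedged sketch of closing that loop: ask an LLM for tests focused on extreme cases, write them to a file, and run pytest to see whether they at least execute. The module name and model are placeholders, and a human still reviews the generated tests before committing them.

```python
import pathlib
import subprocess

import anthropic

# Sketch: ask an LLM for pytest unit tests focused on extreme cases, write them
# to a file, and run pytest to check that the generated tests at least execute.
# The target module "ipv6_parser.py" and the model name are placeholders.
client = anthropic.Anthropic()

source = pathlib.Path("ipv6_parser.py").read_text()

response = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=2048,
    system="You write pytest unit tests. Respond with Python code only, no prose.",
    messages=[
        {
            "role": "user",
            "content": (
                "Generate pytest tests for the following module. "
                "Focus on extreme cases and regressions:\n\n" + source
            ),
        }
    ],
)

# Note: in practice you may need to strip Markdown code fences from the response.
pathlib.Path("test_ipv6_parser.py").write_text(response.content[0].text)

# Run the generated tests; a human still reviews them before committing.
subprocess.run(["pytest", "test_ipv6_parser.py", "-q"], check=False)
```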
After we have written tests and everything is amazing, we also want to refactor code. Improving the code quality, maybe even having a lot of legacy code around. Maybe think of, you have Python code, you have Go code, you have different languages. Someone says, we should be migrating to modern cloud native technologies, we should be using Go, maybe use Rust, something like that. Again, you can say, please refactor that, but also refine the context or refine the prompt to say, I want to focus on refactoring long spaghetti code into functions, and then again generate tests for these functions.
Or refactor it into object-oriented patterns, like I did a while ago with C++, which worked pretty amazingly, being able to use inheritance and other things with database operations, for example, supporting MySQL, PostgreSQL, and so on. This is the Anthropic version. There's also a way to refactor, for example, Go code into different languages. What I tried in this example was refactoring the Go code from the Kubernetes observability CLI into Rust. Because I like challenges, I also tried Python. The great thing about this specific workflow is there are libraries available for Kubernetes in these languages, so it made the abstraction easier. It's not bulletproof. It's a nice way to consider that I'm not bound to refactoring and staying in my domain. If I need to do a migration or need to adopt new technologies, and I hear that Rust is everywhere, everyone wants to use that, let's use AI to refactor code, or even start a new project in that regard, which I think is also a great way to use AI in our workflows.
The other thing I want to highlight is when we have written code, there's a certain stage of reviewing the code, which can be creating a merge request, or a pull request. We want to trigger CI/CD. Everything kicks off, security scanning, and so on. Before this can actually happen, I need to summarize the merge request. When I think of myself 10 years ago, sitting in front of a merge request, summarizing it, what are the code changes? What is the impact? What needs to be tested, for example? I can use AI. In the screenshot, you see a GitLab merge request to generate a summary of the changes in the merge request, which is then helpful for those who need to review the changes.
The other thing I want to highlight here is that it's also possible to use, for example, contribution graphs to generate suggested reviewers. I get a list of potential users on my team to review that specific merge request. The merge request here in the screenshot is in the GitLab handbook. We have roughly 2000 team members. I have no idea whom to assign this merge request to for review, but I could use the power of AI to get suggestions. Who contributed the most to this specific file or to this specific directory? What could be an efficient way to assign them for review? It makes me more efficient. I don't need to spend a lot of time on my code reviews, or even documentation reviews in that example.
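The GitLab feature uses its own model, but a rough approximation of the idea is to count who touched the changed files most often in git history and propose them as reviewers. The sketch below assumes an origin/main branch to diff against and is only a heuristic, not the actual implementation.

```python
import subprocess
from collections import Counter

# Sketch: approximate "suggested reviewers" by counting who touched the changed
# files most often in git history. Assumes origin/main exists as the target branch.
changed_files = subprocess.run(
    ["git", "diff", "--name-only", "origin/main...HEAD"],
    capture_output=True, text=True, check=True,
).stdout.split()

authors = Counter()
for path in changed_files:
    log = subprocess.run(
        ["git", "log", "--follow", "--format=%an", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    authors.update(log)

print("Suggested reviewers:")
for name, commits in authors.most_common(3):
    print(f"  {name} ({commits} commits touching these files)")
```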
Workflows: Ops
This was a lot about development and how to use AI to make it more efficient. Now comes my favorite part, which is ops: thinking about root cause analysis, observability, error tracking, but also performance efficiencies, and then cost optimization. I want to start with CI/CD: it's blocking the reviews, the pipeline is running. This is a modified version of XKCD 303. What can we do about it? What's actually going on? Maybe there's a CI/CD pipeline with job logs, and it's red, which means it's broken. I need to look at the screenshot or look at the logs and figure out how to fix that. Maybe there's a hint in the logs, but I need to be a Python developer, in that example, to really spot what's broken.
As a developer, I don't like broken CI/CD pipelines. I just want to fix them fast, or maybe they can auto-fix themselves. One way can be to copy-paste the job log and use AI. You copy it into a prompt, which is refined to analyze this. It can be a one-time generated analysis, but it could also be a chat prompt, so you have a conversation to follow up on specific things, ask about future optimizations, for example, efficient Docker containers, and other specific things. The problem is CI/CD job logs usually have some sensitive data in there, like when passwords or credentials are not being masked, or [inaudible 00:18:38].
We need to filter that in a specific way, then get the answer from that. There is an example of the prompt on the slides: you as an AI assistant, explain the root cause of a CI/CD job failure, explain it in a way that any software engineer, and this is important, can understand to fix it immediately. Include an example of how the job might be fixed, and here are the job logs. After that marker, the AI knows the job logs follow, so you can actually just copy-paste the job logs in, which is a great way to fix pipelines fast. I think I'm using that feature on a daily basis because it makes me more efficient.
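A minimal sketch of that flow, with the caveat that the redaction patterns below are illustrative and not a complete secret scanner: redact obvious credentials from the copied job log, then send it with the prompt structure described above.

```python
import re
import sys

import anthropic

# Sketch: redact obvious secrets from a CI/CD job log, then ask an LLM for a
# root cause analysis using the prompt structure from the talk. The redaction
# patterns are illustrative and not a complete secret scanner.
SECRET_PATTERNS = [
    re.compile(r"(?i)(password|token|secret|api[_-]?key)\s*[=:]\s*\S+"),
    re.compile(r"glpat-[0-9A-Za-z_-]{20,}"),  # GitLab personal access token shape
]

def redact(log: str) -> str:
    for pattern in SECRET_PATTERNS:
        log = pattern.sub("[REDACTED]", log)
    return log

job_log = redact(sys.stdin.read())  # e.g. pipe the copied job log into the script

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": (
                "You are an AI assistant. Explain the root cause of the CI/CD job failure "
                "in a way that any software engineer can understand and fix immediately. "
                "Include an example of how the job might be fixed.\n\nJob logs:\n" + job_log
            ),
        }
    ],
)
print(response.content[0].text)
```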
Another point, thinking of deployment, and this goes in the cloud native direction: think about Kubernetes deployments which are failing, or something is broken. You see kubectl get pods, CrashLoopBackOff or something, the image doesn't work, whatever is broken. I found k8sgpt, which is also a CNCF sandbox project, which uses different LLM providers to analyze the Kubernetes deployment and then provide suggestions from an SRE perspective, or from an efficiency perspective. It works with different LLMs.
I was able to make it work with a local LLM provider, which is called Ollama, so you can run it on your MacBook. I pulled the Mistral LLM, which also works great; Mixtral, in contrast, requires 48 gigabytes of RAM, which isn't really possible on my MacBook. I made it work by configuring it with an OpenAI-compatible interface. The screenshot says OpenAI, but the demo in that GitLab repository is actually using Ollama itself. The other example, or the other thought I had, is about observability and summarizing logs. It's amazing that we have Elastic and different log ingestion pipelines, but we also need to understand them. If there is an incident going on and you need to fix things fast, maybe even yesterday, because the customers are calling, it can be helpful to use AI to summarize and get a better idea of what the root cause actually is.
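To illustrate the idea behind that kind of analysis (this is not k8sgpt itself), a small sketch could collect the waiting reasons of broken pods with the official Kubernetes Python client and ask a local Mistral model, served by Ollama, for an SRE-style explanation.

```python
import ollama
from kubernetes import client, config

# Sketch of the idea behind k8sgpt-style analysis: collect the waiting reasons
# of broken pods and ask a local Mistral model (served by Ollama) for an
# SRE-style explanation. This is a minimal illustration, not k8sgpt itself.
config.load_kube_config()
v1 = client.CoreV1Api()

problems = []
for pod in v1.list_pod_for_all_namespaces().items:
    for status in pod.status.container_statuses or []:
        waiting = status.state.waiting
        if waiting and waiting.reason not in (None, "ContainerCreating"):
            problems.append(
                f"{pod.metadata.namespace}/{pod.metadata.name}: "
                f"{waiting.reason} - {waiting.message}"
            )

if problems:
    answer = ollama.chat(
        model="mistral",
        messages=[
            {"role": "system", "content": "You are an SRE. Explain Kubernetes failures and suggest fixes."},
            {"role": "user", "content": "Analyze these pod states:\n" + "\n".join(problems)},
        ],
    )
    print(answer["message"]["content"])
```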
I've also found Honeycomb building AI into their product. They wrote a great blog post about how to build it into a product, using query assistants for complex observability queries, and so on. This can also be a way of thinking about how AI can be helpful. Another project in the observability and ops space I want to highlight is for sustainability monitoring, like power consumption forecasts. There's Kepler, which is an abbreviation of Kubernetes-based Efficient Power Level Exporter. The way it does it, it uses eBPF to collect low-level metrics from the Linux kernel, with specific power consumption things in mind, and then uses machine learning to forecast what the power consumption could be, which can be a way to get more efficient in terms of cost, but also sustainability, reducing CO2 emissions, and so on.
Workflows: Sec
Moving on to security workflows: what could be helpful with AI? Understanding and mitigating vulnerabilities, security scanning, dependencies, supply chain, a lot of things. What I've found useful is, and we need to turn back time probably 8 years, 10 years: we had a security incident, a CVE had been created. I was maintaining an open source monitoring tool with distributed environments, which had custom messages. Someone said, in these custom messages, we need to fix something, make it more secure. I'm like, ok, let's fix the bug. I have no idea what exactly it does, but the fix looks ok. We fix it, release it. Customers are installing the fix, and everything is broken, or it's slow. We then debugged and figured out that the fix for the security vulnerability should have looked different from what we actually implemented.
We made it slower and broke the entire cluster communication at some point. I was thinking, maybe nowadays there's a way to analyze a security vulnerability to understand what's going on, and also how to fix it in a long-term way. Especially because certain security vulnerabilities have descriptions where I'm like, I have no idea what a format string vulnerability is, or what command injection is, maybe even a timing attack, or the typical problem of a buffer overflow. I really want to fix them in a way that I don't introduce a new regression or make the code less great. Vulnerability explanation is one of the ways to leverage that, so I can understand how a malicious attacker actually exploits the vulnerabilities. Also, understand the why and how to fix it, especially when it's urgent.
One example prompt could be: you as a software developer or software security engineer, explain the vulnerability using this code snippet. Then provide a code example of how an attacker can take advantage and how to fix it. In this example, it explains that. It's a screenshot from GitLab. We can also do it with a different prompt. The great thing is, AI can even help us fix that by proposing a source code change in a merge request or in a pull request, because an explanation still requires work. I don't know exactly how to fix that.
Maybe there's a way to automate that in the sense of analyzing the source code, but then instructing AI to fix it and automatically create a merge request, whether it's in GitLab or any other vendor. You can use that as a way to get more efficient, because the merge request then triggers security scanning again, and CI/CD ensures that everything works. It's helpful for us because it's just one button click or one interaction with AI.
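A hedged sketch of that automation with python-gitlab might look like the following; the project path, branch names, file path, and the assumption that the LLM-proposed fix was already saved to fixed_app.py are all placeholders.

```python
import os

import gitlab

# Sketch: open a draft merge request with an AI-proposed fix, using python-gitlab.
# Assumes the fixed file content was produced by an earlier LLM step and saved
# to fixed_app.py; project path, branch names, and file path are placeholders.
fixed_code = open("fixed_app.py").read()

gl = gitlab.Gitlab("https://gitlab.com", private_token=os.environ["GITLAB_TOKEN"])
project = gl.projects.get("my-group/my-project")

project.branches.create({"branch": "ai-security-fix", "ref": "main"})
project.commits.create({
    "branch": "ai-security-fix",
    "commit_message": "Apply AI-suggested fix for reported vulnerability",
    "actions": [{"action": "update", "file_path": "src/app.py", "content": fixed_code}],
})
mr = project.mergerequests.create({
    "source_branch": "ai-security-fix",
    "target_branch": "main",
    "title": "Draft: AI-suggested security fix",
    "description": "Proposed by an LLM; security scanners and CI/CD will verify the change.",
})
print(mr.web_url)
```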
AI Guardrails - Privacy, Data and Security, Performance, Validation
This has been a lot about workflows and how to get more efficient. We also need to consider AI guardrails: thinking about privacy, data, and security, but also performance, and validation of whether AI makes sense in our workflows. The first thing I want to highlight and emphasize is data usage. Is my data or my source code being used to train the models again? Because it could be leaked somehow. This should be a no. Is my proprietary data sent to a SaaS provider for evaluating vulnerabilities, for example? If I cannot do that, if I'm in a network environment like a bank, or government, or something else, a regulated environment, this would also be a no.
The other thought is, if AI features have a chat prompt with history, does it memorize my data? Is there a retention policy? When is the data that I'm sending in being deleted? I think the most important part is to have a public statement on data usage and privacy. I would recommend asking your DevOps or AI provider to answer these specific questions before you actually pull the tools or the functionality into your environment. The other thought is security. You probably want to control access to the AI features and models. Some governance control, in the sense of: this environment or this development team needs access to the AI features, but others do not. You also want to be able to block content from being sent in the prompt.
This is shown in the screenshot on the right-hand side. Maybe there's a way of using security explanation, or vulnerability explanation, but you're not allowed to send the actual source code. You want to strip that out, and ensure that you still get help from AI, but you cannot share the source code because it's so proprietary that it's not allowed to be shared. Last but not least, consider validating the prompt responses to avoid user exploitation. I know we all love to play with chat prompts. I recently asked our AI implementation in GitLab to behave like Clippy and do something, or to ask me how it can help me today. The AI responded with something like, you were waiting for a playful answer, but sorry, I'm just the GitLab Duo prompt. Something like that. There are requirements and guidelines for team members, and so on.
Transparency is something I'm also super keen on, or what I recommend in that regard: have some documentation for AI usage, development, and availability. How is AI being infused into product development? Have a plan for adding or updating AI models when you're using different things, or when an incident is going on. When AI doesn't work, will it stop productivity in your developers' workflows? Consider having something like an AI Transparency Center, which is something we started at GitLab. I think everyone should have that. From a performance perspective, consider adding observability. Whether you're using a SaaS API, a self-managed API, or even local LLMs, there are different ways to ensure that users get the responses from AI, like observability for the infrastructure.
If you're using specific LLMs in your implementation or in your workflows, there's a tool called OpenLLMetry, which is a play on OpenTelemetry and LLMs. I've also found tracing with LangSmith super interesting to get an idea of whether AI is behaving in a way that teams can use it, or even your customers, if you're building products with AI. There is a lot of information and a lot of thought around how to validate LLMs, because I've seen hallucination, in the sense of: I asked a simple question to generate some Kubernetes YAML configuration, and it said, you can do it like this, I have provided the snippets above, but it never actually provided those snippets. I asked again, and it still thought it had provided the snippets. So how do you validate that quality of service and ensure that LLMs provide helpful answers? There are certain ways to define metrics, and different frameworks for testing. I've also linked some examples on how GitLab is implementing that, for later reading.
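As a very small example of that kind of validation, you could run the same prompt several times against a local model and check whether each response actually contains the promised artifact, to catch exactly the "I have provided the snippets above" failure mode. Real evaluation frameworks go much further; the prompt and checks below are illustrative.

```python
import ollama

# Sketch: a naive quality check for LLM answers. Run the same prompt several
# times and verify each response actually contains the promised artifact
# (here: a YAML snippet), to catch hallucinated "I provided it above" answers.
PROMPT = (
    "Generate a Kubernetes Deployment YAML for an nginx container. "
    "Include the YAML in your answer."
)
RUNS = 5

failures = 0
for _ in range(RUNS):
    reply = ollama.chat(model="mistral", messages=[{"role": "user", "content": PROMPT}])
    text = reply["message"]["content"]
    if "apiVersion" not in text or "kind: Deployment" not in text:
        failures += 1

print(f"{failures}/{RUNS} responses were missing the promised YAML snippet")
```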
AI Impact
From guardrails to impact: how do you measure that? I put this upfront, it's hard. It's a new, challenging area. We will need to think about going beyond developer productivity metrics. It could be a way to use the DORA metrics, but also considering team feedback and team satisfaction. What about code quality, test coverage, fewer failed CI/CD pipelines? Does the time to release decrease or stay the same? I think there is work to do. At GitLab, we are building these dashboards for our products, but I also think that everyone should have them in some form, maybe as a Grafana dashboard, maybe as something else, to really figure out what the impact of AI on workflows is.
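One concrete starting point, sketched here with an invented event schema rather than real telemetry, is the acceptance rate of code suggestions per language; your AI provider or IDE telemetry would supply the actual data.

```python
from dataclasses import dataclass

# Sketch: one possible impact metric, the acceptance rate of AI code suggestions
# per language, computed from hypothetical telemetry events. The event schema is
# invented for illustration only.
@dataclass
class SuggestionEvent:
    language: str
    accepted: bool

events = [
    SuggestionEvent("go", True),
    SuggestionEvent("go", False),
    SuggestionEvent("rust", True),
    SuggestionEvent("rust", True),
]

by_language: dict[str, list[bool]] = {}
for event in events:
    by_language.setdefault(event.language, []).append(event.accepted)

for language, outcomes in by_language.items():
    rate = 100 * sum(outcomes) / len(outcomes)
    print(f"{language}: {rate:.0f}% of suggestions accepted ({len(outcomes)} shown)")
```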
AI Adoption
For AI adoption, we have a lot of things to consider. I applaud the CNCF landscape here. We want to integrate AI into our workflows. We need to consider guardrails for security, privacy, and so on. We also need to validate that, and validate the AI impact. Maybe there are even more advanced topics we still want to have in mind, because AI is moving fast, and adoption as well. I thought about what else I can tell you, what could be helpful for your workflows if you want to use more AI. Retrieval Augmented Generation, RAG, is a way to help LLMs when they cannot answer a specific question, because the model was trained with data from 2021. It says, I cannot really help you with that Rust question for now, or I cannot tell you what the weather in London is currently, because I don't have access to that specific data.
There's a way to use RAG in the sense of loading specific documents into a vector store and adding this context to the large language model to refine the prompt. I found promptingguide.ai, where I also took the picture from, which explains that in a very nice way. I thought, now that I've tried to understand it, maybe there's a use case. I was talking to Eugene, and we thought about whether we could feed a Discord channel or a Slack channel or a knowledge base wiki, something, into the LLM, and then provide a helpful knowledge base chat, which could be a way of using that. I've then learned that at GitLab we are using that to load the GitLab Docs into our Duo Chat offering. Maybe I also want to build it myself. I was reading a lot and created a demo. I've also linked the repository here so you can reproduce what I've been learning.
The idea here was to load a Markdown file, split it by the headings, and then ask for a summary. The constraint was, only local is allowed: so using LangChain, which is a development framework for AI and LLMs, and using Ollama with a local Mistral LLM again. You could also use Anthropic Claude, if you want to. For the vector DB, I was using Qdrant, which is a vector database written in Rust, and super-fast. The idea was really to summarize the entry page for my personal newsletter, which is opsindev.news, where I write about cloud native observability and a lot of things.
Then I was like, but this is just one page, maybe I can load a lot of pages. I have a knowledge base for observability at o11y.love, which has, I think, 20 different entries for topics. I wanted to ask, what is observability, what is eBPF, and so on. I even tried to load the GitLab handbook, which has 2000 pages. I think it takes 5 minutes to load into the vector DB. It worked. The great thing is, it's also able to answer questions. I wanted to provide the full context of being able to query the AI, or the LLM, and use these embeddings, or Retrieval Augmented Generation, to answer a specific question about observability and eBPF, on the left-hand side.
Also being able to ask about remote work and low-context communication, which we do at GitLab, as a specific example. The source code is available: it's open source, it's free. If I can do it, you can also do it. You just need to learn a few things and put things together. The great thing is, if you use Mac or Linux, you can run Ollama locally and pull the language models. You don't need to use any SaaS provider, and things like that, which personally blocked my learning in 2023. Since I found Ollama, I'm super productive with trying these things out. Maybe you can build something for your own production environment, building a knowledge base, a chat prompt, using your proprietary data in your environment, and it never leaves the environment. You don't need to use any SaaS providers for that.
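For reference, a condensed sketch of that pipeline could look like the following. The import paths follow the LangChain 0.1.x community packages and may differ in your version; the Markdown file, model, and question are placeholders.

```python
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Qdrant
from langchain.chains import RetrievalQA
from langchain.text_splitter import MarkdownHeaderTextSplitter

# Sketch of the local RAG pipeline described above: split a Markdown page by
# headings, embed it into an in-memory Qdrant collection, and answer questions
# with a local Mistral model via Ollama. Import paths are version-dependent.
markdown = open("o11y-knowledge-base.md").read()  # placeholder document

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")]
)
documents = splitter.split_text(markdown)

embeddings = OllamaEmbeddings(model="mistral")
vector_store = Qdrant.from_documents(
    documents, embeddings, location=":memory:", collection_name="knowledge-base"
)

qa = RetrievalQA.from_chain_type(
    llm=Ollama(model="mistral"),
    retriever=vector_store.as_retriever(),
)
print(qa.invoke({"query": "What is eBPF and how does it relate to observability?"}))
```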
AI/LLM Agents
The other thing I want to put on your radar: think about AI or LLM agents. The current discussion is moving fast. The idea here is, when the LLM considers that it needs more data to answer the current question, AI agents can plan the execution to get that data. It uses tool or function calls, memory, and planning, and in the background it executes that. Then it's able to provide a summary, like, the compilation of this Rust program would result in this error. It's actually super-fast and provides you with a more refined answer. This area is moving so fast.
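Since agent frameworks are changing quickly, here is only a toy sketch of the loop: the model may request a tool run (here, cargo check on a local Rust project) before answering, signaled with a made-up RUN_CARGO_CHECK convention rather than any standard protocol.

```python
import subprocess

import ollama

# Toy sketch of the agent idea: the model may request a tool run before answering.
# The only "tool" is `cargo check` on a local Rust project, and the model signals
# it with the literal line RUN_CARGO_CHECK, a convention invented for this example.
SYSTEM = (
    "You help debug a Rust project. If you need compiler output to answer, "
    "reply with exactly RUN_CARGO_CHECK and nothing else."
)
messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Why does this project fail to compile?"},
]

for _ in range(3):  # bounded loop: plan -> tool call -> final answer
    reply = ollama.chat(model="mistral", messages=messages)["message"]["content"]
    if reply.strip() == "RUN_CARGO_CHECK":
        result = subprocess.run(
            ["cargo", "check"], capture_output=True, text=True, cwd="./my-rust-project"
        )
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": "Compiler output:\n" + result.stderr})
    else:
        print(reply)
        break
```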
I think everyone is building it right now. It's super hard to also create a demo for it. The other consideration is custom prompts and models. Like I said, some environments have requirements for local LLMs: train it with the source code, issues, and docs in your environment, but it never leaves your network. One example is suggested reviewers, which we built at GitLab. There are more custom models. I think every vendor is working on that, because customers really need it. There are also considerations like proxy tuning, so you don't need to retrain the model, which is super expensive, using GPUs and a lot of hardware. There are a lot of things to learn. Keep that in mind when it comes to DevSecOps workflows, as well.
Recap
To recap, from an efficiency perspective with DevSecOps, as a call to action: if you want to start implementing it now, think about the workflows, guardrails, and impact. From a workflow perspective, if you have repetitive tasks, low test coverage, or bugs: use code suggestions, generate tests, use a chat prompt. If you move into the security area, and security regressions delay releases every month: vulnerability explanation, resolution, team know-how building. From an ops perspective, if developers are spending most of their time staring at failing deployments: root cause analysis, k8sgpt, and a lot more things to consider.
Questions and Answers
Ibryam: What do you think might be next in this space, especially in DevSecOps and AI? What is some of the cutting-edge tooling you see, or work being done?
Friedrich: From a developer's perspective, we create merge requests, and we have some reviews, maybe there's a way to use AI to do automated reviews. Because when you have many code changes, it might take a while to dive into it, understand the architecture, understand the full context of the changes. If we can use AI more in that regard, I think we get more efficient. From an ops perspective, I would love to have root cause analysis, but in the sense of anomaly detection. There's a term called AIOps. I have a lot of observability data, but can AI help me understand the problem and also provide a fix for me?
Even not providing the fix in text form, but generating a merge request for my GitOps configuration in GitLab, using Flux, Kubernetes, whatever. I don't need to really understand it up front; I need to understand it at the end. The entire generation of that thing happens automatically. I could even envision that it deploys it in production, then uses the observability data to detect a memory regression, triggers a rollback, and creates an issue for the developer. Then on Monday, because the problem happened on Saturday evening and nobody was around, we deal with the problem and do a proper fix.
Participant 1: I'm a keen practitioner of test-driven development. Is it possible to generate the test before the code is written?
Friedrich: I think when you're working locally, and you have it in your IDE, it's possible if you have not committed it yet. You select it in the IDE, and then say, generate tests, and then maybe use a conversation in a chat prompt and say, where should I be putting this test, if I'm not aware. You can actually put them into the source code before you create a commit. If you already committed it and pushed it up, that's a different workflow. I think generally, I would say, yes. It really depends on how it's integrated in your IDE, whether it's Visual Studio Code, JetBrains, Vim, or anything which works on the CLI. I think the most challenging part is how you create a great developer experience, or UX, to integrate AI for test generation, for explanation, and so on.
Participant 2: With DevSecOps, obviously, you're trying to move security closer to the code. Do you think there's a use case for the AI or the LLM engine to analyze the code while it's being written, for security threats?
Friedrich: Yes, definitely. Either you use security explanation or mitigation or resolution in that sense. You can also just ask, or you select the source code portion and say, please explain the source code, but focus on specific vulnerabilities. I was writing some C code lately. I wasn't really sure if it was sane to ship it. Before even triggering security scanning with CI/CD at some point, when you're pushing it to the remote, I was able to understand: this is a race condition, this might leak something. I think there's definitely a lot of potential there to really understand why my code could create a vulnerability, even if I have no idea.
For me, as a developer, it's much more efficient to have that knowledge beforehand and see that as a helpful way, and not just, you need to fix security because everything shifts left and you're all alone there. Rather, see it as an opportunity to grow and fix things faster, or even before they actually happen. Because next time I remember that if I use sprintf in a specific way, or a string copy, I shouldn't be using that, because there's a safer implementation already.
Participant 3: I was wondering about understanding the benefits from AI. How do you distinguish the benefits that you get from AI from other things that are changing in the organization, like process improvements or additional automation? Because metrics like DORA or the number of PRs can be affected by many factors.
Friedrich: I think you need to develop or figure out new metrics, or maybe even correlate the feature adoption. I'm using an example with GitLab Duo, which is GitLab's AI implementation. How many users have code suggestions enabled, and does productivity go up? Or maybe we can even focus on, did they accept a specific suggestion for that specific language, so we have even more filtering in that. Or do they deny it all the time, and then we figure out why they denied accepting the code suggestion. You can also think of, you create a merge request with a vulnerability fix: is that being merged into production, or were more changes needed because something was not really right in there?
What could be the metric for understanding how many AI features helped improve the source code or helped with development? Then, I don't know if we create new DORA metrics or what the format could be in that regard. It's about really being able to provide the data and continuously improve on that, because AI is moving so fast. The examples I brought, everyone is implementing that. There are more things to consider, like, you're in a test-driven environment, you're a product owner, you want to validate a requirements list against the entire source code. There are many more things to consider.
Keeping in mind that we need to measure the impact of AI in every feature we implement and add to the workflows will, I think, create more efficiency metrics. We are at the start of that now. We're measuring the impact beyond code suggestions or beyond code creation. I'm looking forward to maybe talking about it next year, about what we have built or what the experiences are, but for now, keep that in mind. We're not there yet.