Applying AI to the SDLC: New Ideas and Gotchas! - Leveraging AI to Improve Software Engineering


Summary

Tracy Bannon discusses using generative AI in software engineering, leveraging AI assistance to meet the speed and quality that end users demand.

Bio

Tracy Bannon is a passionate software architect and change agent who writes, speaks, teaches, and practices her craft every day. As an accomplished software architect, engineer, and researcher, she has worked across commercial and government clients. She focuses on bringing techniques to modern software practices, including applying AI/ML/generative AI to the software development lifecycle.

About the conference

Software is changing the world. QCon London empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Bannon: I've been navigating the city. It really got me thinking about something. It got me thinking about the fact that I could use my phone to get anywhere I needed to go. It got me thinking about how ubiquitous it is that we can navigate easily anywhere we want to go. It's built into our cars. I ride a bicycle, and I have a computer on my road bike, so I always know where I am. You can buy a little chip now and sew it into the back of your children's sweatshirts and always know where they are. It's really ubiquitous. It didn't start out that way.

When I learned to drive, I learned to drive with a map. As a matter of fact, I was graded on how well I could refold the map, obviously a skill that I haven't worried about since then. I was also driving during the digital transition, when all of that amazing cartography information was digitized. Somebody realized, we can put a frontend on this, and we can ask people where they're starting and where they're going. Then we can give them step-by-step directions. They still had to print it out. If you happened to be the person in the passenger seat, you got to be the voice: "In 100 meters, take a left, the ramp onto the M4."

It wasn't long until we had special hardware. Now we had a Garmin, or we had a TomTom. It was mixing the cartography information, the voice aspect, and the hardware together. It was fantastic. When my children started to drive, they started with a TomTom, but I made them learn to read a map, because sometimes the signal gets lost. Now, it's everywhere. It is ubiquitous for us. In 2008, the iPhone 3G was released, and it had that sensor in it. Now everywhere we went, we had the ability to tell where we were. We can track our packages. We can track when the car is coming to pick us up. We can track all sorts of different things. We've just begun to expect that. What does that have to do with AI, with software engineering? It's because I believe that this is where we're at right now. I think we're at the digital transition when it comes specifically to generative AI and leveraging it to help us build software.

My name is Tracy Bannon. I like word clouds. I am a software architect. I am a researcher now; that's been something newer in my career over the last couple of years. I work for a company called the MITRE Corporation. We're a federally funded research and development center. The U.S. government realized that they needed help, that they needed technologists who weren't trying to sell anything. I get paid to talk straight.

AI in Software Engineering

Let's go back in time, everybody, to 2023. Where were you when you heard that 100 million people were using ChatGPT? I do remember that all of a sudden, my social feed, my emails, newsletters, everything said AI. Chronic FOMO. It's almost as though you expect to go walking down the aisle in the grocery and see an AI sticker slapped on the milk and on the biscuits and on the cereal, because obviously it's everywhere, it's everything. Please, don't get swept up in the hype. I know here at QCon and with InfoQ, we prefer to talk about crossing the chasm. I'm going to use the Gartner Hype Cycle for a moment.

The words are beautiful. Are we at the technology trigger when it comes to AI in software engineering? Are we at the peak of inflated expectations, or the trough of disillusionment? Have we started up the slope of enlightenment yet? Are we yet at the plateau of productivity? Where do you think we are? It's one of the few times that I agree with Gartner. We are at the peak of inflated expectations. Granted, Gartner is often late to the game; by the time they realize it, I often believe we're further along the hype cycle. What's interesting here is the estimate: 2 to 5 years to the plateau of productivity.

How many people would agree with that? Based on what I'm seeing, based on my experience, based on research, I believe that's correct. What we do as software architects, as software engineers, is really complex. It's not a straight line in any decision that we're making. We use architectural tradeoff analysis. I love the quote by Grady Booch: "The entire history of software engineering is one of rising levels of abstraction." We've heard about that. We've heard the discussions of needing orchestration platforms of many different layers, of many different libraries, that are necessary to abstract and make AI, generative AI in specific, helpful.

Where Can AI Be Used with DevSecOps?

I have the luxury of working with about 200 of the leading data scientists and data engineers in the world. I sat down with a couple of them and said, "I'm going to QCon. This is the audience. How would you explain to me all of the different types of AI that exist, the ML universe beyond generative AI?" Did we draw frameworks? We had slide after slide. I came back to them and said, let's instead treat these like Legos and dump them on the table. What's important to take away from this slide is that generative AI is simply one piece of a massive puzzle.

There are many different types of AI, many types of ML, many different types of algorithms that we can and should be using. Where do you think AI can be used within DevSecOps, within the software development lifecycle? The first time I published this was in October of last year, and at least half a dozen additional areas have been added to it since then. What's important is that generative AI is only one piece of the puzzle here. We've been using AI, we've been using ML, for years. How do we get after digital twins if we're dealing with cyber-physical systems? We're not simply generating new scripts and new code. We're leveraging deterministic algorithms for what we need to do. Remember that generative AI is non-deterministic. With that said, generative AI in specific has groundbreaking potential. It also has limitations and challenges.

Treat generative AI like a young apprentice. I don't mean somebody who's coming out of college. I mean that 15-year-old who brings a lot of energy, and you're excited to have them there. Occasionally they do something right, and it really makes you happy. Most of the time, you're cocking your head to the side and saying, what were you thinking? We heard stories like that in the tracks, especially around AI and ML. Pay very close attention.

I'm going to take you back for a moment, and just make sure that I say to you that this is not just my opinion. This is what the research is showing. Service providers who offer AI capabilities are now making sure that they have all kinds of disclaimers and all kinds of advice, providing guidance that says, make sure you have humans in the loop. Do you think that generative AI contradicts DevSecOps principles? It does. When I think about traceability: if it's being generated by a black box that I don't know, that's much more difficult.

How about auditability? That's part of DevSecOps. How am I going to audit something when I don't understand where it came from, or what its provenance is? Reproducibility? Has anybody ever hit the regenerate button? Does it come back with the same thing? Reproducibility. Explainability: do you understand what was just generated and handed to you? Whether it's a test, whether it's code, whether it's a script, whether it's something else, do you understand it? Then there's security. We're going to talk a lot about security.

There was a survey of over 500 developers, and of those 500 developers, 56% are leveraging AI. Of that 56%, all of them are finding security issues in the code completion or code generation they're running into. There's also this concept of reduced collaboration. Why? Why would there be reduced collaboration? If you're spending your time talking to your GAI (generative AI) friend, and not talking to the person beside you, you're investing that time in the necessary prompting and chatting.

It has been shown, so far, to reduce collaboration. Where are people using it today for building software? We've spent a lot of time talking about how we can provide it as a capability to end users, but how are we using it to generate software, to build the capabilities we deliver into production? I don't ignore the industry or the commercial surveys, because if a survey reaches hundreds of thousands of people, or even tens of thousands, I'm not going to ignore that as a researcher. Yes, Stack Overflow friends.

Thirty-seven thousand developers answered the survey, and of those, 44% right now are attempting to use AI for their job. An additional 25% said they really want to. Perhaps that's FOMO, perhaps not. What are they using it for, that 44% who are leveraging it? Let me read you some statistics. Eighty-two percent are attempting to generate some kind of code. That's a pretty high number. Forty-eight percent are debugging. Another 34%, documentation. This is my personal favorite: explaining the code base, using it to look at code that already exists. Less than a quarter are using it for software testing.

AI-Assisted Requirements Analysis

This is a true story. This is my story, from the January timeframe, about how my team and I were able to leverage AI to assist us with requirements analysis. What we did was we met with our user base, and we got their permission. "I'm going to talk with you. I'm going to record it. We're going to take those transcriptions; are you ok if I leverage a GPT tool to help us analyze them?" The answer was yes. We also crowdsourced via survey. It was freeform, by and large.

Very little was rationalized using Likert scales, or anything along that line. When we fed all of that in through a series of very specific prompts, we were able to uncover some sentiments that were not really as overt as we had thought. There were other things that people were looking for in their requirements. When it comes to requirements analysis, I believe it is a strong use of the tool, because you're feeding in your language and you're extracting from it. It's not generating on its own. Things to be concerned about: make sure you put your prompt into your version control.

Don't just put the prompt into version control; keep track of what model and what service you're running it against. Because as we've heard, as we know, the same prompts react differently with different models. Why would I talk about diverse datasets? The models themselves have been proven to have issues with bias. It's already a leading practice to make sure that you're talking to a diverse user group when you're identifying and pulling those requirements out. Now you have the added need to balance the possibility that the model has a bias in it. Make sure that your datasets, your interviews, the people you talk to, represent a diverse set. Of course, rigorous testing, humans in the loop.
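
A minimal sketch of what tracking prompts together with their model and service could look like, committed to version control. The JSONL schema, file path, and field names here are invented for illustration; Python stands in for whatever tooling a team actually uses:

```python
import hashlib
import json
from datetime import datetime, timezone

def record_prompt_run(prompt: str, model: str, service: str,
                      path: str = "prompts/log.jsonl") -> None:
    """Append a prompt-run record to a JSONL file kept under version control.

    The point is that the prompt text, the exact model, and the hosting
    service travel together, because the same prompt behaves differently
    against different models.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "service": service,        # e.g. "self-hosted" vs. a subscription service
        "model": model,            # pin the exact model name/version
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "prompt": prompt,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```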

AI-Assisted Testing Use Cases, and Testing Considerations

I personally like it for test cases. There was some research published in the January timeframe that gave me pause. It said that only 47% of organizations have automated their testing. In some of the places where I work, where there are cyber-physical systems, when I'm working with the military, I want it to be higher than that. That also means that 53% have manual testing going on. Let's realize and let's be ok with the fact that there's manual testing going on, and let's sit our QA professionals down in front of a chat engine.

Let's make sure that they have their functional requirements, their manual test cases, their scenarios. That they have their user stories. That they have journey maps. Let them sit down and go through Chain-of-Thought prompting, and allow the GPT to be their muse, because you will be surprised how well it can really help. Back to Stack Overflow: 55% said that they were interested in somehow using generative AI specifically for testing, yet only 3% trust it. It could be because it is non-deterministic. I bring that up because you can use generative AI to help you with synthetic test data generation. It's not always going to give you anything that is as accurate as you would like. There are some gotchas we'll come back to.
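
A sketch of the kind of Chain-of-Thought prompt a QA professional might assemble from a user story. The prompt wording is illustrative, and `call_llm` is a hypothetical stand-in for whatever chat client is in use:

```python
def build_test_case_prompt(user_story: str, acceptance_criteria: list[str]) -> str:
    """Assemble a Chain-of-Thought style prompt that asks the model to
    reason step by step before proposing test cases."""
    criteria = "\n".join(f"- {c}" for c in acceptance_criteria)
    return (
        "You are assisting a QA professional.\n"
        f"User story:\n{user_story}\n"
        f"Acceptance criteria:\n{criteria}\n\n"
        "Think step by step:\n"
        "1. List the behaviors the story implies.\n"
        "2. For each behavior, identify normal, boundary, and error inputs.\n"
        "3. Only then write the test cases as a numbered list, each with\n"
        "   preconditions, steps, and expected results.\n"
    )

# draft = call_llm(build_test_case_prompt(story, criteria))
# A human reviews every draft: the GPT is the muse, not the authority.
```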

One of the gotchas is privacy. If you're taking your data, elements of your data, aspects of your data, and feeding it into anybody else's subscription model, if you are not self-hosting and owning it yourself, you could have a data privacy concern. You could also have issues with the integrity of that data. You have to be highly in tune with what's happening with your information if you're sending it out to a subscription service. Also, beware: we've talked about hallucinations. That happens when you generate tests as well; you can get irrelevant tests. I've seen it. I've experienced it. It happens. Back to transparency and explainability: the tests that come forward, the code that comes forward, sometimes it's not as helpful as you'd like it to be.
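
A coarse sketch of scrubbing obvious identifiers before anything crosses the boundary to a subscription service. The regex patterns are illustrative only and no substitute for a real PII review:

```python
import re

# Illustrative patterns only -- real PII detection needs far more than regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace obvious identifiers with placeholders before text leaves
    your boundary for an external model. A first line of defense,
    not a substitute for an InfoSec review."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```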

AI-Assisted Coding

Let's talk about the elephant in the corner. No technical conference would be complete without talking about code generation. When it comes to coding, there's an interesting trend that's happening right now. Major providers are pulling back from calling it code generation to calling it code completion. That should resonate with us. That should point out to us that something is afoot. If they're pulling back from saying code generation to code completion, there's a reason for that. It is amazing when it comes to explaining your existing code base.

Now you have to be ok with exposing your existing code base to whatever that language model is, whether it's hosted or not. Generally, the code that you get out of this thing will be wonderfully structured. It will be well formatted, and occasionally it'll work. A study from Purdue University has shown that when prompting with software engineering questions, about 52% of the time the answers are wrong. That means we're getting inaccurate code generated. We have to be cognizant of it. Remember, this is groundbreaking potential. This is amazing stuff, with limitations and challenges. Just go in with eyes wide open. These tools can help to generate code. What they can't do is build software, not yet. Look at the blue arrow; that's what I want you to focus on. That's one of three choices for any one piece of code.

In this instance, I've seen it go as high as six, and you're simply asking for a module, a function, a small tidbit. The person that you see there is suffering from what we call decision fatigue. Decision fatigue has in the past been studied in medical professionals, the military, the judiciary: places where people have to make really important decisions constantly, under high pressure, and their ability to make those decisions deteriorates. In what world should we be studying decision fatigue in software engineering? We shouldn't be. In-IDE help can be fantastic when it comes to helping you with that blank page mentality that we get to. It can really help with that. I can tell you, day in and day out, it can cause some fatigue. Groundbreaking potential. Know the limitations, know the challenges.

AI-Assisted Coding Considerations

Some things to be concerned about, or at least to be aware of: considerations. You will see unequal productivity gains across the different individuals who are using it. Somebody new in their career, new to the organization, will have smaller individual productivity gains than somebody more senior who can look at the code and understand there's a problem: I see it. I see the problem. Code churn is something that a company named GitClear has been studying on GitHub for years. From 2019 until 2023, the code churn value by industry was roughly the same.

What code churn is, is I take that code that I've written or helped write, and I check it in, I then check it out. I tinker with it: I check it in, I check it out. There's a problem with it: I check it in, I check it out. Code churn. In 2024, we are on pace to double code churn. Is it caused by code generation? Is there correlation? I don't know. We are going to watch that, because that's an interesting number to see rising. The code is less secure. I know people don't want to believe that; it is. I'll tell you a personal story first. The second week of March, I sat through an entire afternoon workshop. I was using GitHub Copilot, a good tool. It has some real value. We were using a Java code base. I was able, even with what I thought was pretty articulate and elegant prompting, to have OWASP Top 10s right there.

I had SQL injection right there in front of me, unless I very clearly articulated: don't do this, be aware. That means that the code is less secure, by nature. There was a Stanford study that came out, and that Stanford report clearly demonstrated it: it's a security professional's worst nightmare. We tend to think that it's right. We tend to overlook it because it is well formatted. It's almost as though it has authenticity. It's speaking to us. It looks correct, so more issues are sneaking into the code. What's that mean? We need rigorous testing. We need humans in the loop. As a matter of fact, we now actually need more humans, not fewer humans. Don't worry about losing your job. There's a lot for us to do.
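
Her example used a Java code base with GitHub Copilot; here is a minimal Python analogue of the same failure mode, showing the string-built SQL that assistants often produce next to the parameterized version you typically have to ask for explicitly:

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, name: str):
    # The shape assistants often generate: string-built SQL, a classic
    # OWASP Top 10 injection. name = "x' OR '1'='1" returns every row.
    return conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(conn: sqlite3.Connection, name: str):
    # Parameterized query: the driver handles escaping. In her experience,
    # the prompt had to demand this explicitly ("use parameterized queries").
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()
```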

GAI Can Be Unreliable

Generative AI can be unreliable. Pay very close attention. You'll notice that I'm emphasizing the person who has the oversight this time. There was a North Carolina State University study that came out that said that 58% of us, when we are doing code reviews, are now doing what's called copping out, meaning that we only look at the diffs. Why does that matter? I was talking to a team member of mine; his name is Carlton. He's a technical lead and has a beautiful team. One of his rockstar developers is named Stephen.

These are real people. I asked Carlton, how do you do code reviews for Stephen? He said, I pull it up. I've worked with Stephen for 5 years. I trust his capabilities. I know his competencies. I only look at the diffs. When you have someone new in your organization, new to your team, new to this domain, what do you do with their code changes? I open them up. I study it. I make sure that they understand what they were doing. I back out into other pieces of the code. I really study it. If Stephen starts to use a code completion tool, or a code generation tool, and there's pressure on him to get something done quickly, do you trust him with the same amount of trust that you had before? Carlton's eyes got pretty big: I'm going to have to not cop out. If you're doing something like pair programming, where you are not necessarily doing the code reviews in the same way, you're going to want to rotate partners more quickly.

You may want to rotate in a domain expert at some point. Consider more frequent rotations. Also, think about bringing in individuals who can help you with more SAST, more static analysis, alongside all of this. There was an announcement from GitLab: they've purchased a company that provides SAST, because they want to make sure that there's more SAST scanning going on in the DevOps pipeline, in our ability to turn out this code, because we have to pay closer attention.

If you're generating code, don't generate the tests. If you're generating the tests, don't generate the code. You need to have that independent verification. This is just smart stuff. There can be bias and there can be blind spots. There can also be this really interesting condition that I learned about called overfitting. It's when a model is trained, and there's some noise in the training data, and it causes it to be hyper-focused in one area. What can happen with your tests is that they can be hyper-focused in one area of your code base to the exclusion of other areas. Does that mean to not use generative AI tools? No. It means, be aware. Know the limitations. Prepare for it.
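
One way to watch for that hyper-focus is to flag the files your test suite barely touches. A sketch, assuming the JSON report format produced by coverage.py's `coverage json` command; the threshold is arbitrary:

```python
import json

def coverage_blind_spots(report_path: str = "coverage.json",
                         floor: float = 50.0) -> list[str]:
    """Return files the test suite barely exercises.

    If generated tests cluster on one area of the code base (the
    overfitting pattern described above), the neglected files show up here.
    Assumes coverage.py's JSON report schema: a top-level "files" mapping
    with a per-file "summary" containing "percent_covered".
    """
    with open(report_path, encoding="utf-8") as f:
        report = json.load(f)
    return [
        path for path, data in report["files"].items()
        if data["summary"]["percent_covered"] < floor
    ]
```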

Is Your Organization Prepared to Use Generative AI?

Is your organization ready to use generative AI for software engineering? My question to you is, is your SDLC already in pretty good shape? If it is, you might want to amplify it by leveraging generative AI. If you have some existing problems, sprinkling some generative AI on top is probably not a good idea. Let's go back to the basics for just a moment. When I get parachuted into a new organization, into a new team, one of the first questions that I ask is, do you own your path to production? Asking that simple question gives me an entire waterfall of cascading other questions to ask.

If you can't make a change and quickly understand how it's going to get fielded, you probably have some challenges. That's when I usually tell teams that we need to step back and start to do the minimums. In 2021, during the height of the lockdowns, I attended the DevOps Enterprise Summit with a number of different friends. It was virtual. If any of you attended, there were lots of different tools where you could belly up to the virtual bar. I bellied up to the bar with a friend of mine, actually someone who introduced me to Chris Swan.

My friend Bryan Finster, and I, and six or seven other people were arguing and frustrated with one another. Why is everybody telling us that they can't use DevSecOps, that they can't have a CI/CD pipeline? Why are there so many excuses? You know what we'll do? We're going to write down what those minimums are. And we did: minimumcd.org. It's an open source listing; we are simply maintainers of the documentation.

It provides people with what the minimums are. What are the minimums? What do you need to do before you start sprinkling AI on top? Make sure you're practicing continuous integration. That means, don't leave the code on your desktop overnight. Tell the people on your team: don't leave the code outside the repository, check it in. If it's not done, that's ok, put a flag around it. Put a feature flag around it so that if it does flow forward, it's not going to cause a problem.
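
A minimal feature-flag sketch; the flag name and checkout flows are invented for illustration:

```python
# Unfinished work merges with the trunk daily but stays dark until flipped on.
FLAGS = {"new_checkout": False}

def current_checkout_flow(cart: list) -> str:
    return f"checked out {len(cart)} items (current flow)"

def new_checkout_flow(cart: list) -> str:
    return f"checked out {len(cart)} items (new flow, still in progress)"

def checkout(cart: list) -> str:
    # The incomplete code can flow forward with everything else,
    # but the flag keeps it from executing until the team turns it on.
    flow = new_checkout_flow if FLAGS["new_checkout"] else current_checkout_flow
    return flow(cart)
```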

Once you check that code in, how does it get into production? The pipeline. The pipeline determines deployability. It determines releasability. How does that magical pipeline do that? Because we as humans sat down and decided what our thresholds were for deployability. Then we codified it into that pipeline. What else is involved? Once that code becomes an electronic asset, it's immutable. Humans don't touch it again. You don't touch the environments. You don't touch anything. Stop touching things. Let the pipeline take care of it. That's a big piece of DevSecOps principles.
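
What codifying those thresholds into the pipeline might look like, as a sketch; the threshold names and numbers are illustrative, not a standard:

```python
import sys

# Thresholds the humans agreed on, codified so the pipeline -- not a
# person -- decides deployability.
THRESHOLDS = {"min_coverage": 0.80, "max_critical_vulns": 0, "max_failed_tests": 0}

def gate(metrics: dict) -> list[str]:
    """Return the list of reasons a build is not deployable (empty if it is)."""
    failures = []
    if metrics["coverage"] < THRESHOLDS["min_coverage"]:
        failures.append(f"coverage {metrics['coverage']:.0%} below minimum")
    if metrics["critical_vulns"] > THRESHOLDS["max_critical_vulns"]:
        failures.append(f"{metrics['critical_vulns']} critical vulnerabilities")
    if metrics["failed_tests"] > THRESHOLDS["max_failed_tests"]:
        failures.append(f"{metrics['failed_tests']} failing tests")
    return failures

if __name__ == "__main__":
    problems = gate({"coverage": 0.84, "critical_vulns": 0, "failed_tests": 0})
    sys.exit(f"not deployable: {problems}" if problems else 0)
```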

It matters. It helps. Whenever you're doing any kind of testing, you want any of the other environments that you're leveraging to be at what's called parity, parity to production. A thing that you can do to get started is to take a look at the DORA metrics. Pick one, you don't have to pick four. Don't bite off more than you can chew. Deployment frequency is not a bad place to start. That QR code will take you to the research site. When you're there, you can also find another tool. It's a quick survey, I think it's four or five questions that'll help you decide which of those metrics to start to track.
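
A sketch of the simplest version of that starting metric, deployment frequency, computed from whatever deployment timestamps your pipeline already records:

```python
from datetime import datetime, timedelta

def deployment_frequency(deploy_times: list[datetime], window_days: int = 30) -> float:
    """Deployments per week over a trailing window -- the most approachable
    DORA metric to begin tracking. Where the timestamps come from (a
    deployment log, pipeline events) is up to your tooling."""
    if not deploy_times:
        return 0.0
    cutoff = max(deploy_times) - timedelta(days=window_days)
    recent = [t for t in deploy_times if t >= cutoff]
    return len(recent) / (window_days / 7)
```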

Let's talk about the gotchas as we're going forward. If you're adding generative AI into your workflow, your workflow is going to change. That means your measurements and your metrics are going to change. If you have people who are really paying attention, looking at your metrics and studying your measurements, let them know that things are going to waver and that you're going to have to train some folks. Be aware that if your processes were in ok shape, people have what I call muscle memory; sometimes they're resistant to change. Does that mean to not do it? No, it just means there are some things to be aware of.

Let's talk about productivity. This drives me frigging batty, because it's perceived productivity that the surveys, the current research, and the current advertisements are all talking about. You are going to have greater productivity. Personal productivity. At this point, by and large, that productivity is a perceived gain. It means I'm excited, I got a new tool. This is really cool. This is going to be great. It doesn't necessarily mean that I am dealing with higher-order issues, that I am putting features out at a faster pace with higher quality.

It doesn't necessarily mean that at all. It means I perceive it. We have to give time for the perceived gain to equalize with real gain. That leads to a much bigger thing: we measure team productivity, not individual productivity. It's how well a team puts software into production. It's not how fast Tracy does it alone; it's how fast we do it as a team. If you're measuring productivity, and you should think about it, I recommend using Dr. Nicole Forsgren's framework. This came out around 2021, with a number of other researchers from Microsoft. What's important is that you see all those human elements that are there.

Satisfaction: we actually need to understand whether people feel satisfied with what they're doing to understand their productivity. I met with Nicole, and we're talking about adding in another dimension, which kind of throws off the whole SPACE acronym. We're talking about adding in trust. Why does trust matter? If I'm using traditional AI and ML, and it's deterministic, I can really understand it, and I can recreate algorithmically, repetitively, again and again, that same value.

Think about a heads-up display for a pilot. I want them to trust what the AI or the ML algorithm has given them. I do that by proving to them again and again that it will be identical: that is the altitude, that is a mountain, you should turn left. Generative AI is, by its nature, non-deterministic. It lies to you. Should you trust it? As things change, as we start to use generative AI, we have to understand, are we going to be able to trust it? That's going to give people angst. We're already seeing the beginnings of that. We're going to have to understand how we measure productivity going forward. I can't tell you 100% how that's going to happen yet.

The importance of context. I love this library image, because it represents your code base. This represents your IP. This represents all the things you need to be willing to give a model access to. If you own the model, if it's hosted in your organization, that's a whole lot different than if you decided to use a subscription service. I'm not telling you not to use subscription services. What I'm telling you is to go in eyes wide open and make sure that your organization is ok with things crossing your boundary.

I deal a lot with InfoSec organizations, and we talk about the information flow. If all of a sudden, I say, I'm just going to take the code base to provide as much context as possible and shoot it out the door. You guys don't mind, do you? They mind. What I want you to take away from this is, read the popups, read the end user licensing agreements, read them. When I saw this, for just a moment I went, how do I flush the cache? It happened to be that I was using some training information, actual workshop code. If it had been something of greater value, I would have taken pause. Read those things. Read the popups. Be aware. Public service announcement, keep the humans in the loop.

The Big Picture - Adding AI to the Enterprise

We're going to talk about how we add AI to the enterprise. How do you add AI to your strategy, or how do you create an AI strategy? It doesn't matter if you're an organization of two people. It doesn't matter if you're an organization of 200, or 2,000, or 20,000 people. You may already have a data strategy; what matters is that you do a needs assessment. Don't roll your eyes. What matters is that you get some people together, perhaps just sitting around the table with some Post-it notes, and you talk about what might be a valuable place to leverage this. Make a decision.

It's not everything at all times, not automatically scaling, which takes me to the second point: define a pilot. Make sure you have a limited, focused pilot, so you can try these things out. What I'm telling you is that this has groundbreaking potential, and there are limitations and there are challenges. Going through that pilot is going to help you understand the different types of skills you're going to need in your organization, whether you're going to need to hire more people, or whether you're going to need to bring more people in. It also helps you get after the first couple of tranches of governance.

Hopefully, your governance is not just, "Don't do it." Your governance needs to be relevant and relative to what you are attempting to do. Monitoring and feedback loops: always important. I want to point out the bottom bullet that's here. It may seem a little strange to you. Why am I telling you that you have to have thought leadership as part of your AI strategy? I'm not talking about sending your people to get up on stage. I'm not talking about writing white papers.

What I'm telling you is to make sure that in your organization you give dedicated time to more than one person to stay abreast and help your organization stay on top of what's happening. Because it's a tidal wave right now. Some days I don't even like to turn on my phone or read any of my feeds, because I know what it's going to say: another automated picture generated from DALL·E. Too much. Choose when and where to start. How? Map it to a business need. Map it to a need. Make sure it's relevant. If your need is that you need to get some experience, that's fine. Make a decision. Write it down: architectural decision records. Then, make sure that you have some measurements against it.

Time to design your AI-assisted software engineering tool chain. Why is it that suddenly we've forgotten about all of the software architecture principles, capabilities, and things that we've been doing for decades? Why have we suddenly forgotten about tradeoff analysis, about the -ilities? When you're designing your tool chain, apply that same lens. Is it more relevant for you to take something off the shelf, because you need time to market? What are my tradeoffs? It may be faster.

It'll be less tailored to my exact domain need. It may be less secure. That may be a choice that we make. It could be that I have the time, energy, finances, and ability to do the tailoring myself. Maybe I instantiate a model internally. Maybe I use an external service, but I have a RAG internally. Lots of different variations, but make those choices. Let's not forget about all the things that we've known about for all these years. Leading practices. I want to point out that we need to keep humans in the loop. Make sure that everything is in source control: the prompts, and the model numbers and names you're running them against. Address your vulnerabilities, and don't feed your private information into public models, into public engines.
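
A toy sketch of the "external service, RAG internally" variation: retrieve the most relevant internal documents and prepend them to the prompt, so the model answers from your context without being trained on it. Real systems use embedding models and a vector store; bag-of-words cosine similarity stands in for both here:

```python
import math
from collections import Counter

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rag_prompt(question: str, documents: list[str], k: int = 2) -> str:
    """Rank internal documents against the question, keep the top k,
    and stuff them into the prompt as context."""
    q = _vec(question)
    ranked = sorted(documents, key=lambda d: _cosine(q, _vec(d)), reverse=True)
    context = "\n---\n".join(ranked[:k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```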

This guy is on a tightrope. He's walking between mountains. Take a look at that. He's mitigated his risk; he has tethers. Is he doing something dangerous? Yes, but that's ok, because he's mitigating it. I need you to think about 2023 as a year where we really didn't have a lot of good regulation. It's coming about. We're seeing regulation catch up. There are challenges with IP. It can be that a model was trained with public information, and so you actually don't own the copyright to the things that you're generating, because from a lineage perspective it tracks back to something somebody else owned.

Or worse, once you've sent it out the door: even if it hasn't been used to directly train a model, let's say that they are keeping all of your conversation threads on your behalf, and that they're analyzing those conversation threads and taking IP from them. You can lose ownership of your IP. In the U.S., we have copyright law. Our copyright law says that a human hand must have touched it. It means I have to be really careful when it comes to generated code. What questions should you be asking of your providers, or, if you are the people providing that service to your enterprise, be ready to answer? In the Appendix for this, there are two different sheets of different types of questions that I want you to take home and leverage.

I'll give you one or two as a snippet. One: how are you ensuring that the model is not creating malicious vulnerabilities? What are the guardrails that you have in place if I'm using your model? Or, if you're providing that model, how are you ensuring that that's not happening? If there's an issue with the model, and the model needs to be changed, how are you going to notify me so that I can understand the ramifications to my value chain, to my value stream? Questions to ask.

Looking Ahead

Let's look ahead. I'm not going to go into this slide in detail, because it covers generative AI, it covers regular AI, it covers ML. What's important is that red arrow: where are we? We're at the peak of inflated expectations. We absolutely are. I completely believe that. I'm sure all of your social feeds tell you that as well. AIOps is on the rise. Elsewhere, other types of AI and ML will continue to improve. We're at the beginning of generative AI, but we're well on the way with the others. What do you think it looks like over the next 12 to 24 months? Recently, I've had the opportunity to interview folks from Microsoft, from IT Revolution, from Yahoo, from the Software Engineering Institute, and even some of my colleagues within the MITRE Corporation.

What we believe, what we're seeing is going to happen, is happening now, is that we're seeing more data silos. Because each one of those areas where a different AI tool is being leveraged is a conversation between me and that tool. You and I are not sharing a session. We're not having the same experience, especially with those generative AI tools. For right now, for this moment: more data silos. Data silos mean slower flow. Slower flow often means more quality issues. It's going to get worse before it gets better. It's groundbreaking potential that we need to know the limitations and the risks for. There's going to be a continued increase in the need for platform engineering, because what are platforms for? Whether it's low-code, no-code, or the new kid on the block that we're building for our custom developers, it's making it hard for people to make mistakes. It's codifying leading practices. This is going to continue to increase.

What about this guy? Are any of you with adult children here going to send them off to coding bootcamp? Jensen Huang would say, do not do that. The pessimists are saying that AI will replace the coder. The optimists are saying that those who are qualified as software engineers and software developers will be in a great place. I want you to hear the nuance that is there. If you're good at your craft, if you understand the principles, if you're able to leverage those principles, if you're able to teach others, you'll be fine.

What about Devin? Have you heard about Devin, or have you followed OpenDevin, which came out about 3 days after Devin was announced? It's fun to watch. There are six videos on the site. It claims to be an AI software engineer. What they've done is a form of AI swarming: they have different agents that are plugged in, where one is triggering and one is reacting to it. There are different patterns; one is a coder-critic pattern. We're going to see AI go from being a tool that we independently and individually use, to agents that are plugged into our SDLC. When they get plugged into our SDLC, we're going to have to be cognizant of what that does to the humans in the mix. We're going to give them very defined, small roles. You may have somebody on your team that is a GenAI. Not a Gen X, not a Gen Z, a GenAI.
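
A sketch of the coder-critic pattern she mentions; `call_llm(role, prompt)` is a hypothetical stand-in for an actual chat client, and the loop shape, not the plumbing, is the point:

```python
def coder_critic_loop(task: str, call_llm, max_rounds: int = 3) -> str:
    """One agent drafts code, a second critiques it, and the draft is
    revised until the critic approves or a round limit is hit.
    A human still reviews whatever comes out the other end."""
    draft = call_llm("coder", f"Write code for this task:\n{task}")
    for _ in range(max_rounds):
        critique = call_llm(
            "critic", f"Review this code for bugs and security issues:\n{draft}"
        )
        if "approved" in critique.lower():
            break  # the critic is satisfied; stop iterating
        draft = call_llm(
            "coder", f"Revise the code to address this review:\n{critique}\n\n{draft}"
        )
    return draft
```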

I want to take you back to 1939. What does that have to do with software? It has to do with black and white. 1939 was when the Wizard of Oz was released. It started out in black and white. Dorothy's house is picked up by a tornado, cast over the rainbow, and it lands in Oz. It smashes the Wicked Witch, and she opens the door. As she opens the door, she looks out at things that she has never seen before: munchkins, flying monkeys, an emerald city, all in beautiful Technicolor. Do you know where we are? Same Technicolor, my friends.

The future is amazing. What we're going to do will be amazing. We're going to need to optimize differently. Right now, our software practices are optimized for humans. I limit work in progress. Why? Because I'm a human. Agile, I take one user story at a time. Why? Because I'm human. We're worried about cognitive overload. Why? Because we're humans. It's not negative. It's just a fact that we finally learned to optimize for the humans. As we go from having AI agents to having more capable team members, or perhaps teams that are made up of many different generative AI agents, we're going to have to figure out, how do we optimize? Who do we optimize for? Exciting stuff.

I'm going to take you from Technicolor back to where we are right now. I like to say, we cannot put the genie back in the bottle. Prompt engineering: we need to understand it as a discipline. We need to understand the ethics of prompts. Who owns the generated outcomes? Machine-human teaming. We need to understand all of this. What about software team performance, trust, and reliability? Why am I showing you a horse's backside? Because a friend of mine named Lani Rosales said, "Trac, you can actually trick the genie back into the bottle, but you can't put the poo back in the horse." I want you to take that with you. We cannot ever go back to where we were. That's ok. We can go into it eyes wide open, understanding the challenges and the limits that are there, and working together to figure these things out.

Call to Action - Your Next Steps

Your call to action. Go back and pulse your organization. Find out where the shadow GenAI is being used, where the shadow AI is being used. Bring it to the surface. Don't shame people. Understand how they're using it. Then enable them to do the kinds of research they need, or if they bring forward a need, help them with that need. Make sure you are looking at cybersecurity as your numero uno issue. Number one: establish your guardrails. Then, connect with your providers.

Use those questions, or be ready to answer those questions if you are the provider of generative AI capabilities to your organization. That's your call to action. I need something from all of you. You're actually the missing piece of my puzzle. As a researcher, I want to understand: how are you using generative AI? How is your organization preparing? How are you personally focusing on getting ready for this? What are you doing? Share your organization's lessons learned. Tell me about your stories. Tell me about the challenges that you have. Or tell me about the things that you want to learn about because you haven't gotten there yet. This is in color. What matters in all of this is the humans. This is what matters.

 


 

Recorded at: Aug 27, 2024
