Transcript
Costlow: We're going to cover quite a bit in the realm of how you can both use artificial intelligence for yourself as well as integrate it into your applications to build things that you probably couldn't have built several years ago.
Zemlyanskaya: My name is Svetlana. I'm leading the innovation team at JetBrains. The team is responsible for building the tools and instruments we use to measure and track the quality of AI features. I started at JetBrains as a Java developer, and I've been working in this industry for more than 10 years. For the last 4 years, I've been mostly focused on applying machine learning to improve development tools.
Selvasingh: I'm Asir Selvasingh from Microsoft. At Microsoft, I'm on point for Java on Azure: everything developers and customers need to build and modernize AI applications on Azure. I'm also a Java dev. I started in 1995 with JDK 1, and I've been having a lot of fun with Java ever since. I've been at Microsoft for the past 20 years, fully focused on Java, particularly making sure our customers have everything they need on Azure to build those applications.
Katz: My name is Dov Katz. I'm with Morgan Stanley. I lead the enterprise application infrastructure group. We're focused on developer experience and the core programming language stacks used by various production applications in the company. I've been with Morgan Stanley for about 20 years, a little bit over that, in a bunch of technical and leadership roles. My hands-on programming started off primarily in Java, so I've got a rich history from building chat applications to trading systems to DevOps and everything in between.
Schneider: I'm Jonathan Schneider. I'm a co-founder of Moderne. Moderne automates large-scale code changes that power strategic initiatives so developers can focus on feature development. About 10 years ago, as a member of Netflix engineering tools, I founded the OpenRewrite project, which provides rules-based, mass refactoring technology. I went on from Netflix to the Spring team, where I founded the Micrometer project. In some ways, I was part of the problem of making API changes that affected downstream developers, a problem I'm now trying to help fix. I'm also a recent Java Champion.
Costlow: I'm Erik Costlow. I'm one of the editors on InfoQ's Java team. I also handle product management for Azul, for the Azul JVMs, running something called Intelligence Cloud that helps people identify unused code in their applications.
When and Where to Use AI
AI is generally considered very new technology, so I think a lot of people are looking at it but might not know when and where to use it. When new tools appear, how do people figure out what they can use AI for, and what the particular problem domains are?
Selvasingh: The field of AI that you just talked about is very broad, and it's an exciting time. One of the coolest things is that you can use it for AI-assisted development, or for AI-infused applications, which we call intelligent apps. Intelligent apps are very interesting because they integrate AI to enhance functionality and deliver amazing user experiences. If you're using the Azure OpenAI Service, for example, you can use it in many ways. Some of the top use cases: you can use it for content generation, where AI helps create blogs, articles, and social media posts. You can use it for summarization, quickly and accurately summarizing a long amount of information for you.
Another exciting one is code gen, where AI generates code snippets and automates many things. Additionally, there's semantic search, which improves search results by understanding the context and meaning of queries. There are some great examples out there; you've probably already seen many of them on LinkedIn, posted by consumers of AI. I'll quote a couple of examples so you have some context. Mercedes-Benz uses AI, particularly on Azure, to create connected cars. They modernized their software development, which allows them to update and release very quickly. It includes a generative-AI-powered in-car assistant that enhances your experience as a driver. The considerations there were connected products, the ability to continuously innovate, real-time processing, and applying responsible AI as well.
Another example I can quote is American Airlines. They modernized their customer hub, implementing it on Azure with OpenAI as well. They handle millions of real-time messages and service calls, and by using AI they reduced taxi times, saved fuel, and gave passengers extra time to make their next flight. There, the key considerations were transactions at scale, complying with all the standards, distributed systems, and AI for automating many things. These examples show how AI is being used to create intelligent apps that drive amazing business outcomes and improve the end-user experience, whether you are the passenger or the driver. If you're a developer today, it's an exciting time to be involved in AI development.
Getting Started in AI Development
Costlow: One of the audiences we're targeting for this particular panel is Java engineers who are building Java applications. As developers, once we figure out what we're solving, we approach it by grabbing a library, a particular piece of technology that we can use to play around with solutions. What are some of the key libraries that people can use and just tinker with to get started in AI development?
Schneider: I've been very impressed with the breadth and documentation of LangChain4j. It supports many different providers, and seems to be very current with most of the recent techniques, like function calling, that have worked their way into the probably better-known LangChain Python library.
Selvasingh: LangChain4j is a very good example. If you are a Spring developer, you can also use Spring AI. They've made it super easy, almost democratizing it, so any app developer can jump in. If you're using Quarkus or Jakarta EE, LangChain4j, as Jonathan said, is amazing. Microsoft also has Semantic Kernel, which can help you build amazing apps faster.
Schneider: If you're a Java developer, don't despair. It's not just for Python folks. There's a lot of work that's been done to catch up and, in some cases, surpass, I think, what the underlying libraries really do in terms of quality and breadth.
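To make this concrete, here is a minimal sketch of calling a chat model from plain Java with LangChain4j. It assumes a LangChain4j version in which OpenAiChatModel exposes generate(String) (newer releases use chat(...)), an OPENAI_API_KEY environment variable, and a model name your provider actually offers; treat it as a starting point rather than a definitive recipe.

```java
import dev.langchain4j.model.openai.OpenAiChatModel;

public class HelloLangChain4j {

    public static void main(String[] args) {
        // Build a chat model backed by the OpenAI API; the key and model name are assumptions.
        OpenAiChatModel model = OpenAiChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("gpt-4o-mini")
                .build();

        // Send a single prompt and print the completion.
        String answer = model.generate("Summarize this order history in two sentences: ...");
        System.out.println(answer);
    }
}
```

Spring AI, Quarkus LangChain4j, and Semantic Kernel wrap the same idea in their respective programming models, so the provider call stays this small while the framework handles configuration and injection.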
How AI can Benefit Java Engineers
Costlow: I tend to like Quarkus a lot, but I haven't experimented with any of the AI integration capabilities yet. When it comes to integrating AI into applications or working with some of these new techniques, I find that for a lot of Java engineers, the applications have been around for quite a long time, so we tend to be working on existing applications, and maybe struggling with their maintenance because they've built up quite a few years of technical debt. That maintenance often takes up time that we'd rather use for something cool, like learning AI. How can Java engineers use artificial intelligence directly for themselves, not necessarily integrated into the application for the end user, but to benefit ourselves as Java engineers?
Schneider: When it comes to migrations, as the founder of OpenRewrite, this is one of the key things that we've been working on. Development teams can't deliver enough software features because they're overworked by a never-ending list of technical debt and security findings that sap their limited capacity. Interestingly, as we've started using AI in the net-new code authorship workflow, it has only accelerated this long-standing trend toward increased developer productivity in developing new code, probably the most significant step-function increase we've seen since the introduction of IDE rules-based refactoring, 20-ish years ago. Now we've just got more code sitting on the shelf.
I think we always see OpenRewrite recipes as this highly accurate cookie cutter that can stamp out these low variability cookies, like you need a system that is going to make the same kind of change in a provably accurate way across the hundreds of millions of lines of code that we've accrued over time. At one banking customer, we did a single Java 8 to 17 migration just on one repo, and it affected 19,000 files. Imagine each one of those files having the potential for hallucination. We focus our energy on how we can use AI to help generate net-new recipes, instead of using AI to directly make the change. In other words, AI isn't the cookie cutter itself. It's the tool and die maker.
Costlow: For things like porting from one XML library to another, since that was deprecated between Java 8 and Java 17, you have OpenRewrite recipes where artificial intelligence generated the series of steps that people can use to do that automation and that migration, versus having to go through and do it manually as people.
Schneider: That's right, yes. Then you want to deploy that at scale across tens of thousands of repositories to just eliminate that issue.
Costlow: Because, yes, I know it would take individual engineers quite a long time if we had to go into every file and change the imports, change the functions, and everything.
Schneider: Exactly. It's boring.
Costlow: Yes, definitely not on the list of exciting things that we can do, although maybe you get to do something else while the code is compiling.
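For readers who have not seen one, a recipe of the kind discussed in this exchange can be as small as the following sketch. It assumes OpenRewrite 8's imperative Recipe API and uses hypothetical class names for the old and new XML client types; a real migration recipe, whether hand-written or AI-generated, would compose many more steps.

```java
import java.util.List;

import org.openrewrite.Recipe;
import org.openrewrite.java.ChangeType;

// A composite recipe that delegates to OpenRewrite's built-in ChangeType recipe.
public class MigrateLegacyXmlClient extends Recipe {

    @Override
    public String getDisplayName() {
        return "Migrate legacy XML client";
    }

    @Override
    public String getDescription() {
        return "Replaces references to a deprecated XML helper with its replacement type.";
    }

    @Override
    public List<Recipe> getRecipeList() {
        // ChangeType rewrites imports, variable types, and signatures that reference the old name.
        // Both fully qualified names here are hypothetical placeholders.
        return List.of(new ChangeType(
                "com.example.legacy.LegacyXmlClient",
                "com.example.modern.ModernXmlClient",
                true));
    }
}
```

The point of the "tool and die maker" framing is that AI can help draft a recipe like this once, and the recipe then applies deterministically across thousands of repositories.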
Experience with Large-scale AI, and When to Use it for Code Maintenance
Dov, I know you did some of this work with maintenance using artificial intelligence. How did you go about deciding whether to use AI to help do it, or whether to allocate people to do it? And what were some of your experiences using AI at large scale?
Katz: Back to the problem of the very boring 19,000-file change that needs to be made, there's also the error-proneness of doing it inconsistently. That is another problem. When you look at the book of work that your hygiene office wants to see done in your software estate over a given year, versus the business features that they want you to implement, you start to say to yourself, we've got to find a better way to do this than getting a bunch of people focused on it, inconsistently, by hand. I really broke the problem down into two categories. One of those is refactoring. Maybe that's upgrading dependencies.
A lot of the OpenRewrite work I put into the category of refactoring, although it's doing more than that; I'm trivializing the problem for a moment. The other category is replatforming. The code you want to keep, you tend to want to refactor. The code you don't want to keep, you tend to want to replatform. Replatforming could be anything from changing programming languages to gutting out an entire implementation of an enterprise application integration pattern, maybe moving from something proprietary to Spring Integration or to Apache Camel. It's not quite as simple as an upgrade of a Java version. For that, AI is great at getting you to the right starting point. What we've actually done is leverage generative AI to help us understand what we were doing so that we can redo it a different way. Tell me what my requirements are by looking at my code, and now implement those requirements in a different way. That's a great way to get started.
That compute doesn't scale to 19,000 files, and so on; it's very expensive. So let's go back to the drawing board and figure out what we did: can we create a deterministic, repeatable, scalable way of doing it? I've really come to the conclusion that to solve this problem at scale, you need multiple tools in your toolchain. There isn't just one, and it isn't going to get you 100% of the way there. You need to know how to combine the tools the right way. One good example is really around that: using generative AI to imagine the things you want to do, and then figuring out how to do them at scale using some of the known and trusted ways of doing it. That's made a huge difference.
I was just running some numbers based on 15,000 code bases trying to do that Java upgrade that Jonathan talked about, and it came out to about 50,000 hours of time saved, which is about 25 full-time consultants for a year, just on a Java upgrade to 21. That's the boring stuff. Imagine if there's other stuff I actually want to do, like use AI, for which I need to get on these modern Java versions first. Then there are also a bunch of other frameworks I need to upgrade. There are legacy things. That's before we touch JavaScript and moving to a new Angular and trying to get on new versions. CVEs are coming in, and you've got to respond quickly. There are a bunch of different motivations for why you'd want to change your code base, and I believe people want to be equipped with a combination of tools, and to know how to use them and when, in order to really tackle the problem at scale in a cost-effective, accurate, and predictable way.
Costlow: Did you say you saved 50,000 hours?
Katz: Fifty thousand hours and millions of files changed. We were looking at the OpenRewrite approach, and the nice part about the recipes is you can ask them how much time they're saving; it's built into the recipe architecture. It becomes very easy to make the case for how much time you're saving if it spits out aggregatable values at the end: change this many files, change that many repos, and this is how many hours you avoided. Even if it's off by a factor of 50%, I'm good. It's still more time than I want my developers, who are highly talented, spending on that problem. Even if it's 90% inaccurate, I'm still fine. It's still time I don't want them spending.
Schneider: One of the things that's really scary is just how broad these changes can be. We're looking at the Spring Boot 2 to 3 recipe that now has over 2,300 steps associated with it. Those are all the property changes and little idiosyncrasies, and I'm sure we still don't cover everything. That's the kind of cognitive load we're putting on a developer on a regular basis.
Selvasingh: When I'm listening to these numbers, what comes to my mind is GitHub Copilot. I know developers say they love it. One of the things they're able to do in situations like this is focus on what matters: go back to designing, brainstorming, collaborating, and planning. Many of these tasks, like searching, documenting, finding problems, fixing them, upgrading, and writing tests, can be handed off. When Jonathan talks about these OpenRewrite recipes, particularly for developers who are upgrading Java, upgrading Spring Boot 2 to 3, or Jakarta EE, and doing that continuously, you can easily see Copilot coming in and helping them over time by bringing together the recipes and the AI-assisted features. That way it helps the developers: they can focus on getting these things done faster and then move on to adding intelligence to those applications.
Costlow: Svetlana, you've worked on some of these in JetBrains as well, some of the machine learning and AI assistant capabilities. What have you seen engineers using that for? What are some of the AI capabilities that you've worked on?
Zemlyanskaya: A lot of different features. Since IntelliJ is a product with a long history and with a wide variety of users, we actually did research and asked people what they want to do themselves and what they want to delegate to AI. Surprisingly, some of the most popular choices to delegate were test generation, writing commit messages, and documentation. A lot of people actually didn't want to delegate writing code, because that's part of their passion. That's what they want to do.
Costlow: What is there in the way of AI commit messages, like you can just automatically summarize the commit and write a nice message for it?
Zemlyanskaya: Yes, why not? It still doesn't get information from your head, though. If you want to add information about not only what was changed but why it was changed, you will probably need additional context, like the issue itself. That's a work in progress, but that's where all AI tools are going in the near future: pulling in more information from different technologies and being able to summarize it for you.
Costlow: In terms of test generation, is there some reward system based on code coverage?
Zemlyanskaya: Do you mean reward for AI or for developers?
Costlow: AI generating tests is definitely a reward for me. I was thinking in terms of the AI figuring out how to write the tests so that they cover more of the code.
Zemlyanskaya: There is an approach we're currently investigating; we're not using it in production yet. We are checking how we can do that and how we can apply that information. First of all, to measure quality, which is the tricky part: with AI-generated code, you can't just compare an expected result with the actual output, because there are so many different ways to solve the same problem. You have to go in a bit of a different direction and either execute the code, or write tests and execute them, or compute some other meta-information about the code: whether it's syntactically correct, how many methods it generates, whether you want something simpler or not. Test coverage is definitely one of those signals.
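One of the meta-information checks mentioned above, whether generated code is even syntactically valid, can be sketched with nothing more than the JDK's built-in compiler API. This is a simplified illustration, not how any particular vendor implements it; it compiles an AI-generated source string in memory and reports success or failure.

```java
import java.net.URI;
import java.util.List;

import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class GeneratedCodeCheck {

    // Wraps a string of source code as an in-memory compilation unit.
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Returns true if the generated source compiles (requires running on a JDK, not a JRE).
    static boolean compiles(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(className, source)));
        return task.call();
    }

    public static void main(String[] args) {
        String generated = "public class Hello { String greet() { return \"hi\"; } }";
        System.out.println("Generated code compiles: " + compiles("Hello", generated));
    }
}
```

Executing the generated tests, measuring coverage, and the other signals Zemlyanskaya mentions would layer on top of a basic gate like this.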
Large-Scale Code Refactor, and Testing
Costlow: In the case of things like a large-scale code refactor, how well does the code need to be tested? Because I talk to people on a regular basis, and there's always a goal of having 100% but nobody ever hits that. There's a goal of having 90%, people generally don't hit that either, but we have a moderate level of testing. If you do a large-scale code refactor, what do you need in terms of tests to be confident of that end result? Do you refactor the tests as well?
Selvasingh: I've talked to many Java developers and customers about this particular one. Typically, if it is a production system, let's say an inventory system or a stock trading system, they already have a set of tests when they start. Whether the system runs in a retail unit, a store, a warehouse, or a truck where things are moving, they have all those tests, and those tests are pretty important as they move forward. They go through the refactoring, whether they're upgrading using the OpenRewrite recipes that Jonathan talked about, or breaking a monolith down into microservices to make it easy to scale in the cloud. Let's say they break it all down; when they bring it all back together, those test cases are still very important. When it comes to code-level unit testing, that is where things change: they have to generate new test cases based on the new code. The functionality, integration, and compliance test cases all stay intact, and they have to pass all that testing before they switch from the old production system to the new one.
Zemlyanskaya: I think there's also a question of what you consider the final artifact of your work. One may consider it to be code which actually functions and does what you think it's supposed to do. One can also think about tests as part of this artifact, because they're also documentation which you don't need to update on a regular basis. Everyone has read documentation which was true a year ago but isn't anymore.
Schneider: There's one other element to this question, I think, about gaining confidence in a large-scale refactor. One thing I think we haven't mentioned is that when an OpenRewrite recipe is running, it's based on a data structure called the lossless semantic tree. It's not just the text of the code, but also everything the compiler knows to resolve dependencies. It's very similar to what's in the IDE: when the IDE does an indexing operation, it knows more about the code than is just visible in the text. Some of the properties we've been adding to the LST are things like, is the statement covered or not?
More recently, working with Azul, it's been, is the statement reachable or not? A statement that's reachable has a higher risk score for a refactoring operation than one that isn't. A statement that's covered already has a lower risk score than one that is not. You can do a large-scale refactoring over a code base and actually calculate a risk score based on the combination of those factors. Is it reachable? Is it covered? Then prioritize maybe merging those repositories that have the lowest risk scores and focusing your manual review effort on the ones that have the highest ones. I don't think we'll ever get to a place where we have the same degree of confidence uniformly across the whole enterprise.
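A toy version of that risk calculation might look like the following. The weights and the ChangedStatement shape are hypothetical, invented purely to illustrate how reachability and coverage could combine into a score used to rank repositories for review; they are not Moderne's or Azul's actual formula.

```java
import java.util.List;

public class RefactorRiskScore {

    // One statement touched by a refactoring recipe, with the two LST attributes discussed above.
    record ChangedStatement(String location, boolean reachable, boolean covered) {}

    // Higher score means riskier to merge without a manual review. Weights are illustrative only.
    static double score(ChangedStatement s) {
        double risk = 0.0;
        if (s.reachable()) {
            risk += 0.7;  // reachable code can actually misbehave at runtime
        }
        if (!s.covered()) {
            risk += 0.3;  // no test exercises this statement, so there is no safety net
        }
        return risk;
    }

    // Average risk across everything a recipe changed in one repository.
    static double repositoryScore(List<ChangedStatement> changes) {
        return changes.stream().mapToDouble(RefactorRiskScore::score).average().orElse(0.0);
    }

    public static void main(String[] args) {
        List<ChangedStatement> changes = List.of(
                new ChangedStatement("OrderService.java:42", true, true),
                new ChangedStatement("LegacyReport.java:108", false, false));
        System.out.printf("Repository risk score: %.2f%n", repositoryScore(changes));
    }
}
```

Repositories with the lowest scores could be merged with little ceremony, reserving human review for the highest-scoring ones, which is exactly the prioritization described above.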
Code Modularization, Componentization, and Naming, with AI
Costlow: Is it possible to modularize code, componentize, and even name the modules? To take something that's large and break it down, use AI to figure out where the cut points are, so that you can break it into smaller pieces or modules.
Schneider: If it's about naming, I hope we train those AIs on Spring's naming conventions. Those are famously long.
Costlow: Yes, and then abstract everything.
Asir, you indicated that you could break something down in terms of identifying different functions to where you could modularize things to a degree?
Selvasingh: Yes, I think that's true, whether it's naming or modularization. If you're using something like GitHub Copilot, it knows your organization's code base. That way, if you are a developer, it helps you onboard quickly, and it offers suggestions based on your own internal and private code. This way you know the direction, and it helps you break things down. As you move toward working in a collaborative environment, it also helps developers stay focused by helping them collaborate more effectively, whether that's reviewing pull requests or generating summaries so others can understand what you did, which makes it easier to manage projects and maintain good-quality code.
One of the things that Svetlana mentioned is that they may have written the documentation 10 years ago, but the code base keeps changing, so Copilot can assist you with building the new documentation. It can come up with suggestions, whether it's names, or modularization, or upgrades, or collaboration; all of those things will help you. The goal with all those helpers is to make you super productive in working through your backlog, so you can get to the place where you're actually adding AI-infused intelligence into your apps, generating even more business outcomes and delivering amazing end-user experiences.
AI Documentation Generation
Costlow: A lot of engineers, we want to move forward, but we do tend to have these large-scale backlogs, so maybe we can't do that future thing that we want. Can AI help us understand legacy code by generating documentation? Do we have any tools that are geared towards documenting those materials?
Zemlyanskaya: I think documentation generation is one of the standard things to have in different AI systems. It's included in the AI tooling from JetBrains, and I'm sure it's included in others. It's one of the things that people want to delegate to AI, and it makes sense.
Katz: I think there's definitely this view that you can back into requirements when you give AI some code. That has been a very effective means for a bunch of the replatforming that we've done, experimenting with approved connections to generative AI, feeding it chunks of the project at a time, because obviously you're dealing with tokens and all the compute boundaries that tend to limit you. You start to go in that direction, and you're reverse engineering for the sake of forward engineering afterwards. That's extremely powerful: just saying, what does this thing do? Go do it in Python. It's a Perl-to-Python play, where it may not just be converting every method to an equivalent method; it's, just do this thing better in a different language. That's not always a refactor, and that's why you've got to back into some English-language requirements. It's definitely been an effective way for us to experiment with some of these approaches.
Costlow: Yes, every couple of years, I see a new COBOL to Java translator come into play. I don't think we've done that with AI just yet. That is definitely a thing that I as a human would not want to do.
Schneider: I love that so far in this roundtable, we've discussed AI replacing naming, testing, and documentation, probably developers' three favorite tasks.
Costlow: Take all the stuff that you don't want to do and then make AI do it. That's the outcome of this webinar: don't do the stuff that you don't enjoy anymore, make AI do that.
Everybody likes to make Terminator jokes, so maybe that's the reason. It's like, I don't want to do naming anymore in that code.
Schneider: Migrations.
Zemlyanskaya: Then maybe AI won't want to do that either, and it'll make us do it again.
Selvasingh: It'll get bored?
Katz: There's definitely a set of use cases that goes beyond those perhaps less interesting tasks we want to avoid doing, and we've found other use cases that I thought would be worth mentioning. For example, if you have an architecture discipline and you're writing ADRs, architecture decision records, you'd like someone to make sure those are actually following your architecture principles and things like that. You want to start getting into what architects would do in terms of giving you feedback. Pull requests, I know in the GitHub ecosystem, I'm expecting a lot there. With pull request reviews, you usually have the concept of a first-line review and a second-line review, and AI should just be the first-line review. You should be reviewing AI-reviewed code. In the future, there's no reason to look at a pull request that AI didn't already review; I don't want to waste my time on it if it's obvious stuff. Those are great opportunities to employ AI in the SDLC, to give you a higher confidence level that it's worth your time. There are definitely great use cases throughout the rest of the SDLC worth pointing out.
Costlow: Yes, because I've always done a lot with the realm of code quality checkers, like you've got PMD, you've got FindBugs, you've got a variety of those, and people are expected to run them, but then you commit that code. What you're saying is, AI reviews the code and provides context more akin to what a human would in terms of whether or not that meets the enterprise scope.
Katz: Just give me feedback. If I had asked you, as AI, to write the code, you would have written it for me; now that I chose to write the code, as Svetlana said, we want to write the code, so give me feedback on the code I wrote, since I'm not letting you write it. There are probably good suggestions there. Sometimes there's a set of standards: you want your department to follow a set of standards, you're not following them, and AI calls you out on it. Not everything can be written down in a fitness test that you can run, so this might provide a bit more of a qualitative review. These are opportunities I think could be leveraged, and I'm sure we're going to see them in the product space that developers are typically using.
Schneider: I remember a principal engineer telling me once, he said, I feel like I'm going around telling my kids to pick up their socks all the time. I'm like, yes, I feel that. The moment you get everybody where you want them, a reorg happens or a new product team comes in, or something. We need to make people feel like they're creating, not picking up socks or teaching sock pickup, or whatever you call that.
Selvasingh: When Dov was talking about the pull request reviewed by AI, I think the important thing there is when it's reviewing pull requests or writing those summaries, when the AI knows your organization's code base, and it knows whatever standards that you're implementing, then it makes it super easy. Developers can even run those offline before they even open the pull request. The AI can say, I've reviewed it. Everything else looks cool. Then the developers then come in and look at the review as a second or the third reviewer.
Costlow: It becomes the reviewer in the pipeline, versus just the generator of code that then humans review.
Zemlyanskaya: You can also go further and ask AI to generate tests, then ask AI to generate code which will pass those tests, then generate tests which will actually fail the code it has just written, and go around this loop, trying to refine the results. I have seen a lot of research working in this direction. I'm sure that we'll see interesting results in the near future.
Spring Upgrade and Managing Shared Libraries, with OpenRewrite
Costlow: One challenge I've noticed during the Spring upgrade is managing shared libraries across different projects. Is OpenRewrite the right solution for upgrading these shared libraries, or should we consider using AI to ensure that the updates don't break APIs and applications?
Schneider: There is definitely still a sequencing problem when you have shared libraries: you have to merge those first and then move forward. I think that's where OpenRewrite used on one repository at a time gets difficult. That's what we've been doing at Moderne: allowing a recipe to run on a whole organization, a whole business unit. It usually starts with an impact analysis, a recipe that says, this is the sequence or order of dependencies between projects, essentially the merge order. Then you're running the recipe according to that order, and merging accordingly. It can be easy, I think, when you're running a recipe on one repo, to miss the forest for the trees. That's definitely a problem.
Going back to Netflix, this is related. Many years ago, that engineering org almost ground to a halt because of Google Guava, the famous Guava. There were common libraries that used it and applications that used it, and neither could move because they might break the other. Which could you move first? You can't move the library first, or it breaks the app. You can't move the app first, or it breaks the library. We actually built a series of recipes that eliminated just the API surface area that changed between those disparate versions, so that it didn't matter which moved first. We effectively broke the link. Then you could move the library first, or you could move the app first, and once it started going, everybody could move forward. It's interesting that that one library almost killed all forward progress just because of how many libraries had transitive dependencies on it.
Katz: I think, as you look at taking a large organization like a list of repositories and try to find a sequencing solution, with the focus these days on the software supply chain and SBOMs and all those other bill of materials type products, or files and formats, a company which manages to ingest a large number of repositories of its own probably knows the interrelationship between them to the point of being able to determine the tree. Then you can figure out, back to the question that was asked, what is the right sequence of the shared libraries? A clever way to break the linkages is also very much of value, but at least benefit from that institutional knowledge in a way that we can find the right order and roll things out in a least impactful, least damaging way.
The Integration of AI
Costlow: I want to move forward a little bit. We talk a lot about updating code of how people can use AI on their own. Now I want to talk a little bit about integrating AI services. Or now that you've solved your maintenance problem and you have more free time to be able to focus on cool business features and things that can enhance what you're doing, how can you take advantage of AI to do a little bit more. Svetlana, the product you worked on is a little unique because it uses AI on the backend, but the thing that you make available is actually AI to the engineers. Can you talk a little bit about how integrating AI into your application works? What's local on the machine? What still needs a backend service? What should we know about putting AI into applications?
Zemlyanskaya: First of all, AI is a very polarizing topic. A lot of people are eager to try it and see what it's capable of. On the other hand, there are a lot of people who are trying to avoid it and wouldn't want to use any AI tools, or any product with a lot of AI integrated. Both of these positions make sense. AI does blur the line of code ownership: which company or person actually owns the final artifact of your work. What's crucial is to know your audience, to understand who you are aiming at, and whether these people will actually be happy to try it, or will be disappointed because doing the things they used to do becomes less fun and less enjoyable, since parts of it are delegated to AI. In the case of IntelliJ, it is a huge product with a lot of different customers, so we actually had to go in different directions here. For customers who value privacy the most, who don't want to send code to third-party servers, or whose employers don't allow them to, we do have completely local solutions. That part is tricky: you have to think about model size.
You have to think about consumed memory, at least right now. It might change in the future, but at this point it's not possible to have a local, general-purpose AI that can do the state-of-the-art things you can do with ChatGPT. Local machines and laptops are just not there yet, usually. What is possible is to change the scope of the task, to make it smaller, and to make your model solve one particular task, or maybe a couple of them. In that case, it's completely doable to train something from scratch or to tune open-source models, and it will be on par with a server-based implementation. At this point, we have two major features we deliver that way. One is full-line code completion, which is also available for Java, and it's included as part of the JetBrains product. There is no additional running cost for us, because it's already happening on the machine, so it's included in regular subscriptions; one doesn't have to pay any extra cost to have that feature. The second one is semantic search, integrated into Search Everywhere. It is safe and embedded, and it's also a completely local solution.
Yes, on the other hand, it's still not possible to have everything locally, or at least one has to do a lot of additional work on top of that to get there. We're also building server-based features, and in this case, we want to move as much logic as possible to the server, because it makes it easier for us to iterate, update that logic, and change models or model providers. It's not possible to make the client completely thin, though, because the client still has all of that context. Context is what makes the actual difference between using AI as-is, just with changing prompts, and something which is adapted for a particular person and a particular project.
As of now, the split is that everything which is about the particular person working with AI happens inside the IDE, and things that can be shared between multiple users usually happen on the server. For instance, UI and UX are obviously on the IDE side, but so is gathering the context: which parts are relevant, which technologies, languages, or methods relate to the selected code fragment or natural-language query, and so on. That happens inside the IDE. There is also a post-processing stage, applying the generated code to an open file, which is also inside the IDE. On the server side, we have the integration with language-model providers, we select the proper models and prompts, we run inference for in-house models and the business logic on top of that, and there are indexes for the documentation that we have and want to make accessible from the IDE.
The Depth and Breadth of AI Knowledge Needed for Java Devs
Costlow: Asir, you're providing a lot of AI services; Azure has a lot of things on the backend for people to use in their applications. A lot of Java engineers know Java and tend to learn things that are tangential to it. About how much does somebody need to know about artificial intelligence, and how much can they focus on just using the library or the service?
Selvasingh: How much do they need to know? That's a very important thing. I will start by saying that if you are a developer, you can start today. That's very important. Every app will be reinvented with AI. There'll be new apps built that were not possible before. The opportunities are super high. When we talk about, you can start today, there are different personas playing a role here. Different people, they play a critical role in building these intelligent applications. I have three: AI engineers, data engineers, and app developers. I'll say a few words about this and then go into a focused area here. AI engineers, they focus on developing and deploying AI models. Data engineers, they manage and process data with data scientists and machine learning engineers playing pretty critical roles. Data scientists, they explore data, discover insights, build the model prototypes.
Machine learning engineers, they train, fine-tune, and deploy models, ensuring they are ready for production. You have these AI engineers and data engineers; however, the spotlight is actually on app developers, because our customer organizations need a lot of app developers to infuse this intelligence. App developers integrate AI into applications to create these amazing intelligent solutions. They build the user interfaces and application logic, and they leverage the AI models. This ensures that the AI capabilities are seamlessly integrated, providing smooth, intuitive, amazing end-user experiences. What happens in this case is that three things come together when these three personas play their roles.
There's an app platform where you're deploying the apps, there's a data platform where the data is brought together, and there are the AI services, so the three things come together. If you are a Java developer, you can start today. This is the best time for you, because lots of enterprise applications are in Java. You can start today and integrate AI into your apps. I want to re-echo: if you are a Spring developer, you can use Spring AI; they made it super easy. If you're using Quarkus, Jakarta EE, or other frameworks, you can use LangChain4j. Microsoft also has Semantic Kernel to assist you in building these applications. The key thing is you can start today, work with the different personas, and move things forward as fast as you can.
Zemlyanskaya: I do have a couple more tips for those who do want to integrate AI into their application. There are different things to keep in mind, but from my point of view, there are two major ones which actually make a difference. The blessing and the curse of working with large language models is that it's very easy to get to a working prototype. You can spend just a couple of hours and already have something impressive; you can show a demo and get everyone excited. The next part, actually going from a demo to production-level quality, is a very complicated one, and a lot of products don't make it. That part is hard to predict. Sometimes it's hard to say how difficult it will be, how much time it will take, and whether it's actually possible to reach the proper level.
What helps here is to think about creating a data-driven loop around your feature from the very beginning, kind of like test-driven development: when you start working on functionality, you think about what it should look like and what the expected inputs and outputs are. In this case, we don't expect that all of the tests will pass. It is fine; it's always the case that not everything will succeed. What it allows you to do is actually measure. If you want to fix one problem, and you tune a prompt and fix it, you're making sure that you're not breaking several other things along the way. Because with a prompt, sometimes you can just change a double quote to a single quote, and it will blow up your application and change the formatting and everything it returns.
It is important to check and make sure that each consecutive step doesn't make things worse in unexpected ways. It also allows you to try different models, because in AI a lot of stuff is happening very fast. New models arrive, but old models also get deprecated on pretty short notice. One can't stick with one model and say that it is enough; you have to update to a newer version. If that's easy to do, if the process is semi-automated, it's not that complicated, and you don't have to spend a lot of hours doing it. The second part, which I would say is very important, is the user experience: how do you present those results? Because, as I've said, you can't achieve 100% accuracy, and that's all right, as long as it is built into your overall user experience. You have to set expectations right, so a person using your application doesn't expect the suggestions to always be perfect, and it should be possible to tweak or reject a suggestion. It shouldn't be forced on the user. It should be more like guidance, not something which forces a process.
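A minimal sketch of such a data-driven loop is shown below. The model is abstracted as a plain Function<String, String> so it works with any provider, and the checks are predicates on the output rather than exact string comparisons, since, as noted above, there are many acceptable answers; the cases and the stub model are invented for illustration.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

public class PromptRegressionHarness {

    // One evaluation case: a prompt plus a loose property the output should satisfy.
    record EvalCase(String prompt, Predicate<String> check, String description) {}

    // Fraction of cases whose output satisfies its check; 100% is not expected.
    static double passRate(Function<String, String> model, List<EvalCase> cases) {
        long passed = cases.stream()
                .filter(c -> c.check().test(model.apply(c.prompt())))
                .count();
        return (double) passed / cases.size();
    }

    public static void main(String[] args) {
        List<EvalCase> cases = List.of(
                new EvalCase("Return a JSON object with a 'summary' field for this diff: ...",
                        out -> out.contains("\"summary\""), "keeps the JSON contract"),
                new EvalCase("Write a one-sentence commit message for a null-check fix.",
                        out -> !out.isBlank() && out.length() < 200, "stays short and non-empty"));

        // Stub model for illustration; swap in a real provider call here.
        Function<String, String> stubModel = prompt -> "{\"summary\": \"stub answer\"}";

        System.out.printf("Pass rate: %.0f%%%n", passRate(stubModel, cases) * 100);
        // Re-running the same cases after every prompt tweak or model upgrade turns
        // "did I break something else?" into a number you can compare run to run.
    }
}
```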
Keeping Up-to-Date with Different AI Models
Costlow: How do you keep up to date with the different models that you have? Like you said, some of the models go out of date.
Zemlyanskaya: We do have a couple of our own in-house models, but for most of the functionality, especially the state-of-the-art parts like chat, we are using external model providers like OpenAI or Google Cloud. They do release new models, but they also deprecate older ones. What we have here is a process to evaluate new models, so when something new arrives, we can check it against our benchmarks and see how it works in terms of metrics, but also what potential problems we get. Maybe some of the functions worked well with the previous version of the model but don't work that well with the newer one.
Choosing the Right AI Models
Costlow: As a Java engineer, or somebody who's working on an application, like Asir, you were talking about this being the key time to be an app developer; how do you look at and evaluate which models you're going to pick to solve something? What goes into that?
Selvasingh: This is where your AI engineers can help you. You have to start from your business requirements and figure out what you're trying to solve. Come up with those requirements, and then go into an ideation phase to figure out which model can help you. In the Azure space, we have AI Studio. Within AI Studio, you can try out models. You can run your ideation workshop with your fellow engineers, or with your user base as well. Start experimenting with which ones will help you: are you going to use one of the pre-built models, or are you going to fine-tune them? If you're fine-tuning, where does the data come from? You can do all that ideation and experimentation and figure out what works before you bring it into the application itself. When you finally bring it into the application, the focus will be on what the end-user experience is and how you deliver that.
For example, if you're building a retail store, your key concern will be how to help customers fill the cart, place the order, and augment the cart by adding more items. How do you use the model, and how do you build that experience so that you can help them fill, augment, and then close the cart? Because sometimes those end customers will jump out of your application and look somewhere else, and the moment they look somewhere else, they probably go into another store online. How do you build those experiences in a way that retains the engagement of the end customers? That's where you go back to your Azure AI Studio. Access to all the models is available there, and new models are coming through. You can run your ideation workshops and start experimenting with your end users as well. Then you can bring those experiences into the applications and deliver them to your end customers.
Costlow: Using your cart example, I could run simulations of carts through it and look at what suggestions it made, and that's how I would pick which of the models I want, based on the one that gave me the best results.
Selvasingh: It also goes back to your business requirements. Your requirements should be things like: how many more items are you able to add? Out of 100 customers coming through the process, did 90 of them place the order, or 80 of them? You can start from that, reverse engineer, and then pick the right models from there.
Costlow: I have a set of success criteria of, I need to increase my number of conversions, so let me pick an AI model that, for the time being, increases those conversions. Then when I have conversions, I figure out, how can I add more things to the cart? Which of these AI models helps me get more things into the cart? Then I can build my application with the Azure AI Studio and things like LangChain4j.
Selvasingh: Nothing has changed in this process. You always start with your outcome, your requirements, use cases, and then pick the technology. Because companies like Microsoft will give you a plethora of technologies. It's not like you have to use all of them. You have to pick what meets your requirements, what takes you to success, what helps you make more money.
Schneider: There's one other characteristic I've noticed when selecting models, and that's that we generally don't just select one model for an experience, but often wind up selecting multiple and developing a pipeline by which we use small models to limit the search space and therefore the compute required. As an example, we developed a feature to do recommendations for which recipe to run on a code base. What if I just gave every next 10 lines of code to a model and said, what do I do here? What do I do here? We did a study of what would happen if we did this with an average-sized generative model, and it implied 75 years of total induced latency for an average-sized business unit. So how do we limit the search space so we don't have to send every fragment of the code base to that model?
Instead, we developed a pipeline where, first, we just take every method declaration, embed them, cluster them, and then select samples from each cluster, so that we're not asking effectively the same question over again for highly similar methods. Those embedding models are much cheaper, much smaller, oftentimes 1 to 2 gigs of RAM. You can run them locally, at zero marginal cost. Then you reserve the more sophisticated model for the generative part later on. Svetlana mentioned that earlier: you can choose small models that are good at individual small tasks, and reserve the big guns for the later stage, after you've narrowly filtered the search space. I actually wrote a whole book on this recently, just a short, 35-page book. It's free, from O'Reilly, called "AI for Mass-scale Code Refactoring and Analysis", where we describe in great depth how we select those models, how you deploy them, and how you evaluate them. I hope that's helpful for folks to look at.
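The pipeline Schneider describes, small embedding models up front and the large generative model only for the survivors, can be sketched as below. The embedding function is a placeholder (a small local embedding model, for example one of the LangChain4j in-process models, would supply it), and the greedy threshold clustering is a deliberately simple stand-in for whatever clustering Moderne actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class EmbedClusterSample {

    // Cosine similarity between two embedding vectors.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Greedy clustering: keep a method only if it is not too similar to one already kept.
    // Only the returned representatives are sent on to the expensive generative model.
    static List<String> representatives(List<String> methodDeclarations,
                                        Function<String, float[]> embed,
                                        double similarityThreshold) {
        List<String> kept = new ArrayList<>();
        List<float[]> keptVectors = new ArrayList<>();
        for (String method : methodDeclarations) {
            float[] vector = embed.apply(method);
            boolean nearDuplicate = keptVectors.stream()
                    .anyMatch(v -> cosine(vector, v) >= similarityThreshold);
            if (!nearDuplicate) {
                kept.add(method);
                keptVectors.add(vector);
            }
        }
        return kept;
    }
}
```

Because the embedding step runs locally at near-zero marginal cost, the search space shrinks before any generative tokens are spent, which is where the 75 years of induced latency get clawed back.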
Reverse Engineering, from Code to Use Cases
Costlow: Can we reverse engineer from code to use cases?
Yes. We talked about automated code refactoring at scale. Yes, you can do that.