Live from the venue of the QCon London Conference we are talking with Cassie Breviu. She will talk about how she got started with AI, and what machine learning tools can accelerate your work when deploying models on a wide range of devices. We will also talk about GitHub Copilot and how AI can help you be a better programmer.
Key Takeaways
- To operationalize transformer models on the edge, you can use ONNX Runtime to work with many different devices and programming languages.
- Hugging Face has a lot of prebuilt machine learning models which are task-agnostic and can be retrained for something more specific.
- GitHub Copilot can speed up your development process by intelligently autocompleting your code.
- Microsoft partnered with Epic Games to create the Neural Network Inference plugin. This plugin uses ONNX Runtime to run ML models in Unreal Engine and in the game.
- Cassie got into machine learning by learning exactly what she needed to automate parts of her job. The best developers are lazy developers.
Transcript
Roland Meertens: Hey, everybody. Welcome to the InfoQ podcast. My name is Roland Meertens and today, I am interviewing Cassie Breviu. She is a senior program manager at Microsoft and hosted the innovations in machine learning systems track at QCon London. I am actually speaking to her in person at the venue of QCon London Conference. In this interview, I will talk with her about how she got started with AI and what machine learning tools can accelerate your work when deploying models on a wide range of devices. We will also talk about GitHub Copilot and how AI can help you be a better programmer. If you want to see her talk on how to operationalize transformer models on the edge, at the time of recording, you can still register for the QCon Plus Conference or see if the recording is already uploaded on infoq.com. Now on to the interview itself. Welcome, Cassie, to QCon London. I'm very glad to see you here. I hope you're happy to be at this conference. I heard that you actually got into AI by going to a conference.
Entering AI from a non-traditional background [01:16]
Cassie Breviu: Yes, that's exactly right. It's an interesting story. Thanks for having me. I am thoroughly enjoying this conference. It's put together really well. So what happened was I was at a developer conference. I was a full-stack C# engineer and I'd always been really interested in AI and machine learning, but it always seemed scary and out of reach. I had even tried to read some books on it and I thought, "Well, this might be just too much for me, or too complicated, or I just can't do this." So I went to this talk by Jennifer Marsman, and she did this amazing talk on "Would You Survive the Titanic Sinking?" She used a product called Azure Machine Learning Designer. It used to be called Studio, but now it's called Designer. It's still a product that you can go use.
What it does is abstract building a machine learning model into drag-and-drop widgets: you import a dataset and drag and drop that in, then you do all your pre-processing, and there are widgets for that. Then you choose your algorithm, set all your parameters, hit run, and it just runs it. What she did for me was show me the beginning-to-end process of how you build a machine learning model, and it blew my mind. I had something that I knew I wanted to build at work. I was working at a law firm at the time, doing in-house development to automate different business processes.
Roland Meertens: And this is just normal coding, so no AI or-
Cassie Breviu: None.
Roland Meertens:... this is data science already or-
Cassie Breviu: No data science. I was doing C# and SQL. A lot of times, I was building applications from database to front end, the whole thing, because each one would solve a specific business task. One example: they often had to run large queries on the SQL database to change a lot of different client information. Somebody would have to go write that query and they didn't want to have to wait for that, so I built an app that would dynamically create the query based on the information the user gave it. Think about how in Azure DevOps you can write your queries, but you use the UI to pick less than, greater than, or equal. Then the app would take what they put in and render it as a plain-English sentence, because we were talking about people who don't understand query languages. Anyway, that's one example of what I would do: let non-technical users run large queries that change databases.
I was doing that type of work, so nothing to do with AI: static, traditional programming. But one thing we had to do there was triage help desk tickets, and it was very much a disruption in our workflow. I didn't really like that. We had a help desk person, and if a ticket was something that went to the development department, it would just get sent over, and for about one week every month, one of us would have to be the person who triaged those. I didn't want to do that anymore. After I went to this talk, I thought, "I could automate this with machine learning."
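The interview doesn't show the app's code. As a rough illustration of the idea only, here is a minimal Python sketch that turns dropdown-style choices into a parameterized SQL `UPDATE` plus a plain-English summary of what it will do. The `clients` table, its columns, the whitelists, and `build_update` are all hypothetical, and an in-memory SQLite database stands in for the firm's SQL server.

```python
import sqlite3

# Hypothetical whitelists: the only columns and comparisons the UI offers.
COLUMNS = {"state": "state", "balance": "balance", "status": "status"}
OPERATORS = {"=": "is", "<": "is less than", ">": "is greater than"}


def build_update(set_column, new_value, filters):
    """Build a parameterized UPDATE from UI choices, plus a plain-English summary."""
    if set_column not in COLUMNS:
        raise ValueError(f"unsupported column: {set_column}")
    where, params, phrases = [], [new_value], []
    for column, op, value in filters:
        if column not in COLUMNS or op not in OPERATORS:
            raise ValueError(f"unsupported filter: {column} {op}")
        # Identifiers come only from the whitelist; user values are bound as parameters.
        where.append(f"{column} {op} ?")
        params.append(value)
        phrases.append(f"{COLUMNS[column]} {OPERATORS[op]} {value}")
    sql = f"UPDATE clients SET {set_column} = ?"
    if where:
        sql += " WHERE " + " AND ".join(where)
    target = ("clients whose " + " and whose ".join(phrases)) if phrases else "all clients"
    summary = f"Set {COLUMNS[set_column]} to {new_value} for {target}."
    return sql, params, summary


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, state TEXT, balance REAL, status TEXT)")
conn.executemany(
    "INSERT INTO clients (state, balance, status) VALUES (?, ?, ?)",
    [("IA", 50.0, "open"), ("IA", 500.0, "open"), ("NE", 20.0, "open")],
)
sql, params, summary = build_update("status", "closed", [("state", "=", "IA"), ("balance", "<", 100)])
print(summary)  # Set status to closed for clients whose state is IA and whose balance is less than 100.
conn.execute(sql, params)
```

Binding user values as parameters and restricting column names to a whitelist keeps the generated SQL safe from injection. Showing the summary before running the query also gives a non-technical user a chance to confirm it.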
Roland Meertens: So after you went to a talk about "Would You Survive the Titanic?" you thought, "Oh, I can totally take arbitrarily written help desk tickets and figure out exactly what's going wrong."
Cassie Breviu: Yeah, and I did.
Roland Meertens: That is good. Very ambitious.
Cassie Breviu: What she did, though, that made it not seem so ambitious was that she demystified the data science process. She explained it so well from beginning to end that I understood the process I needed to follow. Now, there were still a lot of things I did not understand. I was very much going into unknown territory, but I thought this would be the perfect thing to do. I've always learned by doing. I need a goal, and I just don't give up till I get there. Then once I get there, I realize I've learned a new skill, and that's just how I've always been and how I like to learn. I first needed to get access to the database, obviously, because every model starts with data, which I learned in her talk.
What I ended up building was a model that could take the text of the ticket plus different attributes that you would select in the help desk system. I took a bunch of historical data, obviously, cleaned that up, and built a model that would classify who the ticket should be assigned to: the top three assignees based on the text and attributes of the ticket. Then it would send an email. I can talk about some of the difficulties there. One of them was that although it was a system we hosted in-house, so I had access to the database, I didn't have access to its source code. So in order to hook into the event of a help desk ticket being created, I had to set up a SQL trigger. I know some people are probably going to cringe at that, but I just set up a SQL trigger that ran every five minutes and just asked, "Do I have a new item?"
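The original model was built in Azure Machine Learning Designer and later rebuilt with scikit-learn, and the interview doesn't say which algorithm it used. The sketch below swaps in a simple technique, a standard-library-only multinomial naive Bayes, to show the shape of the task: score every past assignee for a new ticket and return the top three. The ticket data, the names, and the choice to fold selected attributes in as extra tokens are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text, attributes=()):
    """Lower-cased word tokens plus one token per selected ticket attribute."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return words + [f"attr:{a}" for a in attributes]


class TicketRouter:
    """Multinomial naive Bayes over ticket tokens that ranks likely assignees."""

    def fit(self, tickets):
        # tickets: iterable of (text, attributes, assignee) from historical data
        self.token_counts = defaultdict(Counter)
        self.ticket_counts = Counter()
        self.vocab = set()
        for text, attributes, assignee in tickets:
            tokens = tokenize(text, attributes)
            self.token_counts[assignee].update(tokens)
            self.ticket_counts[assignee] += 1
            self.vocab.update(tokens)
        self.total = sum(self.ticket_counts.values())
        return self

    def top_k(self, text, attributes=(), k=3):
        tokens = [t for t in tokenize(text, attributes) if t in self.vocab]  # skip unseen words
        scores = {}
        for assignee, n_tickets in self.ticket_counts.items():
            counts = self.token_counts[assignee]
            denom = sum(counts.values()) + len(self.vocab)  # add-one (Laplace) smoothing
            score = math.log(n_tickets / self.total)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            scores[assignee] = score
        return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up historical tickets: (text, selected attributes, who it was assigned to).
history = [
    ("VPN will not connect from home", ["network"], "alice"),
    ("Cannot reach the file share over VPN", ["network"], "alice"),
    ("Billing report shows wrong totals", ["reports"], "bob"),
    ("Export of billing report fails", ["reports"], "bob"),
    ("Outlook keeps crashing on startup", ["desktop"], "carol"),
    ("Laptop screen flickers", ["desktop"], "carol"),
    ("New hire needs database access", ["access"], "dave"),
]
router = TicketRouter().fit(history)
print(router.top_k("VPN drops every few minutes", ["network"]))  # ['alice', 'carol', 'bob']
```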
If I had a new item, then it would kick off the workflow that would actually do the inferencing and say, these are the top three people most likely to be assigned this ticket. At this point, I was doing natural language processing and I didn't even know what natural language processing was. I didn't know that I was doing it, but I had figured out how to do it within the system. Then people started getting emails from the machine overlord. That's the name I sent the emails from.
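Database triggers usually fire when rows change rather than on a timer, so a scheduled check like this one is often run as a job or an application-side loop. Here is a minimal sketch of that "Do I have a new item?" check written as a polling loop. It assumes the tickets table has an auto-incrementing `id`. The table, its columns, and the `handle_ticket` callback are hypothetical, and SQLite stands in for the real database.

```python
import sqlite3
import time


def poll_new_tickets(conn, last_seen_id):
    """The "Do I have a new item?" check: fetch tickets newer than the last one handled."""
    rows = conn.execute(
        "SELECT id, description FROM tickets WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    return rows, (rows[-1][0] if rows else last_seen_id)


def watch(conn, handle_ticket, interval_seconds=300, max_polls=None, last_seen_id=0):
    """Poll every interval (300 s is five minutes) and pass each new ticket to handle_ticket."""
    polls = 0
    while max_polls is None or polls < max_polls:
        rows, last_seen_id = poll_new_tickets(conn, last_seen_id)
        for ticket_id, description in rows:
            handle_ticket(ticket_id, description)  # e.g. run inference, then send the email
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_seconds)
    return last_seen_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, description TEXT)")
conn.execute("INSERT INTO tickets (description) VALUES ('VPN will not connect')")
handled = []
last = watch(conn, lambda tid, text: handled.append((tid, text)), interval_seconds=0, max_polls=1)
print(handled)  # [(1, 'VPN will not connect')]
```

Tracking the highest handled `id` means each ticket is routed once, even if a poll picks up several new tickets at a time.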
Roland Meertens: That's a perfect name for a-
Cassie Breviu: I also didn't really tell anybody I was doing it, because I just wanted to see if I could do it. So it was a surprise to some people when they started getting these emails saying, "You have a help desk ticket," signed, "Sincerely, the machine overlord."
Roland Meertens: Did they appreciate it, machine overlord assigning them tickets?
Cassie Breviu: Somehow people figured out it was me. They're like, "This is probably Cassie." But yeah, it ended up working so well that we didn't have to triage tickets anymore. I also set up a Teams channel, because one of the other problems was that there wasn't a good way for us to all see what was happening. So now there was a Teams channel that would post the text of the ticket and the top three probable assignees. They would get an email, but people could also comment on it even if they weren't on the email, and they could see what was happening. So it actually ended up being a better way to do it. That's how I got started, but you'll notice I was using this tool in the cloud. I wasn't building models with Python. I didn't even know Python at the time.
Roland Meertens: Please stay away from anything I'm working on before you automate it.
Cassie Breviu: Yeah. Exactly. Sometimes it bites you. No, I'm just kidding. Then I presented this to my managers and they were like, "This is really cool, but we have other ideas of things that you could build, and we don't really want all of our data in the cloud." Law firms are very particular about where their data goes. So they asked, "Can you rebuild this, but locally?" And I went, "I just used this tool in the cloud. I don't know how to do it." They're like, "Well, take a couple of weeks. See what you can figure out." So I started reverse engineering what I built and building it in Python.
I looked at Coursera and some of the other online courses, and I found an applied AI course that taught the basics of Python and some of the scikit-learn stuff. I basically reverse engineered what I had built in the cloud with Python so I could run it locally. That's how I built my first AI model, and how I switched from this high-level UI process of building a model to, oh, I'm writing Python, and now I can do a lot more things with this. So that's how I got into it.
Roland Meertens: So it's really about replacing your job, first with a cloud tool and then with Python.
Cassie Breviu: You know the quote, "The best developers are lazy developers?"
Roland Meertens: Yes.
Cassie Breviu: So basically, you could say I got into AI because I was lazy. I just didn't want to assign help desk tickets anymore, and now I knew how to do it.
Roland Meertens: Yes, perfect. Especially because these NLP projects can normally be very daunting and very difficult, since your data can come in I don't know how many shapes. So it is really awesome that you managed that. How did you actually get into coding in the first place? Because I think you also said that was interesting.
Cassie Breviu: I feel like I had a little bit of an advantage because I used to be a data analyst, so I worked in Excel a lot and I was very comfortable working with data. I used to do tons of Excel macros and that kind of stuff. This was before I even got into traditional development, but this is how I got started. Again, I had to do these eligibility audits and I didn't like doing them. So I started figuring out how to write really long algorithms in Excel. Then I discovered the Record Macro button in Excel and I was like, "What does this do?" So I hit it, started writing my formulas, then hit view source and saw code. Then I ran it and it automatically did what I had just done. I was like, "What is this magic? I need more of this." So I just kept going and I eventually automated half of my job with Excel macros.
Roland Meertens: Fantastic.
Cassie Breviu: I didn't really know what I was doing. I was very much finding information online and pasting it in. I didn't understand what a class was or the object-oriented things I was doing, but I understood how to make it work. Then eventually, I was like, I think I want to move into tech, because I really liked automating the manual work.
Roland Meertens: Please stay away from anything I'm working on before you automate it.
Cassie Breviu: Yes, which is funny, and we'll talk about it later, because with GitHub Copilot we're not automating away coding, but we are now moving to a place where we're going to make coding easier or more productive for developers. So yeah, honestly, a lot of it just came down to tasks I didn't want to do where I thought there was probably a better way. Then I eventually took classes as well and got a computer science degree at night while working full time. I just wanted to be around tech, so while I was in school, I switched into a help desk role at a small company. Then I was also a business analyst and did UI and UX design. From there, I finally moved into my first development role.
Roland Meertens: So what are you doing right now then? What are you trying to automate? What are you trying to build?
Portable ML models in production [10:04]
Cassie Breviu: Well, now I'm fully in the machine learning space. I think I started this about five or six years ago, and I moved into focusing on artificial intelligence and machine learning about three and a half years ago. That's when I finally switched and was like, "Okay, this is my full-time job," because before that, I was still mixing both. I was doing C# stuff and machine learning stuff, whenever it was applicable to something I was building. So I've been focusing on it for three and a half years now. At Microsoft, I work on the AI frameworks team, on a product called ONNX Runtime. ONNX is a format for exporting machine learning models so that you can run them on a lot of different platforms, and it makes them very portable and interoperable, which is nice.
It also improves performance, and it gives you the capability of deploying to multiple platforms and inferencing in multiple languages, which actually really helped me before I even started working on it, because I was doing C# development but using Python for my data science. The C# shop was like, "Why are you using Python? We should be doing this in C#." I'm like, "Well, at the time, there weren't any C# machine learning libraries." Now there are: there's TorchSharp and ML.NET, and there are ways you can actually do it, but at the time, there weren't. So one of the challenges I had was figuring out how to deploy my models using C#, and ONNX Runtime fulfills that really well.
Roland Meertens: I think this is one big problem: data scientists are all using Python, but once you start deploying on a self-driving car, you need C++, or when you deploy on a more traditional stack, you need Java. So yeah, it makes models really portable, but maybe we can talk about that later. You actually gave a talk yesterday about using transformer models. Can you give a short summary of the talk so that people listening to this podcast can also check it out?
Cassie Breviu: The talk was how to operationalize transformer models on the edge, and the main purpose is to understand how to take large transformer models and make them practical. You've probably heard of GPT-3, GPT-2, and BERT; they're very large, very powerful, state-of-the-art natural language processing models. But large models come with a lot of challenges when you want to make something really useful and you don't want to spend a ton of money on a GPU. So in the talk, we look at a distilled BERT model that Microsoft put on the Hugging Face Hub. Then we fine-tune it with an emotion dataset from the Hugging Face Hub, which gave us a sentiment analysis model that works very well.
But we still wanted to be able to deploy that to a browser and do inferencing in JavaScript, and we still needed it to be smaller because we wanted to inference on device. There are a lot of benefits to inferencing on device. It's quicker. It works offline. It's cheaper because you're not paying for a server. That's what we wanted to be able to do, so we went over how to do that and then how to operationalize it through the deployment process.
Roland Meertens: Well, I think that was especially nice about your talk. If people are listening and thinking, "Okay, should I watch this talk?", what I really liked was both the high level (how do you operationalize this, how do you get it into deployment, what are the challenges there) and the lower level, with practical code examples on how to get started. So yeah, I can recommend this talk to anyone.
Cassie Breviu: Thank you. Yeah, I tried to put something in there for everybody. ML engineers or ops engineers are going to see the big picture of how things come together, but we also go into Jupyter Notebooks, fine-tune a transformer model, optimize it, and use things like ONNX Runtime to quantize it so it's small enough to work on the web. I think it's interesting to see the whole picture, and some people might be like me five years ago and get really interested in, "Wait, how do these Jupyter Notebooks work?" or "That's how you fine-tune a model?"
Once you get into it, you realize it's not as scary as many people think. In my opinion, data science is software development, and the two disciplines are really going to come together; it's going to be one more tool on your tool belt, particularly with how many prebuilt models are out there and how fine-tuning is becoming more and more the status quo rather than building something from scratch with your own data. You just get so much better performance using things like transfer learning.
Roland Meertens: Yes, and especially this model hub which you showed. It really helps you. If you have a new problem but you aren't working with big data yet, or you don't have a lot of compute available, you can easily get started. Download the most applicable model and you already have access to a lot of the features and weights you want to use.
Cassie Breviu: Right. Getting data, and especially labeled data, can be one of the hardest parts of building machine learning models. For the sentiment analysis model we're using, we get a labeled open dataset from Hugging Face called the emotion dataset, so we're able to get really good performance. But even if you don't want to fine-tune a model for a specific task, there are so many prebuilt models out there that you can just go use. We have the ONNX Model Hub, with lots of different types of models that you can grab and use, and it gives you sample code so you know how to pre-process and post-process for that model; then you can deploy it within your application. I really see a lot of machine learning going there. There are a lot of very common tasks that have been solved well, and there's no reason to build those from scratch. Go use one of the models that has already been built to solve that task.
Roland Meertens: So the models you showed can also be used as the starting point to train the next model, right? So you already have all the existing weights, the knowledge of the neural network, embedded?
Cassie Breviu: Exactly. I know Hugging Face also has a lot of prebuilt and community models. One thing I think is cool is that you can grab some of these base models that are task-agnostic, retrain them for something more specific, and then push that to the Hugging Face Hub. Now someone could grab the base and train what they want, or just grab one that's already customized. So I think the whole open machine learning movement and the things Hugging Face is doing are really cool, and the ONNX Model Hub is one that I've used for lots of projects as well, like a style transfer model.
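Here is a tiny standard-library-only sketch of transfer learning in its simplest form: keep a pretrained encoder frozen and train only a small classification head on its features. The "pretrained" weights and the four training examples are made up. A real setup would start from a model on the Hugging Face Hub, and full fine-tuning, as in the talk, also updates the encoder's weights.

```python
import math

# Stand-in for a pretrained encoder (hypothetical weights). In transfer learning
# these would come from a model trained on a large, general dataset.
FROZEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def encode(x):
    """Frozen feature extractor: a fixed linear layer followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, lr=0.5, epochs=200):
    """Fit only a logistic-regression head on top of the frozen features."""
    features = [(encode(x), y) for x, y in data]  # the encoder is never updated
    w, b = [0.0] * len(FROZEN_WEIGHTS), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for f, y in features:
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        n = len(features)
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(head, x):
    w, b = head
    return int(sigmoid(sum(wi * fi for wi, fi in zip(w, encode(x))) + b) > 0.5)


# Tiny made-up task: label 1 when the first input is larger than the second.
data = [([2.0, 0.0], 1), ([0.0, 2.0], 0), ([1.5, -0.5], 1), ([-0.5, 1.5], 0)]
head = train_head(data)
print([predict(head, x) for x, _ in data])  # [1, 0, 1, 0]
```

Because only the head's few parameters are trained, this needs very little labeled data and compute. That is the practical appeal of starting from a pretrained model.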
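For a classifier like the fine-tuned emotion model, post-processing typically means turning the model's raw output logits into a label. Here is a minimal sketch of that step using a softmax. The label order is the one used by the Hugging Face emotion dataset and is assumed, not confirmed, to match the fine-tuned model's output indices. The logits are invented, and in practice they would come from the inference session's output.

```python
import math

# Label order of the Hugging Face "emotion" dataset; assumed to match the
# fine-tuned model's output indices (check the model's config before relying on it).
EMOTION_LABELS = ["sadness", "joy", "love", "anger", "fear", "surprise"]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits, labels=EMOTION_LABELS):
    """Turn raw model logits into the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


label, confidence = postprocess([0.1, 3.2, 0.4, -1.0, -0.5, 0.0])  # made-up logits
print(label)  # joy
```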
GitHub Copilot [15:41]
Roland Meertens: That gives you just an easy start. But yeah, that sounds great. If people are listening to this, check out your talk. It'll be online soon. Something you said before was automating away the programmers. Are you working on GitHub Copilot or working with GitHub Copilot?
Cassie Breviu: No, I use GitHub Copilot. I'm not working on it directly, but I do know about it. It's a really cool model. It's based on GPT-3, fine-tuned for coding with GitHub data to create a model called Codex, and that Codex model is what powers GitHub Copilot. What I think is so cool about AI as a programmer is this: we already have tools like IntelliSense, and JetBrains makes some really cool productivity tools as well, which we use to be more productive, but we still have to write the code. To me, GitHub Copilot is just the next iteration of computers helping us be more productive, because you still need to understand the code that comes out. But what's really neat is that it changes your flow. I've even found myself writing a comment describing what I want to happen, and then GitHub Copilot suggests what it thinks I need to do.
Roland Meertens: What is your flow right now with GitHub Copilot? For people who don't know or never used GitHub Copilot before, how do you then write code? Do you just write at the start, "I want a program that does all these things," and then it just does it or-
Cassie Breviu: I just told it to build my social network and it did.
Roland Meertens: Okay. Now you own a Facebook. Very good.
Cassie Breviu: Yeah, exactly. No, it doesn't work like that. Just like with voice assistants, you have to know how to ask the question. It's the same with Codex models and GitHub Copilot. You have to know how to put the right words in to get the right answer. You still have to understand programming, and you need to understand what you're writing. You don't want to put things in there when you don't understand what's happening. It works with Jupyter Notebooks and VS Code, and I do a lot of my programming there now. What I usually do is start writing out what I want, or maybe start with a notebook from Hugging Face or GitHub and then augment it to what I need.
But if I want to use Copilot, I put a comment in and just say what I want it to do, rather than going to Stack Overflow or Googling or Binging something, trying to find the answer, and being like, "Oh, that's what I need to do." There's so much to memorize. We can't memorize everything. We all know that we use Stack Overflow and everything else, but it replaces that in some ways.
Roland Meertens: What I noticed very frequently is that when you describe your problem, it gives you a piece of code. If you Google it, the top Stack Overflow answer gives the same code.
Cassie Breviu: Really?
Roland Meertens: But with a normal Stack Overflow answer, you still have to copy-paste your variable names in, while GitHub Copilot just puts your variables in exactly the way you want. It's way faster than Googling something separately and trying to see how to adjust it to your code, because Copilot does that for you. It's so amazing. It's so fantastic.
Cassie Breviu: It is so cool. I even had this one experience last week. I was writing something, and you know when you're writing a program and you want to add a function somewhere later, so you put a TODO for yourself? I put a TODO describing what I wanted to put there, because I wanted to finish my thought process on the other thing I was building. We do that all the time. Well, I wrote the TODO, hit Enter, and Copilot wrote the function for me.
Roland Meertens: Oh, nice. That's perfect. Fantastic.
Cassie Breviu: I hit Tab, because that's how you accept a suggestion, and I was like, "I guess I don't have to come back to write this now. This is so cool." When you think about things like that, and about AI as your pair programmer, you still need to understand what you're doing. You still need to understand functionally how to build your program, and you still need to understand the code that's coming out. It doesn't replace us. It makes us more productive and it changes your workflow, which is really cool.
Roland Meertens: A couple of months ago, I wanted to sort a list of tuples: the integers from high to low, but the tuples also contained words, and those had to be sorted low to high. I tried a few things and couldn't do it. Then I thought, "Oh, but wait. I've got Copilot." So I wrote out what I wanted and it perfectly autocompleted it. It just solved this problem, which I had been staring at for 10 minutes wondering how to do.
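The interview doesn't show what Copilot generated. One idiomatic way to express that ordering in Python is a composite sort key that negates the integer. The sample data is made up.

```python
pairs = [(3, "pear"), (5, "apple"), (3, "banana"), (5, "cherry"), (1, "fig")]

# Negate the integer so larger numbers come first; ties fall back to alphabetical order.
ordered = sorted(pairs, key=lambda pair: (-pair[0], pair[1]))
print(ordered)  # [(5, 'apple'), (5, 'cherry'), (3, 'banana'), (3, 'pear'), (1, 'fig')]

# Alternative that also works when the descending key can't be negated (e.g. strings):
# Python's sort is stable, so sort by the secondary key first, then by the primary one.
by_word = sorted(pairs, key=lambda pair: pair[1])
ordered_stable = sorted(by_word, key=lambda pair: pair[0], reverse=True)
```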
Cassie Breviu: Awesome.
Roland Meertens: It's so fantastic.
Cassie Breviu: I feel like it could be a cool learning tool for students as well as long as they take the time to understand what's happening and not just accept what's in front of them. That's my concern, is you still need to understand what you're writing.
Roland Meertens: I also had a couple of times where it introduced a nasty bug without me noticing: instead of iterating over X and Y in an image, it was iterating over X twice, something like that.
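Here is a hypothetical illustration of that kind of slip. The image is stored row-major as a list of rows, and the buggy version ranges over the width in both loops. The function names and pixel values are made up.

```python
def count_bright_pixels(image, threshold=128):
    """Count pixels brighter than threshold; image is a list of rows (row-major)."""
    height, width = len(image), len(image[0])
    count = 0
    for y in range(height):  # rows
        for x in range(width):  # columns
            if image[y][x] > threshold:
                count += 1
    return count


def count_bright_pixels_buggy(image, threshold=128):
    """Same loop, but both ranges use the width (the "iterating over X twice" slip)."""
    width = len(image[0])
    count = 0
    for y in range(width):  # bug: should be range(len(image))
        for x in range(width):
            if image[y][x] > threshold:
                count += 1
    return count


# A tall image (three rows, two columns): the buggy version silently skips the last row.
# On a wide image it would raise IndexError instead.
tall_image = [[200, 0], [0, 0], [255, 255]]
print(count_bright_pixels(tall_image))        # 3
print(count_bright_pixels_buggy(tall_image))  # 1
```

On a square image, both versions return the same answer. That is exactly why such a suggestion can pass a quick glance and a simple test.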
Cassie Breviu: Okay.
Roland Meertens: These are things you don't necessarily spot if you are too quick to accept a suggestion. One talk we had at QCon Plus last year was by someone who said reading code is a more important skill than writing code. I think with Copilot, we're really moving towards reading more code. The skill of understanding code will be way more important now than the skill of writing it.
Cassie Breviu: That's really interesting. I like that quote. It makes sense too. I feel like that is very true.
Roland Meertens: You talked before about replacing work. Do you think Copilot will completely replace people or-
Cassie Breviu: No.
Roland Meertens: ... is this an extra tool?
Cassie Breviu: It's an extra tool, in my opinion. At least the way it is now, there's no way it could replace people. Think about how I got into this: as a data analyst, I used Excel macros. Excel is a productivity tool that people use all the time to make themselves more productive. Why shouldn't programmers have productivity tools too? That's really what I think it is. I don't think it could replace people, at least not in its current state. I guess we'll see, but just like you said, there was a bug. I've had it suggest things that aren't what I really want to do. So you still have to understand what's happening. Until it's solid enough to do that, it couldn't replace people, and I don't really think it ever will.
Roland Meertens: What I notice right now is that it replaces all the boilerplate code writing. For example, if you have to split a list into chunks or something, that's something you would normally find on Stack Overflow. I think the player that has to be most afraid is Stack Overflow. It's the perfect replacement for asking questions.
Cassie Breviu: That's true. Yeah, because it replaces some of that searching for the answer, for sure, or for the syntax. Sometimes I'm just like, "Oh, I forgot the syntax for this." You know exactly what you want, but I switch languages a lot. ONNX Runtime works with so many different languages, so a lot of times when I'm building tutorials or other things, I'm switching between languages. When you start doing that, sometimes with the syntax you're like, which one is this again? You have to remember exactly.
Roland Meertens: With that problem, you can just ask Copilot. At some point, I tried Rust without ever having read a single line of Rust.
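Splitting a list into chunks is a good example of that kind of boilerplate. A minimal Python version slices the list in steps of the chunk size. The function name is illustrative.

```python
def chunks(items, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


print(chunks([1, 2, 3, 4, 5, 6, 7], 3))  # [[1, 2, 3], [4, 5, 6], [7]]
```

On Python 3.12 and later, `itertools.batched` does the same job lazily, yielding tuples instead of lists.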
Cassie Breviu: Okay.
Roland Meertens: By just telling Copilot, "I want this, I want that." Then when I got a compiler error, I still had to think intelligently, "Ah, it might be an import" or "I want to use this function," and then Copilot could suggest how to solve it. So you can actually write in languages you don't speak.
Cassie Breviu: I think anybody who has used lots of languages realizes that once you understand one, it's really easy to pick up new ones, because there are so many consistencies between them. It's small syntactical differences. That's my opinion.
Roland Meertens: What I also think is really nice is that it accelerates your learning. That's the other thing I was a bit surprised by: as a programmer, I frequently have my usual ways of doing things. If you describe the thing, Copilot might suggest a different way, so you easily learn new functions to do things more efficiently.
Cassie Breviu: That was one thing I liked when I was in C#: if you ever used ReSharper by JetBrains, you would write something and it would suggest different ways to do it. That has been around for a while, and I liked it because I would learn different ways to implement something. It's a very different product, but what you're talking about is that same thing, where you get a suggestion. It's almost like an AI mentor. Should we call it an AI mentor?
Roland Meertens: Yes, but it feels like someone is sitting next to you and it's very quick at saying, "Oh, we can also do it like this. Oh, you can also do it like this." You describe a problem and they say, "Oh, maybe you want to do it like this." It's not the perfect mentor, but-
Cassie Breviu: Pair programmer mentor, yeah, exactly.
Roland Meertens: I think Copilot is a really good name.
Cassie Breviu: Totally.
Roland Meertens: Do you see anything which is still lacking? Do you have anything where you think, "Oh, if Copilot would also learn this, then my life would be complete"?
Cassie Breviu: Sometimes it will suggest things that look correct but are not. The other thing is that I think some languages are better supported than others right now. I don't know exactly which ones, and I don't even know if it's technically GA. I think they're still testing it. I don't actually know what the status is.
Roland Meertens: I don't think you can easily get in actually.
Cassie Breviu: Yes.
Roland Meertens: You have to get an invite.
Cassie Breviu: I think there's still probably a lot of perfecting they could do on it. It's super powerful and it really is amazing what you can do. I think it's just going to continue to evolve. I think that those things can always be improved upon. I don't know. I don't work on that product, so I don't really know what the roadmap is, but I can see it evolving.
ONNX and ONNX Runtime [23:58]
Roland Meertens: If people are listening, check it out. Download it. Try it. It's really fantastic. You mentioned before that you are working on ONNX Runtime, which I think is used for deploying to different languages and different devices. Can you talk a bit more about ONNX and ONNX Runtime?
Cassie Breviu: ONNX Runtime's a really cool product. As machine learning and AI become more integrated into all types of software, it becomes a lot more important to be able to take models out of Python and the Python-native tooling used for data science and into other languages. It's also important to be able to deploy on multiple devices without having to do all of the configuration yourself, to use different execution providers like CUDA and DirectML, and to optimize for those particular hardware targets. So there are a lot of things it can do. It solves really hard problems that you now don't have to solve manually. One of those is inferencing in other languages. We talked a little bit earlier about being in a C# shop and wanting to deploy with C# while I was building in Python.
Cassie Breviu: I export that model to the ONNX format, which stands for Open Neural Network Exchange, and then I use the ONNX Runtime package to run it in C#. That gives me the ability to deploy on whatever platform I want. I can deploy on Linux. I can deploy on Windows. I can deploy on Android or iOS. I can use Xamarin. Then when it comes to the language I want, I can use C++, C#, Java, JavaScript or Objective-C. It supports so many languages, which really gives you the flexibility that you need to make the models useful. Then from there, there's another level: how do I make the best use of the resources I have so that my model performs the way I need it to? ONNX Runtime also supports a lot of different execution providers, so you can optimize and use the GPU on these different devices, which gives you a better product.
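To make the execution-provider idea a bit more concrete: in the ONNX Runtime Python API you pass an ordered `providers` list to `onnxruntime.InferenceSession`, and `onnxruntime.get_available_providers()` reports what the installed build supports. The sketch below only illustrates that fallback logic. The helper `choose_providers`, the preference order and the path `model.onnx` are hypothetical choices for illustration, and the code runs without ONNX Runtime installed by simulating the list of available providers.

```python
# Hypothetical preference order: try an NVIDIA GPU first, then DirectML
# (Windows), and always finish with the CPU so there is a usable target.
PREFERRED = [
    "CUDAExecutionProvider",
    "DmlExecutionProvider",
    "CPUExecutionProvider",
]

def choose_providers(available, preferred=PREFERRED):
    """Keep the preferred providers, in order, that this build reports.

    CPUExecutionProvider is appended as a last resort if it was filtered out.
    """
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen

if __name__ == "__main__":
    # With onnxruntime installed, this would look roughly like:
    #   import onnxruntime as ort
    #   providers = choose_providers(ort.get_available_providers())
    #   session = ort.InferenceSession("model.onnx", providers=providers)  # hypothetical path
    # Here we simulate a CPU-only machine instead.
    print(choose_providers(["CPUExecutionProvider"]))
```

The same model file is used in every case; only the provider list changes, which is why the choice can be made at deployment time rather than at training time.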
Cassie Breviu: That's ultimately what it really does: it solves these hard deployment problems, and performance issues too. It's very well optimized and it'll use more of your GPU. Sometimes when you're inferencing models, you're not actually utilizing your full GPU, so the idea is to utilize more of it and improve performance there as well. There are also different graph optimizations and quantization tools to reduce the size of your model while keeping accuracy. A lot of these very common and hard problems in operationalizing transformer models are where ONNX Runtime sits.
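As a rough illustration of why quantization shrinks a model: a float32 weight takes 4 bytes, while an 8-bit quantized weight takes 1 byte plus a shared scale and zero point. The sketch below shows generic affine (asymmetric) uint8 quantization in plain Python. It is a conceptual example under that assumption, not ONNX Runtime's quantizer; for real models, ONNX Runtime ships its own tooling in the `onnxruntime.quantization` module (for example `quantize_dynamic`).

```python
# Conceptual sketch of affine (asymmetric) 8-bit quantization.
# NOT the ONNX Runtime quantizer; the rounding choices here are assumptions.

def quantize_uint8(values):
    """Map floats onto integers 0..255 with a shared scale and zero point."""
    lo = min(min(values), 0.0)  # include 0.0 so it is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 or 1.0  # guard against an all-zero input
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the stored bytes."""
    return [(x - zero_point) * scale for x in q]

if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
    q, scale, zp = quantize_uint8(weights)
    restored = dequantize(q, scale, zp)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    # Each value now needs 1 byte instead of 4, at a small accuracy cost.
    print(q, f"max error {worst:.4f} (scale {scale:.4f})")
```

The reconstruction error per weight stays within about half a quantization step, which is why accuracy often survives quantization reasonably well, though that has to be checked for each model.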
Roland Meertens: So ONNX Runtime also does the quantization, for example. And I assume that if you have a device without a GPU, you can still use the runtime?
Cassie Breviu: Right. It supports both CPU and GPU. You'd be surprised that smaller models can work on CPU. The demo in my talk actually uses Wasm, WebAssembly, which doesn't have access to your GPU. It uses your CPU, and it gets really good performance because we're able to optimize the transformer model to be small enough to deploy that way. When you start thinking about things like that, you're able to do a lot more. You're not so confined to big models, big architectures and expensive solutions. It starts making things a lot more useful.
Roland Meertens: Nice. Yeah, that sounds really good. I think you're also teaching people to do this, right?
Cassie Breviu: I run the ONNX Runtime YouTube channel. Should I do the "like and subscribe"?
Roland Meertens: I think at least people can check it out-
Cassie Breviu: Yeah, so check it out.
Roland Meertens: ... and leave a comment.
Cassie Breviu: Leave a comment. What I really wanted to do is create content around some of the problems that people are having and some of the different ways that you can use it. I started it in January and I post about two videos a month, I would say. I think about different problems that people are having and how we can solve those deployment problems. I give sample code, I usually write a tutorial around it, and it's really just about enabling all types of developers. The other thing that I think is cool is that because I have a traditional developer background and moved into AI, I approach things from the traditional developer perspective, then applied AI, and then data science. I try to incorporate all of those different personas and make it useful for everybody.
Then also, I pay attention to what the community needs. So if people comment and say, "I really am having a problem with X, how do I do it?" that's an indicator to me that that's probably what my next video should be about. So definitely, if you're needing something, let me know. I want to hear what people's problems are with the product. I don't want to just hear it. If you're having an issue, tell me what it is. If you're just trying to do something and you don't know how to do it and you want a video on it, do that as well. So it's a really great way to connect with the ONNX Runtime team, but then also make machine learning and applied AI, I think, accessible to a lot of different types of people.
Roland Meertens: Which videos do you have so far? What can people already get help with?
Cassie Breviu: My most recent video was about converting models to ONNX. When you build a machine learning model, you might use PyTorch or TensorFlow or scikit-learn or Keras. There are different tools to convert those to an ONNX model, and you have to get to that point first before you can leverage the features of ONNX Runtime. I did a video on how to export your model from many different machine learning training frameworks. A lot of the other videos on there right now are about how to inference in different languages, including the pre-processing. One of the difficult parts of inferencing in different languages is that you have to recreate all of your data pipeline pre-processing code in the language you're using, and some of these languages aren't really used much for data science.
Roland Meertens: Yeah, and also for computer vision or something, it's really hard to edit an image in Java or something.
Cassie Breviu: Exactly. A lot of them actually are computer vision right now. I like using the ResNet model because it's a common, useful CNN computer vision model, and I show how to write the pre-processing code in different languages and how to implement the inferencing in that language. So I have C#, JavaScript and C++. Then I also have Xamarin on there as well, so how to actually deploy a model to a mobile device and how to pre-process on it. What you find when you look at how to pre-process images in different languages is that there are different image libraries in each language that let you load your image, break it out to RGB, create the tensor, process it, reshape it and everything like that. That's been really helpful. Another thing that is really useful is OpenCV, which supports a lot of different languages. OpenCV is the common one that many people use in Python, but you can also use it in, I think, C# and C++. I'm not sure if it's in JavaScript.
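The pre-processing steps described above (get the image, break it out to RGB, create the tensor, reshape) can be sketched in a few lines. This is a minimal plain-Python version under stated assumptions: the image has already been decoded and resized (for ResNet, typically to 224×224) by whatever image library the target language offers, pixels arrive as rows of `(R, G, B)` byte tuples, and the normalization uses the ImageNet mean and standard deviation commonly used with torchvision-style ResNet models. The function name `to_nchw_tensor` is hypothetical, and the constants should be checked against the specific model's documentation.

```python
# Assumed normalization constants: the ImageNet mean/std commonly used with
# torchvision-style ResNet models. Check the documentation of your model.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def to_nchw_tensor(pixels):
    """Convert an H x W grid of (R, G, B) byte tuples into a nested list of
    shape 1 x 3 x H x W (batch, channel, height, width) of normalized floats.

    Assumes the image was already decoded and resized to the model's input size.
    """
    height, width = len(pixels), len(pixels[0])
    tensor = [[[[0.0] * width for _ in range(height)] for _ in range(3)]]
    for y, row in enumerate(pixels):
        for x, rgb in enumerate(row):
            for c in range(3):
                scaled = rgb[c] / 255.0  # bytes -> [0, 1]
                tensor[0][c][y][x] = (scaled - MEAN[c]) / STD[c]
    return tensor

if __name__ == "__main__":
    tiny = [[(255, 0, 0), (0, 255, 0)],
            [(0, 0, 255), (255, 255, 255)]]
    t = to_nchw_tensor(tiny)
    print(len(t), len(t[0]), len(t[0][0]), len(t[0][0][0]))  # batch, channels, height, width
```

The loop is the part that has to be rewritten in C#, JavaScript or C++: most image libraries hand back pixels in height-width-channel order, while ResNet-style models expect channel-first input, so the transpose and per-channel normalization must match what the model was trained with.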
Roland Meertens: Yeah, C++ especially works really well with OpenCV.
Cassie Breviu: That one is definitely what I use for the C++. For JavaScript, I used something called Jimp.
Roland Meertens: Jimp.
Cassie Breviu: It's a very odd name.
Neural Network Inference Plugin in the Unreal Engine [30:19]
Roland Meertens: What's the next thing you'll be releasing that you're excited about?
Cassie Breviu: I am really excited about something we partnered with Epic Games to create in the Unreal Engine. They created a new plugin called the Neural Network Inference plugin, and it leverages ONNX Runtime to allow you to run machine learning models in the Unreal Engine and in game. I think it's a neat and budding area right now. Machine learning isn't used a lot in games, because a lot of times your inferencing isn't consistent enough to go along with the gaming experience, but there are a lot of cool experiments going on with it, trying to figure out how we make it more useful and how we can optimize it. I'm really excited about that. I've been working on a project that uses a style transfer model that takes the game scene and applies a completely different transformation to the way it looks.
Roland Meertens: Do you have an example of what it transforms from and to?
Cassie Breviu: Yes. There are a few different styles, and the models are actually on the ONNX Model Zoo. The models are open source, and there will be code that comes out. It's not out yet, but that's coming, and I will release the source code with a blog post about how to do all of it. For the different styles, there's a mosaic, so it's really cool. It's just the base Unreal scene. If you've ever used Unreal, you can go and create a base scene as your starting scene. I wasn't too concerned about making a really cool or complex scene; I just wanted to see how I could use style transfer to make my scene look cool, basically. So it's very experimental. There's one called mosaic, which makes it look like a mosaic. There's one called rain princess, which I love.
Roland Meertens: Okay. Rain princess.
Cassie Breviu: I did not name it. I wish I could take credit for that name, but it makes the scene look almost like a watercolor painting. It's very rainy, with very bright colors. There are some other ones too. I'm forgetting their names, but it's just really neat to see. I hope that game developers look at it and start to think about how they could actually use this in their game and make it useful for what they're doing. That's doing inferencing in real time in the game, every tick. It's really cool.
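The point about inferencing "every tick" and about consistency can be put into numbers: at a target frame rate, each tick has a fixed time budget, and a model that is fast on average can still blow that budget on some frames. The helpers below (`frame_budget_ms`, `miss_rate`) are hypothetical and not part of the Unreal Engine plugin, and the latency values in the usage line are made up for illustration rather than measured.

```python
def frame_budget_ms(target_fps):
    """Time available per frame (tick), in milliseconds."""
    return 1000.0 / target_fps

def miss_rate(latencies_ms, target_fps, other_work_ms=0.0):
    """Fraction of ticks where inference plus the rest of the frame's work
    (rendering, game logic, ...) would exceed the frame budget."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    budget = frame_budget_ms(target_fps)
    misses = sum(1 for t in latencies_ms if t + other_work_ms > budget)
    return misses / len(latencies_ms)

if __name__ == "__main__":
    # Made-up per-tick inference latencies: usually fast, one slow outlier.
    samples = [5.0, 6.0, 5.0, 20.0, 6.0]
    print(f"budget at 60 fps: {frame_budget_ms(60):.2f} ms")
    print(f"missed ticks: {miss_rate(samples, 60, other_work_ms=8.0):.0%}")
```

With these illustrative numbers the average latency looks comfortable, yet one tick in five would miss a 60 fps budget, which is the kind of inconsistency that makes in-game inference hard.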
Roland Meertens: I can really imagine that if you're a game developer and you don't know yet what sort of art you want your game to be, you can just make a plain game and then pick an art style later and quickly experiment with different things, give it different feelings.
Cassie Breviu: Exactly. I know there are some other models you can find examples for where you give it a sample image of what you want the result to look like. So you would take your scene and say, "But I want it to look like this famous painting, like Starry Night," and the model will put the two images together and create a new image. The one that I'm using has a statically defined style, but there are ways you could use different models to do things like that. Another thing that I think would be interesting is to see people play with these transformer models, the optimized ones, in game development. I haven't really seen it done. I don't know if anyone's doing it, but I think it'd be really cool.
Roland Meertens: To do what kind of tasks?
Cassie Breviu: Well, transformers are mostly used for natural language processing. You know how you were talking about GitHub Copilot as a pair programmer? What if you could use transformer models for your story development or something?
Roland Meertens: I think there's actually a website. I forgot the name, but we will look it up.
Cassie Breviu: Does it exist already?
Roland Meertens: There's something where you have a role-playing game. So you can type to the computer what you do. For example, it says, "Oh, you're in a room. What do you want to do?" You can type things like, "Oh, take shorts or take lemon or I make it rain lemons," whatever you want and the model tries to interpret it and takes it into the storytelling.
Cassie Breviu: Oh, that's cool.
Roland Meertens: So you really have a choose-your-own-adventure game, but with unlimited creativity.
Cassie Breviu: Wow. I didn't know that existed. I want to know what it's called now.
Roland Meertens: I don't know, but we are going to figure it out after the recording and tell people at the end of the podcast.
Cassie Breviu: Okay. Awesome. Now I'm looking forward to this podcast coming out so I can see what that site is, because it sounds really good. But see, those are the types of things where I feel people can get so inventive. It changes how you can do things and what you can do with them. I just think it'll be really interesting, as these models get better and as we figure out how to make them more consistent, to see how it might change all types of development.
Roland Meertens: Do you know the game called Scribblenauts?
Cassie Breviu: I don't.
Roland Meertens: Okay. There was this game, released 10 years ago already, where you were placed in situations in which you could type words and that object would spawn in the game.
Cassie Breviu: Okay.
Roland Meertens: For example, there might be a person who wanted you to do something, or you would have to cross a broken bridge. You could type "jet pack," or you could type "balloon" and then tie a lot of balloons to yourself. You could basically use every possible word in the game. That gave you a lot of creativity, but I can imagine that if you can describe the whole scene, it gives you even more creativity.
Cassie Breviu: That's so cool.
Roland Meertens: It's really cool. Yeah, it was a fun game to play around with and try what kind of words were available because they literally had every possible word.
Cassie Breviu: I've seen some cool things around transformer models now where they're starting to use them for computer vision tasks or where you can describe things and it'll create the image. I feel like that's what you're talking about, is this... I can't think of what it's called right now.
Roland Meertens: There's something from OpenAI. It's based on CLIP. We keep bringing up so many cool things we don't know the names of, which I'll have to figure out later.
Cassie Breviu: We're just giving people homework. This is your homework. We're going to tell you about cool things. Now your puzzle is to go figure out what it's called and how to use it.
Roland Meertens: All right. Thanks, Cassie, for joining the podcast. I had a lot of fun and I hope you enjoy the rest of the conference.
Cassie Breviu: Thank you.
Roland Meertens: All right. That was my interview with Cassie. Thanks again, Cassie, for participating. As a side note, the AI role-playing game we talked about is called AI Dungeon. If you want to try GitHub Copilot, there is currently still a waiting list, so make sure that you sign up fast if you want to try it for yourself. I really hope you enjoyed this in-person interview recorded at QCon London. Thank you very much for listening to the InfoQ podcast.