Transcript
Morling: I'm going to tell you how I built a search for my weblog. Why is this interesting? I think it's interesting, because this cost me 4 cents for 10K requests. Really, I don't even have too many requests. Essentially, it cost me nothing. This is possible because I chose a serverless implementation approach for this. This is running on AWS Lambda, using their serverless technology. Actually, you can take a look at it right now. You can go to my weblog at morling.dev, and there, you have the search control in the upper right-hand side. If you put some search text in there, it will search my website, and really, this talk is about how this is implemented using serverless technology.
There should be three key takeaways in this talk for you. First of all, I am going to discuss why I think serverless is a good solution for this particular problem. Secondly, I'm going to talk about how you can implement serverless applications using Quarkus, which is a development stack for building cloud native applications. Lastly, I'm going to touch on a practical matter, this is cost control. How can we prevent a denial-of-wallet attack?
Mythbusters - Serverless Edition
Also, I hope I can debunk some common misconceptions and myths in the course of this talk. A while ago, I did this little tweetstorm, which I called Mythbusters, serverless edition. I was trying to address a few common misconceptions I hear again and again. Things like you couldn't use Java for implementing serverless just because of the longer time it might take to start up, or things like, there's automatically always something like a vendor lock-in if you go serverless. I hope I can convince you that those things, actually, are not true, and how they are not true.
Background
I work as a software engineer at Red Hat, where I am focusing on the Debezium project. Debezium is a tool, an open source platform for change data capture. If you're curious what this is, and what you can do with it, I would very much recommend you check out the talk I did at QCon San Francisco, last year, which you can watch on infoq.com. I'm involved with a few other projects, such as Quarkus. This is actually the technical foundation of the search feature.
What Is Serverless?
To get everybody on the same page, let's just briefly discuss the definition of serverless. What is this actually, again? You can read it yourself. This is a definition by the Cloud Native Computing Foundation, CNCF. Essentially, they say, it's a computing model which describes how you build and run applications without having to think about provisioning or managing servers yourself. Somebody else, in my case, this would be AWS, is taking care of this. Also, there's this notion that this is executed, scaled, and billed on demand. This means if there is no workload, no requests are coming in, everything is scaled to zero, and you don't have to pay anything for this at all. Then if there's a spike in load, tons of requests are coming in at the same time, it will automatically scale out. It will be able to serve this load. All this happens automatically for you without you having to think too much about how this actually works.
Why Serverless For My Use Case?
Why did I think this is a good use case for my weblog? This is a static site, which runs on GitHub Pages. It's just a set of static HTML files. This means dynamic functionality like search, you can't really do this there. You need to have something more dynamic. I didn't feel like using an external service like the custom Google search, because I wanted to have control over how this is presented and how results are shown, and so on. I wanted to do it myself. At the same time, I didn't feel like using a traditional application server stack and things like that, because it would mean I need to install an operating system, something like an app server. I need to keep this up to date, install all the patches, security fixes, and so on. I just felt I didn't want to have to do this. I felt I would like to have a hands-off experience. Really, going serverless for me means there's a smaller attack surface. I just need to focus on the actual implementation of my app. This is the code I'm dealing with, and this is what I need to keep up to date. Then, all the environment, all the server infrastructure, somebody else is taking care of that. It's more secure for me.
Then there's the matter of lower cost. I just have a very low volume of load. This means with serverless, I will get away with just a few cents, or maybe nothing at all to pay. That's pretty cool. Then, don't be fooled, it doesn't mean that serverless would always be the cheaper solution. If you have a high, constant base level of load, you might be better off looking into alternative approaches, but really, for this particular use case, serverless is very cost effective. Finally, it was, for me, a learning experience. I really wanted to learn how I can implement something running on Lambda. Using Quarkus, how could I actually go about this and solve this particular problem I had for my website?
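To put the cost figure into perspective, here is a back-of-the-envelope sketch in Java. It only uses the number quoted in this talk (4 cents per 10K requests) and treats it as a flat per-request rate. That flat rate is a simplifying assumption, since a real AWS bill has more components. The class and method names are made up for illustration.

```java
import java.util.Locale;

// Back-of-the-envelope cost estimate, based only on the figure quoted in the talk.
// Assumption: a flat per-request rate. Names are illustrative.
final class SearchCostEstimate {

    // 4 cents per 10,000 requests = 40,000 micro-dollars per 10,000 requests
    static final long MICRO_USD_PER_10K_REQUESTS = 40_000L;

    /** Estimated cost in micro-dollars (millionths of a dollar) for the given number of requests. */
    static long costMicroUsd(long requests) {
        return requests * MICRO_USD_PER_10K_REQUESTS / 10_000L;
    }

    /** Formats micro-dollars as a dollar amount with two decimals. */
    static String formatUsd(long microUsd) {
        return String.format(Locale.ROOT, "$%.2f", microUsd / 1_000_000.0);
    }

    public static void main(String[] args) {
        System.out.println(formatUsd(costMicroUsd(10_000)));    // the quoted figure
        System.out.println(formatUsd(costMicroUsd(1_000_000))); // a much busier month
    }
}
```

Under this assumption, even a million requests would stay in the range of a few dollars, which is why the low-volume case is effectively free.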
Solution Overview - System Architecture
Let's take a look at what this actually looks like. This is the overview. The static pages are served from GitHub Pages. I'm using Hugo, which is a static site generator. This produces HTML, which then is served via GitHub Pages. If a search request comes in, so if you go to this little text control, you put some query there, this actually will invoke the API gateway, which runs on AWS. This is done via an AJAX request. It goes to the API gateway, which then routes the request to the actual serverless implementation, which is running on AWS Lambda. In terms of implementation, this is using Apache Lucene as the search engine implementation. This is using Quarkus as the development stack. Finally, this all runs as a GraalVM native binary, so this means there's no JVM involved. It runs as native code. This is because it starts up very fast that way, much faster than I could do it on the JVM. In terms of a serverless cold start user experience, this is exactly what I'm after.
Indexing and Searching
The question, of course, is how do I actually index the data in my blog posts? Hugo provides me some help there. It can generate a JSON file, which contains essentially just the text of my blog posts. You can see it here on the right-hand side. This is a nicely processable data structure with all the contents, the tags, publication date, all those kinds of attributes. This is the basis for my search service. I do have one trick, actually: I index everything at build time. If I make a change to my weblog, like a new post, I also need to update this search service. Really, this makes it a bit more secure for me. My actual application runtime is immutable. It is just read only. This means there's no way you could write to it, which is just more secure. Also, since I don't update this content too often, this is just fine for my use case. Then, if I actually do update it, I also rebuild this application, and that way I benefit from an immutable application image.
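The build-time indexing idea can be sketched with plain JDK collections. This is a toy stand-in and not Lucene: it has no stemming, scoring, or highlighting, and it is not the code behind the actual search. The post data and all names are hypothetical. The point is only the split between a one-off build step and a read-only lookup afterwards.

```java
import java.util.*;

// Toy stand-in for a build-time search index. This is NOT Lucene and not the
// author's code; post data and all names are hypothetical.
final class BuildTimeIndex {

    // term -> titles of posts containing that term; unmodifiable once built
    private final Map<String, Set<String>> postings;

    private BuildTimeIndex(Map<String, Set<String>> postings) {
        this.postings = postings;
    }

    /** "Build time": tokenize every post once and freeze the result. */
    static BuildTimeIndex build(Map<String, String> postsByTitle) {
        Map<String, Set<String>> mutable = new HashMap<>();
        for (Map.Entry<String, String> post : postsByTitle.entrySet()) {
            for (String term : post.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    mutable.computeIfAbsent(term, t -> new TreeSet<>()).add(post.getKey());
                }
            }
        }
        Map<String, Set<String>> frozen = new HashMap<>();
        mutable.forEach((term, titles) -> frozen.put(term, Collections.unmodifiableSet(titles)));
        return new BuildTimeIndex(Collections.unmodifiableMap(frozen));
    }

    /** "Run time": read-only lookup; there is no method that modifies the index. */
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        Map<String, String> posts = new LinkedHashMap<>();
        posts.put("Serverless Search", "Building a search for a static site with Lambda");
        posts.put("Quarkus Notes", "Native binaries start fast on Lambda");
        BuildTimeIndex index = BuildTimeIndex.build(posts);
        System.out.println(index.search("lambda"));
    }
}
```

Because the index is frozen after the build step, new content only becomes searchable by rebuilding, which matches the rebuild-on-publish approach described above.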
In terms of the search engine itself, this is using Apache Lucene. Really, this is the go-to standard search engine, search library in Java. That's what you're looking at. It's driving things like Elasticsearch, or also Solr. You also can use it as a library, which is what I'm doing here, to get all the things like word stemming, so you can look for certain words in different word forms, plural, singular, different verb forms, and so on. There's result highlighting. You can see this on the website, the results are nicely highlighted. You actually see some context with your query match, and things like that. Also, there are other advanced search features like fuzzy search, or proximity search. Also, really, this is why I didn't feel like implementing this using JavaScript on the client side, because a mature library such as Lucene provides all this for me, ready to use.
Serverless Apps with Quarkus - Lambda Request Handler
Let's spend some time and think about how we actually can implement such a search or such a serverless application using the Quarkus stack. There are different ways, and I will explain just briefly how you could do this, and then also why or when you would choose which one of those three approaches. The first one would be to use the Lambda Request Handler, so you implement the Request Handler interface, which you already might know if you have been working with Lambda on Java before. You can just use the Lambda SDK and implement this. Then Quarkus makes this usable for you on the JVM, but also in a native binary via GraalVM. That would be the first option.
Funqy
The second option would be what's called Funqy. This is actually very interesting. It's a dedicated function as a service framework, which is part of the Quarkus umbrella. Really, the idea there is to provide you with a very easy to use programming experience, which then targets different function as a service environments. You could use this with AWS Lambda, but you also could use it with Google Cloud Functions, Azure Functions, and things like that. The idea there is you just define a single method here, which takes the input of the function as a method parameter. Then it just returns something. It's a plain Java method. Funqy takes care of mapping this to Lambda and all those kinds of execution environments. That's Funqy.
Lambda with Vert.x Web, Servlet, or RESTEasy
The last option you have for implementing a serverless application with Quarkus would be just to use your traditional web stack, so you can use things like Vert.x, maybe just Servlets, or RESTEasy, JAX-RS, things like that. Actually, that's what I went for here. There, if you ever have done some REST development using Java, this will look very familiar to you. Here I'm using JAX-RS. I use all those annotations like @GET, @Produces, and so on. Really, this is the classic way I would implement this. Then Quarkus actually takes care of mapping the Lambda invocation in the right way, so this web endpoint gets invoked for me.
When to Use What?
When would I use which one of those three approaches? I would go for the Request Handler approach, really, either if I had existing code, maybe I already have an existing application, and now I would like to move this over to Quarkus or maybe a native binary, then I would use this. Or, if I really needed to use specific APIs from the Lambda environment, then I would use this. Otherwise, I would not really go for that, because it really means I am now limited, I'm locked into this environment, and moving somewhere else would be rather tricky. I would use Funqy if I could get away with it. It really is a very basic function-based approach. This means it just maps simple GET and POST requests for me. I couldn't, for instance, do something like an HTTP DELETE. Really, if this is enough for me, then I would go for this. Back when I built this, Funqy was just not as far along as it is by now; otherwise, maybe I would have gone for it. Actually, back then, I went for the web stack because this gives me the full flexibility. Maybe I use things like HTTP headers, cache control, all those attributes, content negotiation. I can do this using a web stack like JAX-RS. This is why I went for that. I felt I just would like to have this flexibility. Also, this makes it very portable.
Building for Lambda vs. K8s/Knative
Actually, what happens now is Quarkus builds a Lambda function .zip file for me, which gets deployed to the Lambda environment. Really, I can just as easily build this for other environments, just as a matter of configuration. Let's say I would like to run this on Kubernetes, or Knative, which is the container-based serverless approach. I just can do this because I already have a portable web application. What I did in my project is, I put this dependency, the quarkus-amazon-lambda-http dependency, into a separate Maven profile. Now either I enable this profile and then I build my project, my search, for being executed via AWS Lambda, or I don't build it with this profile. I just disable the profile. Then I just build a regular JAR, an executable JAR. I can very easily put this into a container image using tools like Jib maybe. Then I could put it onto Kubernetes, Knative, and all those services, which work with plain Linux container images. I have no lock-in at all, I'm just very portable here.
Cold Starts - A Dish Best Served Fast
Let's talk about cold starts, because this is the biggest concern around serverless. This means if a request comes in, it could be this application has been scaled to zero before. In my case, this would be very likely because I don't have a high load. This means in all likelihood, if you go there, it won't be running and it must be starting up very quickly so that the user doesn't have to wait there for a long time. Quarkus helps a lot with the startup times, because what it does is it has this notion of a compile time boot. It does many things, which your traditional stack would do at application startup time, like scanning the classpath for dependencies or entities, things like that. Quarkus does all those things already at build time. Then, essentially, bytecode gets recorded, which describes this information, and it gets baked into the application. It just makes it faster to start up on the JVM. Really, if you want to go to the next level and have a startup in terms of milliseconds, then you would look at native binaries via GraalVM. Quarkus makes this very easy for you to do this.
Then, you're looking at startup times in the range of milliseconds. Let's say on my local laptop, this search would give me a time to first response, so the time between startup and first result, of below 50 milliseconds. That's definitely very acceptable. In the Lambda environment, it depends a bit on how many resources you have assigned. You would look maybe at 200 or 300 milliseconds. Still, it's acceptable. It gives you an instantaneous feel so that the user doesn't feel that they need to wait for the cold start here.
Native Binaries via GraalVM
There's a challenge with native binaries and GraalVM. You cannot just use all the tools and tricks you used in your Java application in a native binary. There are things like reflection, method handles, calling code via the Java Native Interface. You cannot really do this just as easily, because GraalVM does lots of optimizations. It does things like dead code elimination, so it analyzes your application. Then if you don't use a specific method, or constructor, this would not be part of the native binary and then it couldn't be used. That's why reflection just cannot be used out of the box. Instead, you need to announce, you need to configure which types you are going to reflect on. This is why if you take a library such as Apache Lucene, you cannot just take it and expect it to run out of the box in the native binary. In all likelihood, you need to do some preparation steps for that. Quarkus comes with a set of extensions there. The idea is, for a very large range of popular dependencies, Quarkus enables their usage in GraalVM native binaries, so things like Kafka clients, Hibernate, Vert.x, Camel, MicroProfile, Prometheus, and so on. There are Quarkus extensions for that. Then this means you just add the right extension to your project. Then all this configuration will be made for you, and you can use this in a native binary.
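A minimal JDK-only sketch of why reflection is hard for a build-time analysis: the class to instantiate is only a string until the program runs, so an analysis that removes unused code cannot know that the constructor is needed. The class name `ReflectionDemo` and its helper method are made up for illustration.

```java
import java.lang.reflect.Constructor;

// Illustrates why reflective access is hard for a build-time (closed-world)
// analysis: the class name below is just a string until the program runs.
final class ReflectionDemo {

    /** Instantiates a class by name via its no-argument constructor. */
    static Object instantiate(String className) {
        try {
            Class<?> type = Class.forName(className);
            Constructor<?> constructor = type.getDeclaredConstructor();
            return constructor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // On the JVM this works for any accessible class on the classpath. In a
        // native binary, the type typically has to be registered for reflection.
        Object list = instantiate("java.util.ArrayList");
        System.out.println(list.getClass().getName());
    }
}
```

On the JVM this runs as is. For a native binary, the reflectively accessed types are the ones that need to be declared up front, which is the configuration work that Quarkus extensions take over for the libraries they support.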
No Quarkus Extension for Lucene Yet
In my case, for Apache Lucene, there was no Quarkus extension for this yet. This meant I was a bit on my own. Really, it's not too hard. Essentially, it's a bit of a matter of trial and error. You give it a try, you see how far you get with the native binary, and then you can react to that. Just to give you one example, in Lucene, there's what's called the attribute factory, which is a factory which instantiates those attribute classes, which are essentially like metadata attributes of an index. This attribute factory uses method handles to instantiate those types. Actually, in this case, they cannot be used. There are different ways method handles can be used, and this particular one is not yet supported by GraalVM native binaries. What did I do? I created what's called a substitution. GraalVM and the native binary creation allow you to substitute specific classes. In that case, I just provided my own version of that attribute factory, where I just use plain method calls to instantiate those attribute types. This class is used instead of the original one, and this problem is solved for me. Then I have this extension. This still is quite specific to my use case. Ideally, I would make this a bit more generic and then others who also would like to use Lucene would be able to use this as part of the Quarkus extension ecosystem.
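The idea behind the substitution can be sketched with two interchangeable factories. All types here are hypothetical: this is neither Lucene's attribute factory nor the replacement class described in the talk, and it leaves out the GraalVM substitution mechanism itself. It only contrasts a method-handle-based constructor lookup with a plain constructor call that static analysis can see.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical types sketching the idea of the substitution; this is neither
// Lucene's actual code nor the author's replacement class.
interface Attribute { String name(); }

final class TermAttribute implements Attribute {
    public TermAttribute() { }
    @Override public String name() { return "term"; }
}

interface AttributeFactory { Attribute create(); }

/** Variant 1: looks up the constructor through a method handle at run time. */
final class MethodHandleFactory implements AttributeFactory {
    @Override public Attribute create() {
        try {
            MethodHandle constructor = MethodHandles.lookup()
                    .findConstructor(TermAttribute.class, MethodType.methodType(void.class));
            return (Attribute) constructor.invoke();
        } catch (Throwable t) {
            throw new IllegalStateException("Could not instantiate attribute", t);
        }
    }
}

/** Variant 2: a plain constructor call, fully visible to static analysis. */
final class PlainCallFactory implements AttributeFactory {
    @Override public Attribute create() {
        return new TermAttribute();
    }
}

final class SubstitutionDemo {
    public static void main(String[] args) {
        // Both factories produce the same kind of object; only the mechanism differs.
        System.out.println(new MethodHandleFactory().create().name());
        System.out.println(new PlainCallFactory().create().name());
    }
}
```

Since both variants satisfy the same interface, swapping one for the other does not change behavior for callers, which is what makes a substitution of this kind a contained fix.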
Leveraging Interaction Patterns
There's more we can do. We can go to a native binary and optimize the cold start times. Also, we can improve the situation further just by observing how this actually is going to be used. In this case, I figured, if a user is putting the mouse pointer into this text input field, in all likelihood, just shortly thereafter, they are going to execute a search. This is what I'm leveraging. If you look closely here at the network tool in my browser, what I actually do is, as soon as you put the cursor to this text input, it already will send a request to the back-end. This will actually take care of starting up the environment if it wasn't started before. Now this means if a second later, or two seconds later, you actually submit your query, you will already hit a started up environment, and you don't even perceive the cold startup delay at all. This is one way we can further improve this.
Cost Control - How to Prevent Denial-of-Wallet Attacks
Lastly, I would like to talk about this matter of cost control. The problem there is, this is a private project for me. It's a side effort. I don't have an operations team which would be running and monitoring this for me 24/7, which means I really want to be sure there isn't such a thing as a denial-of-wallet attack. This means somebody just comes to the search and invokes it again and again. As this is serverless, which means I pay per use, potentially, I would be in for a bad surprise, and I would have to pay for many requests, which I don't really want, of course. The question is, how can I avoid this? There are a few measures you can take to make this really safe. The first thing to do is put an API usage plan in place. You can see it here a bit on the screenshot, you can do things like saying, "I would like to just have 50K requests per month. If it goes beyond that, no more requests are accepted." That way, no more requests will be billed. There's a problem with that. This is what's called the CORS preflight request. If you have an AJAX-based integration into a website, as I have it, this means the browser will send this preflight request, and there you cannot have any custom headers. This means I cannot pass the API key, which in turn means I cannot use this API usage plan for them. Those CORS preflight requests, they still are unlimited. I cannot really limit them.
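The quota and its loophole can be modeled in a few lines of plain Java. This is a simplified model, not how API Gateway is implemented. The limit is the example figure from the talk, and the class name, method signature, and the rule that keyless non-preflight requests are rejected are assumptions made for illustration.

```java
// Simplified model of a monthly API quota; not the API Gateway implementation.
// The limit is the example figure from the talk; all names are hypothetical.
final class UsagePlanModel {

    static final int MONTHLY_QUOTA = 50_000;

    private int countedRequests = 0;

    /**
     * Returns true if the request is accepted. Requests carrying an API key are
     * counted against the quota. Preflight (OPTIONS) requests arrive without the
     * key, so in this model they can neither be counted nor rejected.
     */
    boolean accept(String httpMethod, String apiKey) {
        if ("OPTIONS".equals(httpMethod) && apiKey == null) {
            return true; // the loophole: preflights bypass the quota
        }
        if (apiKey == null || countedRequests >= MONTHLY_QUOTA) {
            return false;
        }
        countedRequests++;
        return true;
    }

    public static void main(String[] args) {
        UsagePlanModel plan = new UsagePlanModel();
        for (int i = 0; i < MONTHLY_QUOTA; i++) {
            plan.accept("GET", "example-key");
        }
        System.out.println("GET after quota: " + plan.accept("GET", "example-key"));
        System.out.println("OPTIONS after quota: " + plan.accept("OPTIONS", null));
    }
}
```

In this model, regular requests stop being accepted once the quota is used up, while preflight requests keep getting through, which is the gap the remaining measures are meant to close.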
Budget Circuit Breaker
This is why I put in place what I would call a budget circuit breaker. Essentially, the idea there is to just have another Lambda, and this one essentially reacts to a budget alert. I have a budget alert set up which triggers if I reach a specific amount of spending on AWS. If this alert triggers, then this Lambda gets invoked. What it does is it sets the rate limit and the burst limit of the API down to zero. This means zero requests per second are allowed, once this actually has kicked in, which means I wouldn't be charged for any more requests coming in. There's just one more problem with that, and this is, this all is asynchronous. This means it could take even a few hours until this circuit breaker really kicks in, and during that time, somebody could still be invoking those CORS preflight requests lots of times. That's why I put throttling in place, so just to rate limit the number of requests. In my case, this is a private page. I don't need more than maybe 100, or maybe just 50 concurrent requests, and this would be plenty. That way I can further limit how quickly this would pile up. Really, to be ultimately safe, in the end, I put all of this behind Cloudflare Workers. There, they have a very good free tier, essentially, which allows me to handle all the CORS requests over there. Then if it goes beyond their free limit, it will just be cut off. In my case, I'm just fine with that. This is not business critical for me. I'm just fine to shut it down after 100K requests per day, and then it would cost me nothing at all. That way, I'm really safe. If I sit on a plane or I'm on holiday, I would not be at risk that this gets out of control.
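The interplay of circuit breaker and throttling can be sketched as a plain-Java model. It makes no AWS calls, and the class names are hypothetical. The helper at the end adds a small worked calculation: with a rate limit in place, the number of requests that can arrive before the breaker trips is bounded by rate times delay. The 50 requests per second and the three-hour delay used in `main` are assumed example values.

```java
// Plain-Java model of the budget circuit breaker; it makes no AWS calls.
// Class names and the numbers in main are hypothetical.
final class ApiThrottle {
    private long rateLimitPerSecond;
    private long burstLimit;

    ApiThrottle(long rateLimitPerSecond, long burstLimit) {
        this.rateLimitPerSecond = rateLimitPerSecond;
        this.burstLimit = burstLimit;
    }

    void update(long newRateLimitPerSecond, long newBurstLimit) {
        this.rateLimitPerSecond = newRateLimitPerSecond;
        this.burstLimit = newBurstLimit;
    }

    boolean allowsRequests() {
        return rateLimitPerSecond > 0 && burstLimit > 0;
    }
}

final class BudgetCircuitBreaker {
    private final ApiThrottle throttle;

    BudgetCircuitBreaker(ApiThrottle throttle) {
        this.throttle = throttle;
    }

    /** Invoked when the budget alert fires: trip the breaker by zeroing both limits. */
    void onBudgetAlert() {
        throttle.update(0, 0);
    }

    /** Upper bound on requests that can arrive while the alert is still pending. */
    static long maxRequestsBeforeTrip(long rateLimitPerSecond, long delayHours) {
        return rateLimitPerSecond * 3_600L * delayHours;
    }

    public static void main(String[] args) {
        ApiThrottle throttle = new ApiThrottle(50, 50);
        System.out.println("Worst case before trip: " + maxRequestsBeforeTrip(50, 3) + " requests");
        new BudgetCircuitBreaker(throttle).onBudgetAlert();
        System.out.println("Requests allowed after alert: " + throttle.allowsRequests());
    }
}
```

The lower the rate limit, the smaller this worst-case number, which is why throttling complements the asynchronous circuit breaker rather than replacing it.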
Lessons Learned
I definitely sensed there was quite a steep learning curve with Lambda. There are things like IAM, Identity and Access Management; I found this is a really tricky beast. It's hard to understand the minimum set of permissions you need. It takes you quite some time to figure this out. Actually, there's a correlation between how much RAM you have assigned and how many CPU cycles you would get. Actually, I would get away with 128 megabytes of RAM for my search there. Really, I have assigned 512, because that way I get more CPU shares, and the end-to-end latency is a bit lower, requests are served quicker. That's another interesting thing to keep in mind. More RAM doesn't necessarily mean it's better; it might not be worth it.
Then, really, with all the learning, I found Quarkus helps me a lot. As we have seen, it avoids this lock-in, so I'm not bound to Lambda-specific interfaces. I can implement my serverless feature in different ways, just using a regular web stack such as JAX-RS. Then there are other things: for instance, it creates templates for SAM. That's the Serverless Application Model, which is another tool from the Lambda ecosystem, which makes it quite easy for me to deploy such a serverless application. I can actually generate this. I can go to code.quarkus.io, and I just add the right dependencies there, such as the Lambda dependency, and then this will generate a scaffolding for a new project for me.
This already contains templates for the SAM tool for me. I'm just two commands away from deploying this to Lambda and having it running. Finally, really, for me the most important takeaway: especially if this is a personal side project, you definitely should think about how you prevent those denial-of-wallet attacks.
If we come back to those myths and misconceptions, I hope I could convince you that you definitely can use Java via GraalVM native binaries. Cold starts are not a problem, really. Also, it doesn't mean lock-in. There's a very good portability story here. You could just move this functionality to other clouds or even to Kubernetes or Knative, and you wouldn't have any lock-in at all. Those myths, I hope I could bust them.
Resources
This talk, essentially, is a condensed version of a blog post, which you can find on my blog. There, you can read up in all detail about all those things which I've been talking about. Also, all this is open source, so you can just take this and put it to your own blog, to your own site, if you want to. Feel free to do it. It is Apache licensed. Then if you would like to know more about Quarkus, which is driving all this, go to quarkus.io. Lastly, I have this blog post I found by this engineer, Harish, which was very helpful for me to implement the budget circuit breaker. Definitely check this out, too.