Beyond Speed Limits: Exploring the Performance Power of Valkey

Summary

Senior Solution Architect Viktor Vedmich shares how engineering leaders can maximize application performance using Valkey. He discusses the open-source Redis fork's 100% API compatibility, explores advanced caching strategies like lazy loading, and explains how to implement powerful data structures for real-time analytics, rate limiting, and session stores to solve the thundering herd problem.

Bio

Viktor Vedmich is a technologist who leverages 10+ years of architecting systems for millions to demystify generative AI as an AWS Senior Solutions Architect and host of the technical podcast.

About the conference

InfoQ Dev Summit Munich software development conference focuses on the critical software challenges senior dev teams face today. Gain valuable real-world technical insights from 20+ senior software developers, connect with speakers and peers, and enjoy social events.

Transcript

Viktor Vedmich: When we're talking about our applications, sometimes, especially if we require very low latency, we have to think about a cache layer, especially an in-memory cache layer, because if we compare how fast we can read from a hard disk or even from NVMe, it doesn't matter, memory will be 20 times faster than our disk. My name is Viktor. I'm a Senior Solution Architect.

Roadmap

Today, we will talk about Valkey. You're in the right place if you're thinking about improving performance, if you think that the latency that you currently have in your application is not so good and you look around for some options, variations, and you want to go a little bit deep in data structures, API, what's provided by Valkey, and in a sense, what exactly it is, and what's the use cases you can apply in your application, in your project. We'll start with introductions. The most interesting part will be relating to the use cases of Valkey and caching at your application. At the end, I will share best practices, how and what you can do to manage and work with Valkey and your production environment.

Introduction to ElastiCache and Valkey

On the next slides, you will not see Valkey everywhere; most probably you will see ElastiCache. The reason is that you can run Valkey inside ElastiCache. ElastiCache is a managed service that provides very good availability and is easy to start and scale as much as you need. That's where you can run Valkey as an engine for an in-memory cache. If you look at why we need an in-memory cache in general, we will see many different use cases, starting from super simple ones like caching, to a leaderboard application or machine learning or whatever. At first glance, it looks like these use cases have nothing in common. What's common is under the hood: the data structures and the API provided by Valkey. That's what makes it easy to apply all of these use cases.

What exactly is Valkey? Valkey is an open-source product that was created as a fork in 2024. In general, it was not meant as a replacement; it was created on top of Redis, because, let's say, Redis decided to move in a different direction, outside of open source, and later reversed that decision, and so on. I know it's a little bit of a political situation, but nevertheless. Valkey is based on Redis. It means that if you would love to change from Redis to Valkey, it's 100% API compatible. Valkey is under the umbrella of the Linux Foundation, CNCF. What this means for us as developers is that Valkey will be open source for a long time. It's a guarantee that it will be open source.

AWS is a big contributor, but we are not alone. As you can see here, there are 40 organizations, 150 different contributors, and so on. The biggest part of AWS's contribution so far was improving performance. We're able to reach 1 million requests per second with multithreading in Valkey. We improved performance more than two times, with less memory consumption. I will also show you what kind of techniques you can apply to use less memory in different use cases.

Use Cases Exploration

Now let's go deeply to our applications and what exactly we can do as a developer. I prefer to talk about some technology and some use cases on some applications. Let's imagine that you were hired by a very promising startup, still a startup. What's the startup doing? We are going to sell some images, like Night Sky Marketplace, which is created by a human, not generated by GenAI. We expect that it will be very successful and we will grow very fast. It will be like the next unicorn. We decided to start using some cache layer. Our current architecture looks like that. We have EC2 instances and we communicate with our data storage. Data storage is RDS. It was very easy to start this relational database.

As I said, we are growing very fast. We have more and more clients, millions and billions of different clients. What can we do in this situation? We can scale our RDS system vertically to add more CPU, memory. That's a good approach, but still has some limits. Even with that we can add more replicas, just like horizontally scale, but still, we're 100% sure that we will reach the limits of RDS in performance. Because anyway, it will be milliseconds in response, it doesn't matter how many resources you will put to your relational database. RDS here is just like an example. Again, because disk is 20 times slower compared to the memory. Especially when we're talking about relational database, you need to make some very complicated query, which requires you to make a join between different tables. It can take a while to execute this query. It will be better and easier to store the result of the execution, for example, in Valkey as a cache layer.

Very fast to read from Valkey. This means we also can improve the response time from a latency perspective to achieve sub-millisecond latency, and release the pressure on our relational database. Because in this situation, maybe we even can remove some of the replicas and save some cost, because we will read not from our origin, from our sources, we will read from Valkey.

Caching Strategies

What kind of caching strategies can we apply using Valkey? The first one is lazy loading. It's super easy. What will happen in this situation? We will try to get our item from Valkey. If we succeed, it will take, for example, hundreds of microseconds, not milliseconds. If it's not in our cache, not in Valkey, we will go to RDS, our relational database, or whatever your source of truth is. That will take around 10 milliseconds, depending again on your configuration, on your database performance, and on how many resources you have. After that, we will store the item we got from RDS in our cache layer. The next time it will be much faster. Another strategy is write-through. We will write directly to RDS as soon as we get a new item, and simultaneously also write to ElastiCache, in our case, to Valkey. The next time, we will get this item from the cache.
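To make the two strategies concrete, here is a minimal sketch in Python. It is not the code from the talk: the two dictionaries only stand in for Valkey and for the source of truth, and the class and key names are made up for illustration.

```python
from typing import Dict, Optional


class CacheAside:
    """Lazy loading plus write-through over two key/value stores.

    `cache` and `database` are plain dicts standing in for Valkey and the
    source of truth; they are placeholders, not a real client API.
    """

    def __init__(self, cache: Dict[str, str], database: Dict[str, str]) -> None:
        self.cache = cache
        self.database = database
        self.db_reads = 0  # counts how often we had to take the slow path

    def get(self, key: str) -> Optional[str]:
        # Lazy loading: try the cache first.
        value = self.cache.get(key)
        if value is not None:
            return value
        # Cache miss: read from the source of truth, then populate the cache.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key: str, value: str) -> None:
        # Write-through: update the database and the cache together.
        self.database[key] = value
        self.cache[key] = value


store = CacheAside(cache={}, database={"image:1": "Milky Way, 25 EUR"})
first = store.get("image:1")    # miss, loaded from the database
second = store.get("image:1")   # hit, served from the cache
store.put("image:2", "Orion, 30 EUR")
third = store.get("image:2")    # hit, because the write also went to the cache
```

With a real deployment the dictionary reads and writes would become network calls, and cached entries would normally carry a TTL so stale values eventually disappear.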

Both of these approaches have pros and cons. We can discuss which is better for you. What if I would love to combine both of these strategies? Because one is very good in one situation, and the other in another situation. What we can do is lazy loading from the cache. First of all, try to read from Valkey. If the item is not present there, we go to our database. Just to be clear, for this specific example I use DynamoDB because we require some functionality from the database. You will see why. If the item is not present in Valkey, I will go to DynamoDB, read it from there, and then store it in the cache. How can I invalidate my cache? We could be in the situation where I change the price, for example, because I'm selling images: I reduce the price on some super popular image, and I need to invalidate the cache.

What can I do in this situation? I can combine it with the previous strategy, write-through. I write to DynamoDB that I have a new price on this image. Using the internal mechanism of DynamoDB, I trigger my Lambda function, which will asynchronously invalidate my cache inside Valkey. In this situation, as soon as I make the changes in my origin, in DynamoDB, I also invalidate the item in my cache layer. The next time people try to get this information, I can even put the right value in place, or I will go again to DynamoDB to get the right value for my cache. Sometimes, in our application, as soon as we have millions of users and millions of images, some of the images may be so popular that we can call them hot data. We would love to avoid a cache miss on hot data.

That means the item is not kept in my cache, and I have to refresh it again from the origin. What I can do in this situation in Valkey is run a MULTI command. As you can see here, I start with MULTI, get the hot key or your specific key, get the TTL (time to live) for this key, then execute and get this number. In a real example, I will have multiple different clients connected to my cache, all getting this TTL. As soon as the TTL reaches the threshold, like five seconds, I will run the process to repopulate the data from my database, in this case DynamoDB, it doesn't matter, to ElastiCache, to my Valkey, and store it in the cache. The next clients will not have to reach the origin to update this data.
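The refresh-before-expiry idea can be sketched as follows. This is an illustration under assumptions, not the slide's code: `FakeCache` is a hypothetical in-memory stand-in whose `get_with_ttl` models reading the value and its remaining TTL together (what MULTI / GET / TTL / EXEC gives you), and the 60-second TTL is an arbitrary choice. The five-second threshold is the one mentioned in the talk.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class FakeCache:
    """In-memory stand-in for Valkey with per-key expiry (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def setex(self, key: str, ttl_seconds: float, value: str) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get_with_ttl(self, key: str) -> Tuple[Optional[str], float]:
        # Value and remaining TTL are read together. A missing or expired
        # key yields (None, -2), mirroring what TTL reports for no key.
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)
            return None, -2
        return entry[0], entry[1] - self._clock()


def read_hot_key(cache: FakeCache, key: str, load_from_db: Callable[[str], str],
                 ttl_seconds: float = 60.0, refresh_below: float = 5.0) -> str:
    value, remaining = cache.get_with_ttl(key)
    if value is None or remaining < refresh_below:
        # Missing or about to expire: repopulate from the origin now, so
        # the next clients still find the key in the cache.
        value = load_from_db(key)
        cache.setex(key, ttl_seconds, value)
    return value


now = [0.0]                       # manual clock keeps the example deterministic
cache = FakeCache(clock=lambda: now[0])
loads: List[str] = []


def load_from_db(key: str) -> str:
    loads.append(key)
    return f"value-for-{key}"


read_hot_key(cache, "hotkey", load_from_db)   # miss: loads and caches
now[0] = 30.0
read_hot_key(cache, "hotkey", load_from_db)   # 30 s left: served from cache
now[0] = 57.0
read_hot_key(cache, "hotkey", load_from_db)   # 3 s left: refreshed early
remaining_after_refresh = cache.get_with_ttl("hotkey")[1]
```

In this sketch the client that notices the low TTL refreshes synchronously. A production version might hand the refresh to a background task so that the caller is not slowed down.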

What if I still have some images that are so popular, like millions or even billions of requests, and I have a distributed system and maybe many clients at the same time trying to reach the same data, the same key, the same value from Valkey, and this item expired? It's the well-known thundering herd problem. This means that I have a cache miss. All of my clients will redirect their requests from the cache, from Valkey, to my database, to my RDS or DynamoDB, it doesn't matter, trying to get this information from there. I believe you understand what would happen. It will put much bigger pressure on my relational database. Most probably, I can even experience some downtime, because there may be problems with the whole system when my RDS is not ready for so many connections at the same time. For sure, it will have an impact on the latency of my application.

What can we do in that situation with Valkey? First of all, we can apply the logic that one of the clients will lock my item in Valkey, and every other client will notice that this item is currently locked. I, as another client, should not go to the origin, to the database, to get this item. All other clients will wait. As soon as one client repopulates the data from the database to Valkey and stores it in the cache, the other clients resume reading from the cache and get this data. Let's go a little bit deeper into how this logic applies to two different clients. It's a little bit simplified, because there could be multiple clients. Just to visualize, I think two clients is more than enough.

The first client tries to get some item from my Valkey and gets a nil. It's not present in my cache system. Around the same time, a little bit after, maybe a couple of milliseconds, the second client also tries to get this key and also gets a nil. The first client has already set a lock on that item. It means this client is currently working to repopulate the data: it puts the lock on the item, connects to the database, queries the database, and waits, for example, 10 milliseconds. Meanwhile, the second client will also try to set this lock, because that's the logic; it could be any one of the clients that gets to put the lock. It figures out the lock is already there and starts to wait, checking periodically whether the lock has disappeared or the item is still locked and it has to keep waiting.

As soon as we repopulate the item from the origin to Valkey, so I have the value for the item, the first client will unlock the item, and the second client will be able to get this value as well. This is how we can solve the thundering herd problem, especially if you have multiple clients connected to one of the hot items in your Valkey system. So far, I've talked mostly about a relational database and shown you DynamoDB, which is a NoSQL database. In reality, Valkey is a binary-safe engine, so you can also store S3 objects in Valkey and save money by not going to S3 every time to get a bigger object. Everything that you can serialize, you can store in Valkey. It's not a big problem for you to do it. As you can see, many different services are presented here, but in reality, you can do almost whatever you want.
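Going back to the locking flow for the thundering herd problem, the sketch below simulates eight clients asking for the same missing key at once. It rests on assumptions: `FakeCache` is a hypothetical thread-safe stand-in, `set_if_absent` models a set-only-if-missing write (the role SET with the NX option plays), and the `lock:` key prefix is invented for the example.

```python
import threading
import time
from typing import Dict, List, Optional


class FakeCache:
    """Thread-safe in-memory stand-in for Valkey (hypothetical helper)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._mutex = threading.Lock()

    def get(self, key: str) -> Optional[str]:
        with self._mutex:
            return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        with self._mutex:
            self._data[key] = value

    def set_if_absent(self, key: str, value: str) -> bool:
        # Succeeds only for the first caller, like a set-if-not-exists write.
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key: str) -> None:
        with self._mutex:
            self._data.pop(key, None)


db_queries: List[str] = []


def slow_database_read(key: str) -> str:
    db_queries.append(key)
    time.sleep(0.01)              # roughly the 10 ms query from the talk
    return f"row-for-{key}"


def get_with_lock(cache: FakeCache, key: str, poll_seconds: float = 0.001) -> str:
    lock_key = f"lock:{key}"
    while True:
        value = cache.get(key)
        if value is not None:
            return value
        if cache.set_if_absent(lock_key, "1"):
            try:
                # Re-check: another client may have filled the cache between
                # our miss and our lock acquisition.
                value = cache.get(key)
                if value is None:
                    value = slow_database_read(key)
                    cache.set(key, value)
                return value
            finally:
                cache.delete(lock_key)   # unlock so waiting clients proceed
        # Someone else holds the lock: wait, then check the cache again.
        time.sleep(poll_seconds)


cache = FakeCache()
results: List[str] = []
threads = [
    threading.Thread(target=lambda: results.append(get_with_lock(cache, "image:42")))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Only one client reaches the database; the others poll until the value appears. Against a real server the lock would also need a TTL, so that a crashed client cannot hold it forever.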

What if I have a very popular startup, which now has even more pressure from my clients, and I understand I need to be faster than before, so I would love to implement client-side caching? It means that my client will communicate with Valkey, get some value, and store it in a local cache inside the client, so there's no need to communicate with Valkey again because the value is present in the local cache. First of all, how can I solve the problem that this cache is not valid anymore, because we changed the original item and there's already a new value in Valkey? What we can do is super simple. First of all, for sure, it will be TTL. Or we can subscribe, for example, to some prefix, be notified that an item is invalid, and invalidate the item in my local cache.

How exactly can we track and invalidate specific items? There exist two different approaches. In the first one, we store on the server side, in Valkey, everything that we provided to each client. For example, I have multiple clients, and on the Valkey side, I will store that client1 got key1, and so on. This costs a lot of memory on the server side, and it's not an ideal solution, especially if you have a limited amount of memory. It does help us invalidate the cache for a specific item inside a specific client. Or, as I mentioned, you can subscribe to some prefix and say: everything that changes under the user keys, we will broadcast to all clients in our architecture, and we will invalidate all of the caches related to these items. We send this notification message.

If we go a little deeper into these implementations, what will we see? First of all, I want to draw your attention to the connection pool you see here. What does it mean? From a TCP perspective, when we make connections to Valkey, we have to initialize these TCP connections. The TCP handshake costs us some time and some CPU resources. We highly recommend having a long-lived pool of connections that you can use whenever you need it, without creating a TCP connection from scratch again. Long story short, the approach I want to show you here is that inside your client you create, for example, 10 connections to Valkey, and one of them is responsible for invalidation, for learning that some of the items are currently stale or invalid for whatever reason. All other connections are used for data, to get the data from Valkey.

As soon as some item becomes invalid, the notification arrives on my connection 0, we reset this item on the client side, and the client goes again through the existing connection pool and gets this item from the cache. This is how we manage performance, how we improve latency in our applications.
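A local cache with both mechanisms, TTL and explicit invalidation, could look roughly like this. It is a sketch with assumed names: `fetch_remote` represents a read over one of the pooled data connections, and `on_invalidate` represents whatever the dedicated invalidation connection calls when the server reports a changed key.

```python
import time
from typing import Callable, Dict, List, Tuple


class LocalCache:
    """Per-client cache in front of Valkey; all names here are illustrative."""

    def __init__(self, fetch_remote: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_remote = fetch_remote    # e.g. a GET over a pooled connection
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]                  # fresh local copy, no network call
        value = self._fetch_remote(key)
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def on_invalidate(self, key: str) -> None:
        # Called when the server reports that `key` changed; the next get()
        # goes back to the remote cache for the new value.
        self._entries.pop(key, None)


remote = {"user:1:name": "Ada"}              # stands in for the data held in Valkey
remote_calls: List[str] = []


def fetch_remote(key: str) -> str:
    remote_calls.append(key)
    return remote[key]


local = LocalCache(fetch_remote, ttl_seconds=30.0, clock=lambda: 0.0)
a = local.get("user:1:name")        # remote call
b = local.get("user:1:name")        # served locally
remote["user:1:name"] = "Grace"     # the value changes on the server
local.on_invalidate("user:1:name")  # notification from the invalidation connection
c = local.get("user:1:name")        # remote call again, sees the new value
```

The TTL acts as a safety net in case an invalidation message is missed.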

Session Store

What about if I want to create personalization on my application, I mean like session store? Session store is a very popular feature, especially if you have e-commerce applications, banking applications, gaming, you would love to remember who your customer is in your session. Like for example, in my case, it would be some users. It would be very amazing if when the user opens the marketplace with Night Sky, they will see different pictures, like here's picture number one, picture number two, because it's different users. How can we do it? For sure, under the load balancer or something, we will have some executors, like in my case, it's EC2 instances, but to store the session, we will store everything in Valkey, because in this situation it helps us to make this distributed system as stateless, we'll not store anything on our executor.

As soon as I need to get information about user number one, what their preferences are, maybe cookies, maybe what the user currently has in their shopping cart, and so on, it will be stored separately. It's also very helpful in a very dynamic environment, when you scale out and scale in, adding and removing instances, Kubernetes style. All of that will be stored in Valkey. How exactly? What's the best practice to store the session? In Valkey, it comes down to a data structure. The next couple of slides will be related to the different data structures you can use in Valkey. For the session, the best way to do it is a hash data structure, because it makes it very easy to store what's related to your user, unique data like a session ID or whatever.

As you can see, to put the data in a hash data structure, I use the command HSET. For other data structures in Valkey, you will see different prefixes. What I mean by prefix here is the H. For example, for the next one, it will be Z, and so on. I can store information about my user in a hash. The biggest benefit is the complexity: it will be constant time, O(1). It doesn't matter how many users you have, it will be super-fast to read and write this data. Let's go a little deeper with an example of the code in Python. In Python, I will use GLIDE as a client to make connections to my Valkey. This is how I set up connections to Valkey.

Next, I need a dictionary to hold the information I'm going to store in Valkey, and I generate a unique session ID using uuid as a unique identifier. For what reason? Because this is the key under which I will store this information. Then I store the session data under it. When I need to get this information back from the hash, I call HGETALL on my hash and convert the result, because I get a dictionary back and have to turn it back into what I had in my Python code.
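The slide code itself is not part of the transcript, so the following is only a sketch of the flow just described. `FakeHashStore` is a hypothetical stand-in for a client connection; its `hset` and `hgetall` methods are modeled on the HSET and HGETALL commands and are not the GLIDE API. It returns bytes, on the assumption that the client hands back raw bytes that have to be decoded.

```python
import uuid
from typing import Dict


class FakeHashStore:
    """Stand-in for a Valkey connection; stores hash fields as bytes."""

    def __init__(self) -> None:
        self._hashes: Dict[str, Dict[bytes, bytes]] = {}

    def hset(self, key: str, mapping: Dict[str, str]) -> int:
        # Like HSET: returns the number of fields that were newly created.
        fields = self._hashes.setdefault(key, {})
        added = 0
        for name, value in mapping.items():
            if name.encode() not in fields:
                added += 1
            fields[name.encode()] = value.encode()
        return added

    def hgetall(self, key: str) -> Dict[bytes, bytes]:
        return dict(self._hashes.get(key, {}))


def create_session(store: FakeHashStore, user_data: Dict[str, str]) -> str:
    session_id = str(uuid.uuid4())           # unique key for this session
    store.hset(f"session:{session_id}", user_data)
    return session_id


def load_session(store: FakeHashStore, session_id: str) -> Dict[str, str]:
    raw = store.hgetall(f"session:{session_id}")
    # Convert the bytes we get back into the dictionary we started with.
    return {name.decode(): value.decode() for name, value in raw.items()}


store = FakeHashStore()
user_data = {"user_id": "1", "country": "DE", "cart_items": "3"}
session_id = create_session(store, user_data)
restored = load_session(store, session_id)
```

Because the session lives in the shared store and not on an instance, any application server can load it by session ID.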

Feature Store

If we go to the next level of personalizations, and be very friendly to our users, another feature would be a feature store. Feature store is a key component of machine learning infrastructure, and this will make predictions based on the user behavior, what kind of pictures you prefer. For example, in our startup, we're based in Germany, maybe I would prefer to have only authors from Germany, and based on learning about my behavior, we will suggest to our users to see only images from authors based in Germany. For sure, Valkey will not be part of your machine learning infrastructure. It will not be involved in training and so on, for sure not. For feature store, this is where you can apply. There exist two different types of feature stores. The first one is offline. Mostly, it will be where you do this machine learning stuff, training, batch scoring, all of this stuff.

At the same time, when your user logs in currently, you need to make this decision very fast, and it will be online feature store. This is where Valkey can help you to do it because it's super-fast, and it will be like a part of core feature store infrastructure. On this architecture, what you can see at the heart of this infrastructure, it's FEAST. FEAST is a place where we will register our feature store offline and online. In my case, for offline feature store, it would be Redshift, where I will do all of this complicated stuff related to creating a new model, batch processing, and so on, like for slower feature store. Valkey will be responsible for providing a decision on what currently a user prefers, and register in the FEAST YAML file, which will be very easy.

Because what we should do, it's just like I have two different feature stores. The first one, it's online. I define, it will be Valkey where it's based, and connection string to that. For offline, it will be Redshift with authentication, with configuration, and so on. For feature store, as we already saw, hash data store would be a better solution from data structure. You can use hash data store inside of Valkey. It will be a little bit incremented, I mean like inherit it to different hash data structure inside Valkey. Again, it will be very fast, random access, constant time. For that reason, Valkey will be the very best solution in these specific use cases.

Real-Time Analytics

We're done with implementations of personalization, with faster response of our applications, and so on. Now, we're on the phase where we would love to understand what's going on, like what kinds of pictures are more popular to make some analytics. I believe many companies are already doing analytics. What about real-time analytics? I really love to see what's currently the most viewed photo I have in my amazing startup. It sounds like a leaderboard problem, like task or session. What will I have? I will have many different images, and for every image is the number of views. This means not user and score, it will be image and the views. This is how I can store my data. For that, first of all, it would be sorted, in a sorted way, because I need to understand what's the picture that has more views than the other.

To implement this in Valkey, we need to combine two different data structures. The first one is a hash. That's what we already saw. It will be very useful in our case, because we need to store the image and the score, for example, the views. The second one is a little more interesting: the skip list. The skip list helps us provide the functionality of a sorted list. As soon as we, for example, increment a score or add a new image, it will be stored in the skip list. What's interesting about the skip list is that it can be traversed in both directions. We can sort in two orders, from lower to higher and from higher to lower. It's very easy. This is how we can very quickly show what's the lowest and what's the highest, using the skip list in our applications. The complexity will be O(log N), which is pretty fast.

How exactly will it work inside Valkey? Now I use Z; as you can see, Z is the prefix. Z means I will work with a sorted set, which is a skip list and a hash under the hood. ZADD adds a new image with its number of views. That's what we present in this visualization. We add a new image, and another new image. Each is put in the right sorted position. We see the right order of the different images, and it depends on how many views they have. I believe that's not the use case of a real application, I mean what we do in production. Most probably, what we'll need to do is increase the number of views. For example, my image number 1 currently has 31 views. Using the ZINCRBY command, I increase the number of views for this image.

For sure, I have to switch places, because image 1 now has a higher number of views. In the leaderboard, image number 1 will be in second place. Also, with a sorted set in Valkey, you can run ZRANGE to get information from your leaderboard, from lower to higher. Here, as you can see, I did not put any upper boundary. For me, it's like infinity, from highest to lowest. Zero is the stop. For the start, I started from infinity. You can also choose the sort direction, from lower to higher or from higher to lower. This is functionality of the skip list. The result of this command, for example, will be all my images with the number of views for each image.
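The behavior of the three commands can be modeled in a few lines of Python. This is a simplified model, not Valkey's implementation: it keeps a dictionary and sorts on every read where the real structure maintains a skip list. The scores for image:2 and image:3 are invented; only the 31 views for image 1 come from the talk.

```python
from typing import Dict, List, Tuple


class SortedSetModel:
    """Minimal model of a sorted set: member -> score, ordered on read.

    A skip list keeps the order up to date in O(log N) per update; this
    sketch simply sorts on every read, which is enough for a demonstration.
    """

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}

    def zadd(self, member: str, score: float) -> None:
        self._scores[member] = score

    def zincrby(self, member: str, amount: float) -> float:
        self._scores[member] = self._scores.get(member, 0.0) + amount
        return self._scores[member]

    def zrange(self, rev: bool = False) -> List[Tuple[str, float]]:
        # Order by score, then by member name for ties.
        ordered = sorted(self._scores.items(), key=lambda kv: (kv[1], kv[0]))
        return ordered[::-1] if rev else ordered


views = SortedSetModel()
views.zadd("image:1", 31)
views.zadd("image:2", 40)
views.zadd("image:3", 35)
new_score = views.zincrby("image:1", 5)   # image:1 goes from 31 to 36 views
leaderboard = views.zrange(rev=True)      # highest number of views first
lowest_first = views.zrange()             # same data in the other direction
```

After the increment, image:1 overtakes image:3 and sits in second place, which is the place swap described above.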

What if I would love to do something a little more complicated? Views are amazing, but what about unique views for each photo, unique users who viewed the photo? How can I do it? The first thing that comes to mind would be a set, where we put all of the users who viewed this image. From a memory perspective, the amount of memory we need grows linearly with the number of users. If we have millions of users, it will consume a lot of memory. What can we do in this situation? This, I think, is one of the most interesting data structures that we have for today. It's HyperLogLog. It's very interesting because it's a probabilistic data structure. It means it will not guarantee you 100% accuracy.

At the same time, it has very big benefits. It doesn't matter how many users you have, it will not consume more than 12 kilobytes. The error rate is only about 1%. Especially for this use case, like likes, unique views, and so on, I think it's a very good solution. The complexity is O(1) for read and write, which I think is amazing. If you look at the implementation, again, you see a different prefix. It's PF for HyperLogLog. I add views to my image from three different users. I get a return value of 1. That means the users were added to the HyperLogLog. Then, for example, I ask for the number of views. It's 3. That's ok. Trying to add the same user again returns 0, because this user already exists in the HyperLogLog. Asking again for the number of views gives the same number, 3. That's ok.

Let's go to another level and add 10,000 views, for example, and look at the different implementations. First, the set way. We put 10k unique users into the set. Memory usage in this case would be around 400-something kilobytes on my server. If we go to HyperLogLog with the same number of users, first of all, I will not get back exactly 10k. As you can see, I lost a little bit more than 10 views. Maybe in your specific use case, that's ok-ish for you. For sure, it depends on the implementation. The most interesting part is the memory usage. Memory usage is still 12k, so 12 kilobytes. That's it. It will be the same whether you store 10k users or 10 million users in HyperLogLog.
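To show why the memory stays fixed, here is a toy HyperLogLog in Python. It is for illustration only and is not Valkey's implementation: the hash function (SHA-1 truncated to 64 bits) and the correction steps are choices made for this sketch. It uses 16,384 registers; packed at 6 bits per register, that is 16,384 × 6 / 8 = 12,288 bytes, which lines up with the 12 kilobytes mentioned above.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**14 registers; for illustration only."""

    def __init__(self, precision: int = 14) -> None:
        self.precision = precision
        self.m = 1 << precision              # 16,384 registers
        # One byte per register here for simplicity; a packed layout with
        # 6 bits per register would need 16384 * 6 / 8 = 12,288 bytes.
        self.registers = bytearray(self.m)

    def add(self, item: str) -> bool:
        """Return True if a register changed, False otherwise."""
        digest = hashlib.sha1(item.encode()).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash
        index = h >> (64 - self.precision)             # top bits pick a register
        rest_bits = 64 - self.precision
        rest = h & ((1 << rest_bits) - 1)
        rank = rest_bits - rest.bit_length() + 1       # position of the first 1-bit
        if rank > self.registers[index]:
            self.registers[index] = rank
            return True
        return False

    def count(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


hll = HyperLogLog()
first_add = hll.add("user:1")      # True: a register changed
repeat_add = hll.add("user:1")     # False: same hash, nothing changes
for i in range(10_000):
    hll.add(f"user:{i}")
estimate = hll.count()             # close to 10,000, but usually not exact
register_count = len(hll.registers)
```

The number of registers never grows, no matter how many users are added; only the accuracy of the estimate is traded away.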

Rate Limiting

Now our marketplace for selling images is so popular, and our distributed system is so complicated, with so many different users trying to reach it, upload, and do different things, that we have to apply rate limits for users. What can we do? In a distributed system, at the backend, for example, we use weather forecast to predict what will be the weather tomorrow or whatever. It's a part of our distributed system. The problem with rate limits in a distributed system is how to sync the rate limit between the components of your microservice or distributed application. It would be a good idea to store it separately. There exist two approaches in Valkey for how you can do it. The first one is simple, the second one a little more complicated.

Let's start with the simple one. First of all, you see I'm going to use a string to store numerical data, because it will be a counter. Valkey is a binary-safe engine, so you can do it. It's not a problem. We use a string data type but are still able to do numeric operations on it. The complexity, again, will be O(1), super-fast. Let's do an example. I can increase the number: 1, 2, and so on. This is how I count up against what the user is allowed to do. Let's go to a real implementation, how it looks from a user perspective. First of all, what should we do? We have to upload to the server side a Lua script where we will have this logic. What's our rate limit for a user? How many requests do we allow, for example, per second, or in some amount of time?

It will be executed on the server side. In the rate limit algorithm, we start from 0, from nothing, and each request increases the number using the increment operation. The TTL helps us to empty our bucket, to empty this box, so that users get a fresh rate limit. On this visualization, what do we have? One request, a second request, a third request. Now the user is not allowed to make any more requests. The user has to wait for the TTL to end. As soon as it expires, the user is able to make new requests again. This is how a rate limit could work inside Valkey. That was the simple way. Let's go a little bit deeper into the Lua code and how it looks. First of all, as you can see, I was a little bit lazy and hardcoded the limit and the expiration time, 10 seconds, which means for any user, we allow 4 requests per 10 seconds.

What do I have to do next? For sure, I have to get the value if it exists. If it does not exist, I have to create it, set the TTL, and start this TTL. Then I check whether the request is allowed: should I increase the value, or is this request not allowed to execute. How can I do it inside Valkey? There are two different commands. With the first one, you upload the Lua script I've shown you, just a couple of lines. The Lua script is passed as a string parameter. As a result, you get a SHA ID. This is a unique ID that you will use to invoke the script inside Valkey. To do that, you use the EVALSHA command. As part of this invocation, you can also provide some arguments.
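The Lua script is not reproduced in the transcript, so here is the same fixed-window logic as a pure-Python model, useful for reasoning about the behavior. It is not the script from the slide: the class name and the injected clock are assumptions, and the dictionary stands in for the counter key and its TTL. The limit of 4 requests per 10 seconds is the one from the talk.

```python
import time
from typing import Callable, Dict, List, Tuple

LIMIT = 4             # requests allowed per window, as in the talk
WINDOW_SECONDS = 10   # the counter expires this long after it was created


class FixedWindowLimiter:
    """Pure-Python model of a counter with a TTL; not the Lua script itself."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._counters: Dict[str, Tuple[int, float]] = {}  # user -> (count, expires_at)

    def allow(self, user: str) -> int:
        now = self._clock()
        entry = self._counters.get(user)
        if entry is None or entry[1] <= now:
            # No counter yet, or its TTL ran out: start a fresh window.
            count, expires_at = 0, now + WINDOW_SECONDS
        else:
            count, expires_at = entry
        if count >= LIMIT:
            return 0          # rejected: the user waits for the TTL to end
        # Allowed: increment the counter, keeping the window's expiry time.
        self._counters[user] = (count + 1, expires_at)
        return 1              # allowed


now = [0.0]                   # manual clock keeps the example deterministic
limiter = FixedWindowLimiter(clock=lambda: now[0])
first_window: List[int] = [limiter.allow("user:1") for _ in range(5)]
now[0] = 10.0                 # the window's TTL has run out
after_expiry = limiter.allow("user:1")
```

Running this logic as a script on the server makes the read, the check, and the increment one atomic step, which is what keeps several application instances from racing each other.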

I will show you how you can do that in the advanced rate limit example. As a result, I get 1. 1 means the request is authorized from the Valkey perspective. Another approach to rate limiting is the token bucket. This is where we define a fixed capacity for our bucket. We also define a refill rate, how fast tokens are added back to our bucket. We consume tokens; if you're familiar with GenAI, tokens and rate limits are very popular there too. In general, what do we have? We have a more complicated data structure, because now we need to store information about the user. What's the capacity of this user's bucket? What's the refill rate, for example, one token per second? What's the current consumption of tokens, like 7, for example, in this case?

To store this type of data, we go back to the hash, where we store the current token consumption and the timestamp, when we last interacted with the bucket, so we can check that we have the right number from a time perspective. How do we implement it as a Lua script? Again, it should be a Lua script that we upload to our server. It's a little bit more complicated, split across a couple of screens, but I will explain the logic. First of all, I have to define the variables. If you remember, previously I hardcoded them. In this case, I get them from the arguments that I send with EVALSHA to my script. I also need the current time, and I read the already existing hash.

Using the Valkey internal command, I'm getting the current time. Then I make pretty simple calculations based on the number of tokens and the time difference between the previous request and the current request. If the bucket does not exist at all, I assign the whole bucket capacity to the current tokens, like 10 tokens, for example. In the next step, I check whether this request is allowed, whether we have reached the rate limit or not: is there at least 1 token left? If so, we allow this execution. For sure, we have to save the new state in our hash data structure. Finally, you see something very interesting here. It's related to TTL again. It's not about the accuracy of this solution, it's more related to memory consumption.

Because maybe it could be the situation that you define this bucket of tokens and so on, and the user goes to sleep and does not connect to your application during the next eight hours or whatever. For sure, we don't want to store this information for the next eight hours. We prefer to define some TTL and just remove it all from our Valkey, so it makes no impact on our Valkey system and memory consumption. That's it for data structure approaches and use cases. Not all of them, of course.
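The token bucket logic walked through above can be modeled in a few lines. This is only a sketch of the logic, not the Lua script from the slides: a plain dict stands in for the Valkey hash, the field names `tokens` and `ts` are hypothetical, and the TTL of twice the full refill time is an assumption. In Valkey the same steps would run inside one Lua script so that they execute atomically.

```python
import math


def take_token(store: dict, key: str, capacity: int, refill_rate: float,
               now: float, requested: int = 1) -> bool:
    """Model of the token bucket check; `store` stands in for Valkey hashes."""
    bucket = store.get(key)
    if bucket is None:
        # No bucket yet: the user starts with the whole capacity.
        tokens, last = float(capacity), now
    else:
        tokens, last = bucket["tokens"], bucket["ts"]

    # Refill according to the time since the previous request, capped at capacity.
    elapsed = max(0.0, now - last)
    tokens = min(float(capacity), tokens + elapsed * refill_rate)

    allowed = tokens >= requested
    if allowed:
        tokens -= requested

    # Assumed TTL: twice the time a full refill takes. It only bounds memory
    # use for idle users; it does not affect the accuracy of the limiter.
    ttl = math.ceil(capacity / refill_rate) * 2
    store[key] = {"tokens": tokens, "ts": now, "expires_at": now + ttl}
    return allowed
```

For example, with a capacity of 10 and a refill rate of one token per second, ten immediate requests pass, the eleventh is rejected, and three seconds later three more requests pass.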

Best Practices and Operational Overview

I would love to cover best practices for operations, what we as AWS recommend for operating your Valkey engine as a cache. First of all, an in-memory cache is not persistent storage. You should use Valkey for ephemeral data. It's data that can be lost. You should be ready to lose some of your data, and you have to refill it into your cache engine, in our case Valkey. It's not specific to Valkey, it's a general recommendation if you use an in-memory cache solution. What you can do is just tolerate the loss. Or maybe do something more complicated: detect that you lost some of the items and refill them automatically, for example, through some Lambda functions or whatever inside of your application logic. Or in general, rebuild or repair part of your cache, because you want to avoid this cache miss situation, the thundering herd problem I showed you.

If any of that doesn't work for you and you still want to have a very fast solution, maybe you have to look at another option. Valkey, by default, is not the best for persistent storage. Among the databases, we have Amazon MemoryDB. This provides data consistency. Under the hood, there are transaction logs replicated between availability zones, all of this stuff. Long story short, what I want to say, again, is that Valkey is not for your persistent storage. You have to be ready to refill the data inside of your cache. Also, again, sometimes it can be a tradeoff. How much memory? How many resources would we love to provide for our memory cache? We have to be ready to remove data, because, again, it has a cost. What you can do is an explicit delete or overwrite. Or, the super simple way, apply TTL and rely on evictions. Let's go a little bit deeper into TTL, what exactly you can do.

In most cases, in a common situation, I would say you just manage the TTL. We as AWS do not recommend that you set TTL at the global level for your whole Valkey server, for example. Don't set some default TTL value. It's not a good approach. It is better to have a specific TTL value for your items. It could be relative, expire in like 30 seconds, or absolute, expire at a given time: for example, as soon as my talk finishes, we need to expire all of our cache items related to Viktor's talks. The best practice here would be to add random jitter. What do I mean? You remember my example about the hot item? One client takes the lock and everyone else just waits, but that was for one item in Valkey. Imagine the situation that you have some application that gets many items from the source, like from your database, almost at the same time and stores them in Valkey.

It means all of them will expire at the same time. Again, you will have the same problem: all of the clients, again, try to go to your source, for example, RDS or DynamoDB, or whatever you have as a backend for data storage. Random jitter can help to avoid this thundering herd problem. It means your items will expire at slightly different, random times, and these requests will not create pressure on your RDS, for example. Regarding evictions, they apply when your whole memory is full. One of the solutions is allkeys, where any key can be removed. Again, it depends. Are you ready for that or not? Or maybe volatile, which means you remove only part of your items, the ones with a TTL set. Here you can apply different algorithms, least recently used, least frequently used, or just super simple TTL to evict the data from your Valkey.
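The random jitter idea described above can be sketched in a few lines. This is a minimal illustration, assuming a base TTL in seconds plus a uniformly random extra delay; the function name and the 300/60 second values are hypothetical.

```python
import random


def ttl_with_jitter(base_ttl: int, max_jitter: int, rng=random) -> int:
    """Return base_ttl plus a random 0..max_jitter seconds.

    Items cached in the same batch then expire at slightly different times
    instead of all at once.
    """
    return base_ttl + rng.randint(0, max_jitter)


# Example: items loaded together get TTLs spread between 300 and 360 seconds.
ttls = [ttl_with_jitter(300, 60) for _ in range(5)]
```

With a redis-py-compatible client, the value would typically be passed as the expiry when writing the item, for example `client.set(key, value, ex=ttl_with_jitter(300, 60))`.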

Please be aware that, again, the size of the cache is a bit of a tradeoff, because it is part of the cost of your application. Do you actually use all of the items inside your cache? The more you store, the more memory you need. Memory, again, is 20 times faster compared to the disk, but it's also more expensive compared to the disk. Optimize your size, look at what exactly you need. One of the best practices, for sure, from us, would be to apply some autoscaling approach just to find this balance, the size of cache that fits best for your application. Maybe you don't need to have half a terabyte of memory for your Valkey system.

Questions and Answers

Losio: You used the example of Dynamo, so a proprietary approach to triggering Lambda. In RDS as well, I know you can do some triggering on Postgres. What's your suggestion when you cannot, when you leverage a third-party database, an external database, or your own managed database? How do you implement that triggering?

Viktor Vedmich: Out of the box, DynamoDB provides this functionality: when we update some item, we can trigger a Lambda. Look at the engine you currently use, like Postgres or whatever, maybe it provides that. Otherwise, I would say apply the logic inside of your application: when you update something, you add an additional task to some queue, for example, and this task means I have to make an update, like remove items from the cache, in a super simple way. Again, I would say it's better to have something inside of the database engine, and look at that if you need to apply lazy loading and a write-through cache at the same time.
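The application-side approach from this answer could look roughly like the sketch below. It is only an illustration under simple assumptions: plain dicts stand in for the database and the cache, an in-process `queue.Queue` stands in for a real message queue, and both function names are hypothetical.

```python
import queue


def update_item(db: dict, invalidations: queue.Queue, key: str, value) -> None:
    """Write to the source of truth, then enqueue a cache invalidation task."""
    db[key] = value
    invalidations.put(key)


def drain_invalidations(cache: dict, invalidations: queue.Queue) -> int:
    """Worker step: remove every queued key from the cache.

    Returns the number of tasks processed. The next read of a removed key
    misses the cache and reloads the fresh value from the database.
    """
    processed = 0
    while True:
        try:
            key = invalidations.get_nowait()
        except queue.Empty:
            return processed
        cache.pop(key, None)  # deleting a key that is not cached is fine
        processed += 1
```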

Participant 1: Still on DynamoDB, about the deletion of the key. You store in DynamoDB, there is the stream. Is there some sort of API that I can connect from the stream using API destinations, so practically not writing code, without using any type of Lambda or target, so directly from the stream to an EventBridge pipe, whatever, and delete that?

Viktor Vedmich: Currently, no, as far as I know. As usual, I have to double check with the documentation.

Participant 1: You cannot do it, I'm telling you.

Viktor Vedmich: Yes, you cannot do it. You have to use Lambda currently.

Recorded at:

Jun 08, 2026

Viktor Vedmich