Redis, the popular in-memory data structure store, has recently released its enhanced Redis Query Engine. This development comes at a time when vector databases are gaining prominence due to their importance in retrieval-augmented generation (RAG) for GenAI applications.
Redis announced significant improvements to its Query Engine, using multi-threading to enhance query throughput while maintaining low latency. Redis states:
By enabling queries to access the index concurrently, effectively allowing Redis to scale vertically, we enable scaling both the throughput of Redis operations and queries.
The diagram below depicts vertical scaling.
Source: Redis Design Choice - Scale Up and Scale Out
The company emphasizes that this advancement is crucial as data volumes grow to hundreds of millions of documents, where complex queries can potentially limit throughput. Redis claims that responses maintain sub-millisecond latency and that query latency averages under 10 milliseconds.
Redis acknowledges the limitations of its traditional single-threaded architecture for certain operations. They explain that long-running queries on a single thread can cause congestion and reduce overall throughput, particularly for operations like searching data using an inverted index.
The company further elaborates on the complexity of search operations:
Search is not an O(1) time complexity command. Searches usually combine multiple scans of indexes to comply with the several query predicates. Those scans are usually done in logarithmic time complexity O(log(n)) where n is the amount of data points mapped by the index.
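To see why this matters, here is a minimal, hypothetical Python sketch (not Redis's implementation) of how a query with several predicates combines logarithmic index scans, so the total work grows with the data size and the number of predicates rather than being constant:

```python
import bisect

# Hypothetical sorted secondary index: (value, doc_id) pairs kept in value order.
# A range predicate is answered with two binary searches, each O(log n) in the
# number of indexed entries, plus a walk over the matching slice.
def range_scan(index, low, high):
    lo = bisect.bisect_left(index, (low, float("-inf")))   # O(log n) lower bound
    hi = bisect.bisect_right(index, (high, float("inf")))  # O(log n) upper bound
    return {doc_id for _, doc_id in index[lo:hi]}

price_index = sorted([(9.5, 1), (12.0, 2), (15.0, 3), (25.0, 4)])
stock_index = sorted([(0, 4), (3, 2), (7, 3), (9, 1)])

# A query with two predicates runs two scans and intersects the results,
# so search is emphatically not an O(1) command.
print(range_scan(price_index, 10, 20) & range_scan(stock_index, 1, 10))  # {2, 3}
```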
Redis asserts that its new multi-threaded approach effectively addresses these challenges, enabling Redis to maintain its high performance for simple operations while significantly improving throughput for compute-intensive operations like vector similarity searches.
Redis highlights, "Scaling search efficiently requires combining the distribution of data loads horizontally (going out) and multi-threading vertically, enabling concurrency on accessing the index (going up)."
Source: Single Shard Redis Multi-threaded Query Engine Main Thread and a Thread Pool
The diagram above depicts the new architecture, with multiple queries executed, each on a separate thread. Redis outlines a three-step process:
The query context (planning) is prepared on the main thread and queued on a shared queue. From here, threads consume the queue and execute the query pipeline, concurrently to other threads. This allows us to execute multiple concurrent queries while keeping the main thread alive to handle more incoming requests, such as other Redis commands, or prepare and queue additional queries. Once finished, the query results are sent back to the main thread.
Redis claims this new architecture allows them to maintain the responsiveness of the main thread for standard Redis operations while simultaneously processing multiple complex queries, thus improving overall system throughput and scalability.
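A minimal Python sketch of this pattern (an illustration only; Redis implements it in C inside the server, and execute_query here is a stand-in for the real query pipeline) shows the main thread queuing prepared queries for a pool of workers and collecting the results:

```python
import queue
import threading

tasks = queue.Queue()    # shared queue of prepared query contexts
results = queue.Queue()  # completed results flowing back to the main thread

def execute_query(ctx):
    # Stand-in for the real pipeline: index scans, filtering, scoring.
    return f"results for {ctx}"

def worker():
    # Each pool thread consumes prepared queries and executes them
    # concurrently with the other workers.
    while (ctx := tasks.get()) is not None:
        results.put(execute_query(ctx))

pool = [threading.Thread(target=worker) for _ in range(4)]
for t in pool:
    t.start()

# The main thread only plans and enqueues (steps 1-2), staying free
# to handle other incoming commands while the pool works.
for q in ["query-1", "query-2", "query-3"]:
    tasks.put(q)
for _ in pool:
    tasks.put(None)  # sentinels to shut the pool down
for t in pool:
    t.join()

while not results.empty():  # step 3: results return to the main thread
    print(results.get())
```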
Redis conducted extensive benchmarks to validate the performance of its Query Engine, comparing it against three segments of vector-database providers: pure vector databases, general-purpose databases with vector capabilities, and fully managed in-memory Redis cloud service providers (CSPs). Redis claims its upgraded query engine outperforms pure vector databases in both speed and scalability, while significantly surpassing general-purpose databases and fully managed in-memory Redis CSPs in overall performance.
The vector-database market has seen a significant surge in recent years, with numerous products flooding the space. This proliferation has created a challenging environment for new entrants and users alike. Industry experts point out that the market is already saturated with vector-database options, making it difficult for new products to differentiate themselves and find a unique value proposition.
Doug Turnbull, a principal engineer at Reddit, notes:
Yet in vector search, we have dozens upon dozens of options. As a 'customer' of such options, the field becomes overwhelming…. Vector retrieval increasingly isn't the problem. The hard problems of solving real-world retrieval are not related to just getting vectors, it's everything around it.
This perspective underscores the need for comprehensive solutions that address the broader challenges in AI-driven data retrieval.
The new Redis Query Engine claims a 16-fold increase in query throughput compared to its predecessor. In particular, the query engine caters to the needs of GenAI applications, such as chatbots that rely on real-time RAG and must quickly process multiple steps while retrieving data from vector databases.
Paul Buchheit, creator of Gmail, introduced "The 100ms Rule," which states that every interaction should occur in less than 100 milliseconds to feel instantaneous to users.
A breakdown of latency boundaries in RAG architectures reveals four main contributors: the network round trip, LLM processing, GenAI app operations, and vector database queries, which together result in an average end-to-end response time of 1,513 ms (roughly 1.5 seconds). To address this challenge, developers must rethink their data architecture to build real-time GenAI applications that approach the 100ms Rule. Real-time RAG is crucial for maintaining application speed while leveraging AI capabilities, ensuring that users experience near-instantaneous interactions and remain engaged with the application.
Vectara’s Ofer Mendelevitch reminds us that while vector-database performance is crucial, it is only one piece of a larger technological landscape in AI application development:
It's true that RAG is currently the most popular methodology for building trusted LLM-based applications with your own data, and you do need a strong semantic search capability as part of your overall retrieval capability (the R in RAG), but a vector database is just one piece of that overall stack, and likely not even the most important one.
Yingjun Wu, founder of RisingWave Labs, offers a complementary view on the development of vector databases:
Instead of investing in new vector-database projects, it would be more advisable to concentrate on existing databases and explore opportunities to enhance them with vector engines, making them even more robust and powerful.
Redis's approach of enhancing its existing infrastructure aligns with this perspective, potentially offering a more integrated and efficient solution for developers.
The comprehensive benchmarking process addressed both ingestion and search workloads. For ingestion, Redis measured the time taken to ingest and index data using the hierarchical navigable small-world (HNSW) algorithm, an approximate nearest neighbor (ANN) search method. For querying, they focused on pure k-nearest neighbors (k-NN) searches, measuring throughput in requests per second (RPS) and average client-side latency, including round-trip time (RTT).
Redis benchmarked results against the gist-960-euclidean, glove-100-angular, deep-image-96-angular, and dbpedia-openai-1M-angular datasets, which cover different vector dimensionalities and distance functions, to ensure comprehensive testing. The simulation environment employed industry-standard benchmarking tools, including Qdrant's vector-db-benchmark, to deliver reliable and reproducible results.
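As a rough sketch of the workload shape being measured (not the benchmark harness itself), the following redis-py snippet creates an HNSW index matching gist-960-euclidean's 960 dimensions and Euclidean (L2) metric, ingests a vector, and runs a pure k-NN query; the index name, key prefix, and field name are invented for the example:

```python
import numpy as np
import redis
from redis.commands.search.field import VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)

# Ingestion side: an HNSW (ANN) index over 960-dim float32 vectors.
r.ft("bench_idx").create_index(
    (VectorField("embedding", "HNSW",
                 {"TYPE": "FLOAT32", "DIM": 960, "DISTANCE_METRIC": "L2"}),),
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
)
r.hset("doc:1", mapping={
    "embedding": np.random.rand(960).astype(np.float32).tobytes()
})

# Query side: a pure k-NN search for the 10 nearest neighbors; a benchmark
# would time many of these to derive RPS and client-side latency.
knn = (Query("*=>[KNN 10 @embedding $vec AS score]")
       .sort_by("score")
       .return_fields("score")
       .dialect(2))
vec = np.random.rand(960).astype(np.float32).tobytes()
print(r.ft("bench_idx").search(knn, query_params={"vec": vec}).total)
```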
While Redis reports impressive performance in their benchmarks, it's important to consider perspectives from other industry players. A comparative study conducted by one of Redis's competitors offers a different view on Redis's capabilities. Additionally, ScaleGrid, a Database-as-a-Service (DBaaS) managed platform provider, has shared their insights on Redis.
Redis has made the new Query Engine immediately available in Redis Software, with plans to release it for Redis Cloud in the fall. To see the Redis vector database in action using the LangChain framework, check out this demo, a practical glimpse into how these technologies can be applied to real-world scenarios. For more on the world of vector databases, try this PostgresML presentation and an InfoQ podcast featuring Edo Liberty, founder and CEO of the Pinecone vector database, where he shares his perspective on these technologies in RAG applications.
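As a starting point before diving into the demo, here is a minimal, hypothetical sketch of Redis serving as a LangChain vector store for the retrieval step (it assumes a local Redis Stack instance, an OpenAI API key, and the langchain-community integration; the linked demo may wire things differently):

```python
# Minimal sketch: Redis as a LangChain vector store for the "R" in RAG.
# Assumes Redis Stack on localhost and OPENAI_API_KEY in the environment;
# the index name and sample text are placeholders.
from langchain_community.vectorstores import Redis
from langchain_openai import OpenAIEmbeddings

vectorstore = Redis.from_texts(
    texts=["Redis Query Engine scales vertically with multi-threading."],
    embedding=OpenAIEmbeddings(),
    redis_url="redis://localhost:6379",
    index_name="demo_idx",
)

# Retrieval step: fetch the documents most similar to the user's question.
for doc in vectorstore.similarity_search("How does Redis scale queries?", k=1):
    print(doc.page_content)
```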