Pinecone recently announced the public preview of its new serverless vector database, designed to reduce infrastructure management costs while improving the accuracy of generative AI applications.
According to the vector database specialist, the separation of reads, writes, and storage significantly reduces costs for workloads of all types and sizes. A multi-tenant compute layer handles on-demand retrieval, with new indexing and retrieval algorithms enabling memory-efficient vector search directly from blob storage.
Similar to pod-based indexes, Pinecone Serverless supports live index updates, metadata filtering, hybrid search, and namespaces. Discussing the performance of the new option, Edo Liberty, founder and CEO of Pinecone, states:
Performance is also preserved. In fact, for warm namespaces, serverless indexes provide significantly lower latencies compared to pod-based indexes, with roughly the same level of recall. Warm namespaces are namespaces that receive queries regularly and, as a result, are cached locally in the multi-tenant workers. Cold-start queries will have higher latencies.
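The features carried over from pod-based indexes map directly onto the client API. Below is a minimal sketch of upserting into and querying a namespace with a metadata filter, assuming the Pinecone Python client (v3+); the index name, namespace, and metadata field are hypothetical:

```python
# Minimal sketch using the Pinecone Python client (v3+); the index name,
# namespace, and metadata field are hypothetical.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("docs-index")

# Live update: upsert a vector with metadata into a namespace.
index.upsert(
    vectors=[{"id": "doc-1", "values": [0.1] * 1536,
              "metadata": {"category": "news"}}],
    namespace="tenant-a",
)

# Query the same namespace, filtering on metadata.
results = index.query(
    vector=[0.1] * 1536,
    top_k=3,
    namespace="tenant-a",
    filter={"category": {"$eq": "news"}},
    include_metadata=True,
)
```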
According to Pinecone, Retrieval Augmented Generation (RAG) combines Large Language Models (LLMs) with a vector database to enhance the LLM with external knowledge, serving three distinct purposes: delivering recent information, leveraging out-of-domain knowledge, and addressing hallucination.
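The pattern Pinecone describes can be sketched in a few lines of Python; the embed() and generate() helpers below are hypothetical placeholders for an embedding model and an LLM, and the index name and "text" metadata field are assumptions:

```python
# Minimal RAG sketch against a Pinecone index (Python client v3+).
# embed() and generate() are hypothetical stand-ins for any embedding
# model and LLM; "knowledge-base" and the "text" field are assumptions.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("knowledge-base")

def embed(text: str) -> list[float]:
    raise NotImplementedError  # call your embedding model here

def generate(prompt: str) -> str:
    raise NotImplementedError  # call your LLM here

def answer(question: str) -> str:
    # Retrieve the most relevant documents from the vector database.
    results = index.query(vector=embed(question), top_k=3,
                          include_metadata=True)
    context = "\n".join(m["metadata"]["text"] for m in results["matches"])
    # Ground the LLM's answer in the retrieved context to curb hallucination.
    return generate(f"Answer using only this context:\n{context}\n\n"
                    f"Question: {question}")
```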
In the article "Reimagining the vector database to enable knowledgeable AI", Ram Sriharsha, VP of engineering at Pinecone, describes why and how the team rebuilt Pinecone and discusses why vector databases are helpful:
LLMs are prone to hallucination. Researchers have shown RAG reduces the likelihood of hallucination even on data that the model was trained on. Moreover, RAG systems can cite the original sources of their information, allowing users to verify these sources or even use another model to verify that facts in answers have supported sources.
Jeremy Daly, CEO and founder of Ampt, comments instead:
This is touted as a "breakthrough" to curb AI hallucinations, but given that other major databases are adding vector capabilities as well, analysts say they may see few takers.
Pinecone is not the only player in the market supporting vectors in serverless deployments: other database and data platform providers, including MongoDB and Snowflake, also offer serverless databases with vector support.
While the company asserts that the majority of users will see lower costs with Pinecone Serverless than with pod-based indexes, it acknowledges that the current pricing is not fully optimized for high-throughput applications: reads may be throttled, and pricing updates for high-throughput use cases are expected in the future.
The new option has been well received by the community, with developers asking for higher read limits and options to move a workload from pods to serverless. Separately, the company released the Pinecone AWS Reference Architecture with Pulumi to deploy a distributed system that uses Pinecone Serverless to perform semantic search.
The initial preview of the serverless option is available in a single AWS region (us-west-2). However, Pinecone expects to add support for additional regions, as well as Azure and GCP, in the future.
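Creating an index in the supported region is a single call with the Python client; a sketch assuming a v3+ client and a hypothetical index name:

```python
# Create a serverless index in the initially supported region.
# Index name and dimension are hypothetical; the dimension must match
# the embedding model you plan to use.
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")
pc.create_index(
    name="my-serverless-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-west-2"),
)
```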
Pinecone Serverless is available in public preview at $0.33 per GB per month for storage, $8.25 per million read units, and $2.00 per million write units, with $100 of usage credits to try the service.
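As a hypothetical illustration of the pricing model, a workload storing 10 GB and consuming one million read units and one million write units in a month would cost roughly 10 × $0.33 + $8.25 + $2.00 = $13.55 before the usage credits are applied.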