During the first day of QCon San Francisco 2023, Tuana Çelik and Thomas Stadelmann from deepset, the creators of the open-source large language model (LLM) framework Haystack, gave a talk on how to build and deploy Retrieval-Augmented Generation (RAG) pipelines using their platform.
RAG pipelines are a powerful tool for leveraging LLMs in production. They allow developers to customize how data interacts with LLMs, use various vector databases, and apply the latest retrieval techniques to provide LLMs with relevant context.
Çelik began the talk by explaining the structure of the Haystack pipeline, which is built on the principle of composability. She demonstrated how to build a RAG pipeline with Haystack: choosing a vector database, shaping how your data interacts with the LLM, and applying retrieval techniques to supply the LLM with relevant context.
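For illustration, a minimal RAG pipeline in this style might look as follows. This is a sketch using the open-source Haystack 2.x Python API, which may differ in detail from the version demonstrated in the talk; the documents, question, and prompt template are placeholders.

```python
from haystack import Pipeline, Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

# Index a few documents into an in-memory store; in production this would
# typically be an external vector database of your choice.
document_store = InMemoryDocumentStore()
document_store.write_documents([
    Document(content="Haystack is an open-source LLM framework by deepset."),
    Document(content="RAG pipelines retrieve context before calling an LLM."),
])

# Jinja2 template that injects the retrieved documents into the prompt.
template = """Answer the question using only the documents below.
{% for doc in documents %}{{ doc.content }}
{% endfor %}
Question: {{ question }}
Answer:"""

# Compose the pipeline graph: retriever -> prompt builder -> LLM.
pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipeline.add_component("prompt_builder", PromptBuilder(template=template))
pipeline.add_component("llm", OpenAIGenerator())  # reads OPENAI_API_KEY from the environment
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.prompt")

question = "What is Haystack?"
result = pipeline.run({
    "retriever": {"query": question},
    "prompt_builder": {"question": question},
})
print(result["llm"]["replies"][0])
```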
Following Çelik's presentation, Stadelmann discussed the challenges of deploying such pipelines to production. He explained how deepset deploys Haystack pipelines on their cloud platform, covering topics such as hosting a pipeline, connecting it to LLMs hosted in the public cloud, and what running a RAG pipeline end-to-end entails: prototyping, evaluation, inference, prompt engineering, observability, and more.
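As a concrete example of what hosting a pipeline involves: Haystack 2.x pipelines can be serialized to YAML, and deepset's cloud platform accepts pipeline definitions in YAML form. The sketch below shows that round trip with the pipeline built above; the dumps/loads calls are from the open-source API, not the hosted platform's own workflow.

```python
# Continuing from the `pipeline` object above: serialize the pipeline graph
# to YAML, the format deepset Cloud uses for hosted pipeline definitions.
yaml_definition = pipeline.dumps()
with open("rag_pipeline.yaml", "w") as f:
    f.write(yaml_definition)

# A deployment service can later rebuild the same pipeline from the YAML.
from haystack import Pipeline
restored = Pipeline.loads(yaml_definition)
```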
The speakers also demonstrated how to use a REST API and containerization to scale query pipelines based on user load. They recommended using vendor-hosted document stores for large-scale deployments, as self-hosting these systems can be complex and resource-intensive.
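A common pattern for this, sketched below, is wrapping the pipeline in a small REST service that can be containerized and replicated behind a load balancer. This is illustrative rather than deepset's actual service code, and assumes FastAPI plus the pipeline built earlier.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str

@app.post("/query")
def run_query(body: Query):
    # Reuses the `pipeline` object built in the earlier sketch.
    result = pipeline.run({
        "retriever": {"query": body.question},
        "prompt_builder": {"question": body.question},
    })
    return {"answer": result["llm"]["replies"][0]}

# Run with: uvicorn app:app --workers 4
# Scale out by running more container replicas behind a load balancer.
```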
The talk also covered the importance of optimizing the retriever component of a RAG pipeline, which pre-selects the data that is fed into the LLM. The retriever's performance can significantly impact the quality of the pipeline's output, making it a crucial area for optimization.
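One common retriever optimization, sketched below, is swapping keyword (BM25) retrieval for dense embedding retrieval and tuning the number of pre-selected documents (top_k). The component names follow the Haystack 2.x API; the embedding model choice is an assumption.

```python
from haystack import Pipeline, Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

docs = [Document(content="Haystack is an open-source LLM framework by deepset.")]

# Documents must be embedded before indexing.
store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"
)
doc_embedder.warm_up()
store.write_documents(doc_embedder.run(documents=docs)["documents"])

# Query time: embed the query, then retrieve the top_k nearest documents.
retrieval = Pipeline()
retrieval.add_component(
    "text_embedder",
    SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"),
)
retrieval.add_component(
    "retriever", InMemoryEmbeddingRetriever(document_store=store, top_k=5)
)
retrieval.connect("text_embedder.embedding", "retriever.query_embedding")

hits = retrieval.run({"text_embedder": {"text": "What is Haystack?"}})
print([d.content for d in hits["retriever"]["documents"]])
```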
In addition to building and deploying RAG pipelines, the speakers also discussed how to monitor and improve them post-deployment. They demonstrated how to use deepset's cloud platform to test pipelines, monitor their performance, collect user feedback, and compare different models and prompts.
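While the platform workflow itself is proprietary, the kind of comparison the speakers demonstrated can be approximated offline. The sketch below runs the same test questions through two hypothetical prompt variants and collects the answers side by side, reusing the `template` and `document_store` from the first example.

```python
def build_rag_pipeline(prompt_template: str) -> Pipeline:
    """Build a retriever -> prompt -> LLM pipeline for a given prompt template."""
    p = Pipeline()
    p.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
    p.add_component("prompt_builder", PromptBuilder(template=prompt_template))
    p.add_component("llm", OpenAIGenerator())
    p.connect("retriever.documents", "prompt_builder.documents")
    p.connect("prompt_builder.prompt", "llm.prompt")
    return p

# Two hypothetical prompt variants to compare on the same test set.
variants = {
    "concise": template,
    "verbose": template.replace("Answer:", "Answer in detail:"),
}
test_questions = ["What is Haystack?", "What does a RAG pipeline do?"]

for name, tpl in variants.items():
    p = build_rag_pipeline(tpl)
    for q in test_questions:
        out = p.run({"retriever": {"query": q}, "prompt_builder": {"question": q}})
        print(f"[{name}] {q} -> {out['llm']['replies'][0]}")
```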
The talk concluded with a discussion on hallucination detection, a technique for identifying when an LLM generates information that is not grounded in its input data. The speakers demonstrated how to use a hallucination detection model to score the reliability of an LLM's output and discussed plans to make this model available in Haystack. During the discussion, the speakers noted that hallucination detection can be task-specific and that many open problems remain.
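deepset's model was not yet released at the time of the talk, but the general idea can be sketched with an off-the-shelf natural language inference (NLI) model: score whether the generated answer is entailed by the retrieved context. The model choice and label handling below are assumptions, not the speakers' implementation.

```python
import numpy as np
from sentence_transformers import CrossEncoder

# An off-the-shelf NLI cross-encoder; per its model card, the output logits
# are ordered (contradiction, entailment, neutral).
nli = CrossEncoder("cross-encoder/nli-deberta-v3-base")

def groundedness(context: str, answer: str) -> float:
    """Probability that the answer is entailed by (grounded in) the context."""
    logits = nli.predict([(context, answer)])[0]
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the three labels
    return float(probs[1])  # index 1 = entailment

context = "Haystack is an open-source LLM framework created by deepset."
print(groundedness(context, "Haystack was created by deepset."))  # high score
print(groundedness(context, "Haystack was created by Google."))   # low score
```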
Developers looking to learn more about deepset's product can refer to the API reference, tutorials, and GitHub code examples.