
RAG-Powered Copilot Saves Uber 13,000 Engineering Hours

Uber recently detailed how it built Genie, an AI-powered on-call copilot designed to make its on-call support engineers more efficient. Genie uses Retrieval-Augmented Generation (RAG) to provide accurate, real-time responses and to make incident response faster and more effective.

Since its launch in September 2023, Genie has significantly impacted Uber's support teams. It has answered over 70,000 questions across 154 Slack channels, saving approximately 13,000 engineering hours with a helpfulness rate of 48.9%, as measured by its users.

Uber's on-call engineers often spend significant time answering repetitive queries or navigating fragmented documentation that makes it hard for users to find answers on their own. The resulting long response times and lost productivity were the driving motivation for building Genie.

Uber powered Genie with RAG, a method that combines the strengths of information-retrieval systems with generative AI models to produce accurate, relevant responses. The approach let Uber deploy a solution quickly by leveraging existing knowledge sources, eliminating the extensive example data that fine-tuning a model would have required.

Genie pulls data from various internal sources, such as Uber's wiki, Stack Overflow, and engineering documents. The information is scraped, converted into vector embeddings using OpenAI models, and stored in Search In Action (SIA), Uber's in-house vector database. To avoid leaking sensitive information, Genie ingests only pre-approved data sources that contain no sensitive data.
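
A minimal sketch of this ingestion path, assuming the public OpenAI embeddings API, might look as follows; since SIA's interface is not public, the sia object, its upsert call, and the document fields are hypothetical stand-ins, and the embedding model is an assumption.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ingest_documents(documents, sia):
    """Embed pre-approved documents and store them in the vector database."""
    for doc in documents:
        resp = client.embeddings.create(
            model="text-embedding-3-small",  # model choice is an assumption
            input=doc["text"],
        )
        vector = resp.data[0].embedding
        # Hypothetical SIA call: store the vector alongside the raw text
        # and its source so retrieved hits can be cited later
        sia.upsert(
            id=doc["id"],
            vector=vector,
            metadata={"text": doc["text"], "source": doc["source"]},
        )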


Genie's overall architecture (source)

When a user asks a question in Slack, the query is converted into an embedding, which Genie uses to fetch contextually similar data from the vector database. Genie then feeds this data to a Large Language Model (LLM), which generates a response grounded in the retrieved information.
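
The query path could be sketched like this, again with the OpenAI API standing in; the sia.search call, the model names, and the prompt wording are illustrative assumptions, not Uber's actual implementation.

from openai import OpenAI

client = OpenAI()

def answer_question(question, sia, top_k=5):
    """Embed the question, retrieve similar documents, and generate a grounded answer."""
    q_vec = client.embeddings.create(
        model="text-embedding-3-small",  # assumption
        input=question,
    ).data[0].embedding

    hits = sia.search(vector=q_vec, top_k=top_k)  # hypothetical SIA query
    context = "\n\n".join(hit["metadata"]["text"] for hit in hits)

    completion = client.chat.completions.create(
        model="gpt-4o",  # assumption; the article does not name the model
        messages=[
            {"role": "system",
             "content": "Answer the question using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content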

Uber has implemented a metrics framework to improve Genie's performance through continuous real-time user feedback. After Genie responds to a question, users can provide feedback by selecting options such as "Resolved," "Helpful," or "Not Relevant."
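
With Slack's standard Block Kit, such feedback buttons could be attached to an answer roughly as follows; the article does not describe Uber's plugin internals, so the use of slack_sdk, the action IDs, and the channel handling are assumptions.

from slack_sdk import WebClient

slack = WebClient(token="xoxb-...")  # placeholder bot token

def post_answer_with_feedback(channel, answer):
    """Post an answer followed by the three feedback buttons described above."""
    slack.chat_postMessage(
        channel=channel,
        text=answer,  # plain-text fallback for notifications
        blocks=[
            {"type": "section", "text": {"type": "mrkdwn", "text": answer}},
            {
                "type": "actions",
                "elements": [
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": label},
                        "action_id": "feedback_" + label.lower().replace(" ", "_"),
                    }
                    for label in ("Resolved", "Helpful", "Not Relevant")
                ],
            },
        ],
    )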


The flow of user feedback for Genie (source)

This feedback is collected via a Slack plugin and processed by Uber's internal data-streaming systems, which write the metrics into a Hive table for analysis. The feedback loop lets Uber's teams track Genie's helpfulness and refine its responses based on real user experience.
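
One way such a loop could look, assuming Kafka as the streaming layer (the article says only "internal data streaming systems") and PyHive for the Hive sink; the topic, host, and table names are all hypothetical.

import json

from kafka import KafkaConsumer
from pyhive import hive

consumer = KafkaConsumer(
    "genie-feedback",  # hypothetical topic name
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda v: json.loads(v),
)
cursor = hive.connect(host="hive-gateway").cursor()  # hypothetical host

for record in consumer:
    fb = record.value  # e.g. {"channel": "...", "question_ts": "...", "rating": "Helpful"}
    cursor.execute(
        "INSERT INTO genie_feedback VALUES (%(channel)s, %(question_ts)s, %(rating)s)",
        fb,
    )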

For performance evaluation, Uber designed a custom pipeline that assesses metrics such as hallucination rate and response relevance. The pipeline processes historical data, including Slack metadata, user feedback, and Genie's previous responses, and runs it through a scoring system powered by an LLM acting as a judge.
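
An LLM-as-judge scorer of this kind could be sketched as follows; the judging prompt, the JSON schema, and the model are assumptions, since Uber does not publish its scoring prompts.

import json

from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = (
    "You are judging an on-call copilot's answer. Given the question, the "
    "retrieved context, and the answer, reply with JSON: "
    '{"hallucination": true|false, "relevance": 1-5}. '
    "Flag hallucination if the answer states facts unsupported by the context."
)

def judge(question, context, answer):
    """Score one historical question/answer pair with an LLM acting as judge."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumption
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user",
             "content": f"Question: {question}\nContext: {context}\nAnswer: {answer}"},
        ],
    )
    return json.loads(resp.choices[0].message.content)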

Uber has also incorporated a document evaluation process to ensure the quality of the information Genie retrieves and uses in its responses. The system transforms the scraped knowledge base into a structured format in which each document is represented by a row.


Workflow of the document evaluation app (source)

Genie assesses each document's clarity, accuracy, and usefulness by feeding it into the LLM with a custom evaluation prompt. The LLM returns a score along with actionable suggestions for improving the document. This process maintains a high standard for the underlying documentation, keeping Genie's responses reliable and effective.
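
A sketch of such a document-evaluation loop, assuming the OpenAI chat API; the prompt wording, the JSON schema, and the row fields are illustrative assumptions.

import json

from openai import OpenAI

client = OpenAI()

DOC_EVAL_PROMPT = (
    "Rate this internal document for clarity, accuracy, and usefulness "
    "(1-5 each) and list concrete suggestions for improving it. Reply as "
    'JSON: {"clarity": n, "accuracy": n, "usefulness": n, "suggestions": [...]}'
)

def evaluate_documents(rows):
    """Score every document row of the structured knowledge base."""
    results = []
    for row in rows:  # one row per document, as in the article
        resp = client.chat.completions.create(
            model="gpt-4o",  # assumption
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content": DOC_EVAL_PROMPT},
                {"role": "user", "content": row["text"]},
            ],
        )
        results.append(
            {"doc_id": row["id"], **json.loads(resp.choices[0].message.content)}
        )
    return results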
