Uber Creates GenAI Gateway Mirroring OpenAI API to Support over 60 LLM Use Cases

Sep 24, 2024 2 min read

Uber created a unified platform for serving large language models (LLMs), both from external vendors and self-hosted, and chose to mirror the OpenAI API to ease internal adoption. The GenAI Gateway provides a consistent and efficient interface and serves over 60 distinct LLM use cases across many areas.

The company was an early adopter of LLMs, with several teams incorporating AI-driven functionality into domains ranging from process automation to customer support and content generation. However, disparate integration efforts resulted in duplicated work and inconsistent approaches. In response, Uber decided to centralize LLM serving in a single service: the GenAI Gateway.

Tse-Chi Wang and Roopansh Bansal, senior software engineers at Uber, explain the rationale for creating the gateway:

The GenAI Gateway is designed to simplify the integration process for teams looking to leverage LLMs in their projects. Its easy onboarding process reduces the effort required by teams, providing a clear and straightforward path to harness the power of LLMs. In addition, a standardized review process, managed by the Engineering Security team, reviews use cases against Uber’s data handling standard before use cases are granted access to the gateway.

The team chose to adopt the OpenAI API for the gateway because of its wide adoption and the availability of open-source libraries, such as LangChain and LlamaIndex, that already support it. Mirroring a well-known API streamlines the onboarding process and extends the gateway's reach.
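To illustrate why mirroring a familiar API eases onboarding, here is a minimal sketch of a Go client that sends an OpenAI-style chat completion request. The request and response shapes follow the publicly documented OpenAI chat completions format; the gateway host name, the model name, and the in-memory fake transport are assumptions made for illustration and do not come from Uber's implementation. The point is that only the base URL would change when moving from a vendor endpoint to a compatible gateway.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// ChatMessage, ChatRequest and ChatResponse model a small subset of the
// OpenAI chat completions wire format.
type ChatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ChatRequest struct {
	Model    string        `json:"model"`
	Messages []ChatMessage `json:"messages"`
}

type Choice struct {
	Message ChatMessage `json:"message"`
}

type ChatResponse struct {
	Choices []Choice `json:"choices"`
}

// Complete posts an OpenAI-style request to baseURL and returns the first
// choice. Only baseURL differs between a vendor and a compatible gateway.
func Complete(client *http.Client, baseURL, apiKey string, req ChatRequest) (string, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	httpReq, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	httpReq.Header.Set("Content-Type", "application/json")
	httpReq.Header.Set("Authorization", "Bearer "+apiKey)

	resp, err := client.Do(httpReq)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var out ChatResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	if len(out.Choices) == 0 {
		return "", fmt.Errorf("response contained no choices")
	}
	return out.Choices[0].Message.Content, nil
}

// roundTripFunc lets a plain function act as an http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// fakeGatewayClient returns a client backed by an in-memory stand-in for a
// gateway (hypothetical; no network access), so the sketch runs offline.
func fakeGatewayClient() *http.Client {
	return &http.Client{Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
		status := http.StatusOK
		var payload []byte
		if r.URL.Path != "/v1/chat/completions" {
			status = http.StatusNotFound
		} else {
			var req ChatRequest
			if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
				return nil, err
			}
			reply := ChatMessage{Role: "assistant", Content: "echo from " + req.Model}
			var err error
			if payload, err = json.Marshal(ChatResponse{Choices: []Choice{{Message: reply}}}); err != nil {
				return nil, err
			}
		}
		return &http.Response{
			StatusCode: status,
			Header:     http.Header{"Content-Type": []string{"application/json"}},
			Body:       io.NopCloser(bytes.NewReader(payload)),
			Request:    r,
		}, nil
	})}
}

func main() {
	// Hypothetical internal host; a real deployment would use its own URL.
	const baseURL = "https://genai-gateway.example.internal"
	req := ChatRequest{
		Model:    "example-model",
		Messages: []ChatMessage{{Role: "user", Content: "Summarize this chat."}},
	}
	reply, err := Complete(fakeGatewayClient(), baseURL, "test-key", req)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(reply)
}
```

Libraries that already speak this format can typically be pointed at a compatible endpoint through their base URL setting, which is the onboarding benefit the article describes.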

GenAI Gateway is a Go service that acts as the serving layer in front of both external (OpenAI, Vertex AI) and internal LLMs, and provides generic capabilities such as authentication and account management, caching, and observability/monitoring.

Architecture of GenAI Gateway (Source: Uber Engineering Blog)

The gateway supports personally identifiable information (PII) redaction, which is both important and challenging in the context of LLMs. Uber wanted to ensure that PII was anonymized before requests were forwarded to third-party vendors, to avoid the risk of exposing sensitive data. On the other hand, redacting PII can strip essential context from requests and prevent LLMs from providing useful responses. Furthermore, redaction is problematic for LLM caching and retrieval-augmented generation (RAG). The team is looking to address these challenges by encouraging the use of Uber-hosted LLMs or by relying on security assurances from third-party vendors.
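The trade-off can be seen in a deliberately naive sketch that replaces e-mail addresses and phone numbers with placeholders before a prompt leaves the service. The regular expressions, the placeholder tokens, and the `Redact` function are assumptions for illustration only; the article does not describe how Uber detects PII, and pattern matching like this would miss many real-world cases.

```go
package main

import (
	"fmt"
	"regexp"
)

// Simplified patterns chosen for illustration; production PII detection
// needs far broader coverage than two regular expressions.
var (
	emailPattern = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)
	phonePattern = regexp.MustCompile(`\+?\d[\d\s().-]{7,}\d`)
)

// Redact replaces e-mail addresses and phone numbers with fixed
// placeholders. Every address maps to the same token, so the model can no
// longer tell two different contacts apart: the context loss noted above.
func Redact(prompt string) string {
	prompt = emailPattern.ReplaceAllString(prompt, "<EMAIL>")
	prompt = phonePattern.ReplaceAllString(prompt, "<PHONE>")
	return prompt
}

func main() {
	in := "Email jane.doe@example.com or call +1 415 555 0100 about the trip."
	fmt.Println(Redact(in))
}
```

Because the redacted prompt differs from the original text, a cache keyed on the raw prompt or a retrieval index built from unredacted documents no longer lines up with what the model actually receives.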

The authors included a case study on summarizing chats for customer support agents, which improves operational efficiency by reducing the time agents spend addressing user queries. Using LLMs for this use case resulted in 97% of generated summaries being considered useful by agents and a six-second reduction in user query handling time. The solution currently generates around 20 million summaries per week, and the team plans to expand it to more regions and contact types.

Integration of GenAI Gateway to Support Specific Use Case (Source: Uber Engineering Blog)

The team learned a great deal from developing and operating the GenAI Gateway and is planning to work on further enhancements, including intelligent LLM caching mechanisms, better fallback logic, hallucination detection, and safety and policy guardrails.

About the Author

Rafal Gancarz

Rafał is an experienced technology leader and expert. He's currently helping Starbucks make its Commerce Platform scalable, resilient and cost-effective. Previously, Rafał has been involved in designing and building large-scale, distributed and cloud-based systems for Cisco, Accenture, Capita, ICE, Callsign and others. His interests span architecture & design, continuous delivery, observability and operability, as well as sociotechnical and organisational aspects of software delivery.
