Designing a Multi-Agent System for Engineering Support at Scale: a Case Study from Grab

Grab’s Analytics Data Warehouse (ADW) team has introduced a multi-agent AI system to automate engineering support workflows across its large-scale data platform, aiming to reduce repetitive operational work and improve resolution efficiency. The system is designed to handle internal engineering requests spanning data warehouse troubleshooting, SQL debugging, and platform support, while shifting engineers toward higher-value development work.

The ADW platform supports more than 1,000 internal users and manages over 15,000 tables, serving as a core analytics infrastructure component within Grab. As usage grew, the engineering team observed that a significant portion of operational effort was being consumed by repetitive support tasks and ad hoc investigations, limiting time available for platform improvement and system design work.

Sneh Agrawal, head of analytics at Grab, highlighted in a LinkedIn post:

"Grab’s Central Data Team is leveraging a multi-agent system to automate repetitive operational work, reclaiming hundreds of engineering hours each month. This shift is unlocking critical engineering bandwidth and enabling a transition from reactive firefighting to higher-value system building."

To address this, the team implemented a multi-agent architecture that separates incoming engineering requests into two primary workflows: investigation and enhancement. Investigation workflows are designed for diagnostic tasks such as query analysis, log retrieval, schema lookup, and issue summarization. Enhancement workflows focus on generating actionable outputs, including code changes, SQL fixes, and automated merge requests for review.

Multi-agent architecture tech stack (Source: Grab Tech Blog Post)

The system is orchestrated using a LangGraph-based workflow engine combined with FastAPI services that coordinate routing, tool execution, and state management across agents. Requests are first classified and then routed to specialized agents responsible for tasks such as context retrieval, code search, or solution generation. Each agent operates with constrained responsibilities to reduce ambiguity and improve the predictability of outputs.

Agent workflows, using a Supervisor that controls communication flow and task delegation (Source: Grab Tech Blog Post)
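The classify-then-route flow described above can be illustrated with a minimal sketch. It uses plain Python rather than LangGraph or FastAPI, and every name in it (RequestState, ENHANCEMENT_HINTS, the agent functions, PIPELINES) is hypothetical and not taken from Grab's implementation. The keyword classifier stands in for what would more plausibly be a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical keyword hints; a production classifier would likely be an LLM call.
ENHANCEMENT_HINTS = ("fix", "change", "update", "add column", "merge request")


@dataclass
class RequestState:
    """Shared state handed from the supervisor to each agent in turn."""
    text: str
    workflow: str = ""
    notes: List[str] = field(default_factory=list)


def classify(state: RequestState) -> RequestState:
    """Label the request as 'investigation' or 'enhancement'."""
    lowered = state.text.lower()
    is_enhancement = any(hint in lowered for hint in ENHANCEMENT_HINTS)
    state.workflow = "enhancement" if is_enhancement else "investigation"
    return state


# Each agent has one narrow responsibility and only appends to the shared state.
def context_agent(state: RequestState) -> RequestState:
    state.notes.append("context: looked up table metadata")
    return state


def code_search_agent(state: RequestState) -> RequestState:
    state.notes.append("code_search: located the relevant SQL file")
    return state


def summary_agent(state: RequestState) -> RequestState:
    state.notes.append("summary: drafted findings for the requester")
    return state


def solution_agent(state: RequestState) -> RequestState:
    state.notes.append("solution: drafted a merge request for human review")
    return state


Agent = Callable[[RequestState], RequestState]

# The supervisor's routing table: one ordered pipeline per workflow.
PIPELINES: Dict[str, List[Agent]] = {
    "investigation": [context_agent, summary_agent],
    "enhancement": [context_agent, code_search_agent, solution_agent],
}


def supervisor(text: str) -> RequestState:
    """Classify the request, then delegate to the agents for that workflow."""
    state = classify(RequestState(text=text))
    for agent in PIPELINES[state.workflow]:
        state = agent(state)
    return state


if __name__ == "__main__":
    result = supervisor("Please fix the join in the daily revenue SQL")
    print(result.workflow, result.notes)
```

Keeping the routing table as data, separate from the agents, is one way to make the investigation and enhancement paths easy to audit; a graph-based engine would express the same idea as nodes and edges.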

According to Grab engineers:

"The separation of investigation and enhancement paths helped us reduce complexity in agent reasoning and improved reliability in production workflows."

A key architectural decision was the consolidation of the tool ecosystem. The system initially exposed more than 30 internal tools across data access, logging, and code systems. This was later reduced to a smaller, curated toolset to improve maintainability and reduce unpredictable tool selection by agents. The tool layer includes controlled SQL execution, metadata access, log retrieval systems, and integration with Git-based workflows for change management.
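One way to picture a curated toolset is a registry that holds a small set of tools and grants each agent an explicit allowlist. The sketch below is an assumption about how such a layer could look, not Grab's design; ToolRegistry, the agent names, and the tool names are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet


class ToolRegistry:
    """Holds a small curated toolset and a per-agent allowlist."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., str]] = {}
        self._allowed: Dict[str, FrozenSet[str]] = {}

    def register(self, name: str, func: Callable[..., str]) -> None:
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = func

    def grant(self, agent: str, *tool_names: str) -> None:
        unknown = set(tool_names) - set(self._tools)
        if unknown:
            raise KeyError(f"unknown tools: {sorted(unknown)}")
        self._allowed[agent] = frozenset(tool_names)

    def call(self, agent: str, tool_name: str, **kwargs: str) -> str:
        # An agent can only invoke tools it was explicitly granted.
        if tool_name not in self._allowed.get(agent, frozenset()):
            raise PermissionError(f"{agent} may not call {tool_name}")
        return self._tools[tool_name](**kwargs)


def build_registry() -> ToolRegistry:
    """Wire up hypothetical stand-ins for metadata, log, and Git tools."""
    registry = ToolRegistry()
    registry.register("get_table_metadata", lambda table: f"metadata for {table}")
    registry.register("fetch_logs", lambda job: f"logs for {job}")
    registry.register("open_merge_request", lambda title: f"MR opened: {title}")
    registry.grant("investigator", "get_table_metadata", "fetch_logs")
    registry.grant("enhancer", "get_table_metadata", "open_merge_request")
    return registry


if __name__ == "__main__":
    registry = build_registry()
    print(registry.call("investigator", "get_table_metadata", table="orders"))
```

With fewer tools visible to each agent, there are fewer ways for the agent to pick the wrong one, which is the predictability benefit the paragraph above describes.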

Safety and governance were integrated into the system design. SQL execution is constrained through validation layers, and sensitive data handling includes mechanisms for detecting and mitigating exposure risks. In addition, all enhancement workflows that produce code changes require human-in-the-loop review before deployment, ensuring that automated outputs remain subject to engineering oversight.
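The article does not describe how the validation layers or the review gate are built. As a rough illustration only, the sketch below assumes a read-only policy (single SELECT or WITH statement, no write or DDL keywords, a default row cap) and a deployment step that refuses unapproved changes. The names validate_sql, MAX_ROWS, ProposedChange, and deploy are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed policy: reject anything that could write or alter schema.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create|merge)\b",
    re.IGNORECASE,
)
MAX_ROWS = 1000  # hypothetical default row cap


def validate_sql(sql: str) -> str:
    """Return a safe, row-limited statement or raise ValueError."""
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        raise ValueError("empty statement")
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(?i)(select|with)\b", statement):
        raise ValueError("only SELECT queries are allowed")
    # Conservative: this also rejects a forbidden word inside a string literal.
    if FORBIDDEN.search(statement):
        raise ValueError("statement contains a write or DDL keyword")
    if not re.search(r"(?i)\blimit\s+\d+\s*$", statement):
        statement = f"{statement} LIMIT {MAX_ROWS}"
    return statement


@dataclass
class ProposedChange:
    """An agent-generated code change awaiting human review."""
    diff: str
    approved_by: Optional[str] = None


def deploy(change: ProposedChange) -> str:
    """Refuse to deploy anything a human has not approved."""
    if change.approved_by is None:
        raise PermissionError("human review required before deployment")
    return f"deployed change approved by {change.approved_by}"


if __name__ == "__main__":
    print(validate_sql("SELECT id FROM orders;"))
```

A keyword filter like this is deliberately strict and would not replace a real SQL parser or database-level permissions; it only shows where a validation step sits between the agent and the warehouse.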

Context management emerged as a significant technical challenge. Multi-step agent reasoning required maintaining relevant state across interactions while operating within token constraints. The system addresses this through structured context compression and selective retrieval strategies, allowing agents to retain necessary information without exceeding operational limits.
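The specific compression and retrieval strategies are not disclosed, so the following is only a sketch of the general idea under simple assumptions: tokens are approximated by whitespace-separated words, the most recent messages are kept verbatim, and older messages are ranked by word overlap with the current query, truncated, and admitted while the budget allows. All function names and parameters are hypothetical.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word."""
    return len(text.split())


def compress(text: str, max_words: int = 8) -> str:
    """Truncate a long message; a real system might summarize it instead."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " ..."


def build_context(
    history: List[str], query: str, budget: int, keep_recent: int = 2
) -> List[str]:
    """Keep recent messages verbatim and add compressed, relevant older ones."""
    split = max(len(history) - keep_recent, 0)
    older, recent = history[:split], history[split:]
    # This sketch assumes the recent messages alone fit within the budget.
    used = sum(estimate_tokens(message) for message in recent)

    query_terms = set(query.lower().split())

    def overlap(message: str) -> int:
        return len(query_terms & set(message.lower().split()))

    # Selective retrieval: consider the most query-relevant older messages first.
    ranked = sorted(enumerate(older), key=lambda item: overlap(item[1]), reverse=True)
    kept: Dict[int, str] = {}
    for index, message in ranked:
        short = compress(message)
        cost = estimate_tokens(short)
        if used + cost <= budget:
            kept[index] = short
            used += cost

    # Restore chronological order before handing the context to an agent.
    return [kept[i] for i in sorted(kept)] + recent


if __name__ == "__main__":
    log = [
        "The orders table schema has columns id, user_id, amount, created_at and many more details here",
        "Unrelated chatter about lunch plans for the whole team on Friday afternoon near the office",
        "Query failed with timeout on orders table",
        "Retry also failed",
    ]
    for line in build_context(log, "orders table timeout", budget=20):
        print(line)
```

The trade-off in any scheme like this is between recall and budget: truncation is cheap but lossy, so a production system would more likely combine model-written summaries with retrieval from a store.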

The team reports less time spent on routine engineering support tasks and faster resolution cycles for common issues. While exact performance metrics were not disclosed, the team noted a shift in engineering effort away from firefighting and toward platform engineering and system improvement.

About the Author

Leela Kumili
