Vercel has recently introduced Vercel Fluid, an elastic compute model that lets a single worker handle multiple requests, similar to a traditional server, while preserving the elasticity of serverless. By scaling existing functions before creating new instances, Fluid maximizes available compute time and optimizes the compute footprint and resource efficiency of long-running tasks and AI inference.
According to the development team, functions with Fluid compute prioritize existing resources before creating new instances, eliminating hard scaling limits and leveraging warm compute for more efficient scaling. This allows shifting to a many-to-one model that can handle tens of thousands of concurrent invocations on a single function.
Source: Vercel blog
Vercel claims the new model offers several benefits, including cold start prevention, efficient auto-scaling, horizontal and vertical concurrency, and optimized I/O efficiency, all with a pay-as-you-go pricing model. Jones Zachariah Noel N, senior developer advocate at Freshworks and AWS Serverless Hero, asks whether Fluid Compute is the next big thing:
Vercel is bringing the best of the server-based approach for a cost efficiency along with a runtime power, best of a Developer Experience (DevX) and security of Serverless. As Vercel calls it - the power of Servers in the Serverless way, which additionally addresses the Cold Start problem and having a function ready for execution.
By removing the need to spin up a new function instance for each incoming request, in-function concurrency reduces both the amount of idle compute time paid for and the likelihood of hitting a cold start. Tackling idle compute time is especially important when the downstream service is a slow responder, either by nature (LLMs) or because of performance issues.
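To make the I/O-bound case concrete, here is a minimal sketch of such a handler (the route, endpoint URL, and payload shape are assumptions for illustration, not part of Vercel's announcement). Nearly all of the invocation's wall-clock time is spent awaiting the downstream LLM; with in-function concurrency, other requests can share the same instance during that wait instead of each occupying its own idle instance.

```ts
// app/api/summarize/route.ts — illustrative only; the LLM endpoint is hypothetical.
export async function POST(request: Request) {
  const { text } = await request.json();

  // The function spends most of its lifetime here, idle while the LLM responds.
  // With per-request isolation that idle time is billed per invocation; with
  // in-function concurrency the same instance can serve other requests meanwhile.
  const llmResponse = await fetch("https://api.example-llm.com/v1/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: `Summarize: ${text}` }),
  });

  const completion = await llmResponse.json();
  return Response.json({ summary: completion });
}
```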
Claiming to optimize performance and cost, the new option triggers compute only when needed, scales in real time from zero to peak traffic, and uses existing resources before scaling up new ones. Vercel Fluid is designed for tasks such as video streaming and post-response processing, which may have high response times but only low spikes of CPU usage. Malte Ubl, Vercel CTO, explains on Hacker News the main difference between Vercel Fluid and traditional serverless approaches:
The big difference is how the microvm is utilized. Lambda reserves the entire VM to handle a request end to end. Fluid can use a VM for multiple concurrent requests. Since most workloads are often idle waiting for IO, this ends up being much more efficient.
To support the new functionality, Fluid Compute introduces the waitUntil API to handle tasks after the HTTP response is sent.
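A minimal sketch of how this can be used from a route handler follows, using the waitUntil export of the @vercel/functions package; the route and the logOrderAnalytics helper are hypothetical examples, not part of Vercel's API.

```ts
// app/api/checkout/route.ts — illustrative route; only waitUntil comes from Vercel.
import { waitUntil } from "@vercel/functions";

export async function POST(request: Request) {
  const order = await request.json();

  // Schedule post-response work (e.g. analytics) without delaying the response.
  // waitUntil keeps the function alive until the promise settles.
  waitUntil(logOrderAnalytics(order));

  return Response.json({ status: "received" });
}

// Hypothetical slow downstream call that should not block the response.
async function logOrderAnalytics(order: unknown): Promise<void> {
  await fetch("https://analytics.example.com/events", {
    method: "POST",
    body: JSON.stringify(order),
  });
}
```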
Vercel also provides an observability dashboard that includes metrics such as execution time, concurrency levels, cold start occurrences, and overall compute utilization. In "Vercel’s Fluid Compute and what it means for AWS Lambda", Andreas Casen writes:
Whether AWS Lambda has yet to figure out in-function concurrency or already has and simply chose to not "pass the savings on", Fluid Compute provides a competitive edge in terms of cost efficiency, and it’s hard for me to imagine Lambda won’t keep up. The ball is now in AWS’s court, will they respond?
The new option has been discussed in a popular Reddit thread, where a user warns:
As someone that runs a modest SaaS business with the frontend on Vercel, the pricing model changes are just another source of fatigue these days. I understand Vercel iterating their revenue model, but as an end user another thing that is promising "reduce your compute costs" via some vague concept called "fluid compute" is honestly just annoying.
Functions are billed according to GB-hours, determined by the memory allocated to the function and the duration of the execution.
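As a rough illustration of that formula (the figures below are arbitrary examples, not Vercel's published rates or defaults):

```ts
// GB-hours = memory allocated (GB) × execution duration (hours).
const memoryGb = 1.7;            // example memory allocation
const durationSeconds = 30;      // example execution duration
const gbHours = memoryGb * (durationSeconds / 3600);
console.log(gbHours.toFixed(5)); // ≈ 0.01417 GB-hours for this single invocation
```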