Vercel has recently introduced Vercel Fluid, an elastic compute model that lets a single worker handle multiple requests, similar to a traditional server, while preserving the elasticity of serverless. By scaling existing functions before creating new instances, Fluid maximizes available compute time and optimizes the compute footprint and resource efficiency of long-running tasks and AI inference.
According to the development team, functions with Fluid compute prioritize existing resources before creating new instances, eliminating hard scaling limits and leveraging warm compute for more efficient scaling. This allows shifting to a many-to-one model that can handle tens of thousands of concurrent invocations on a single function.
Source: Vercel blog
Vercel claims the new model offers several benefits, including cold start prevention, efficient auto-scaling, horizontal and vertical concurrency, and optimized I/O efficiency, all with a pay-as-you-go pricing model. Jones Zachariah Noel N, senior developer advocate at Freshworks and AWS Serverless Hero, asks whether Fluid Compute is the next big thing:
Vercel is bringing the best of the server-based approach for a cost efficiency along with a runtime power, best of a Developer Experience (DevX) and security of Serverless. As Vercel calls it - the power of Servers in the Serverless way, which additionally addresses the Cold Start problem and having a function ready for execution.
By removing the need to spin up a new function instance for each incoming request, in-function concurrency reduces both the amount of idle compute time paid for and the likelihood of hitting a cold start. Tackling idle compute time is especially important when the downstream service is a slow responder, either by nature (LLMs) or because of performance issues.
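To make the I/O-bound case concrete, here is a minimal sketch of such a handler (the route, endpoint URL, and payload shape are assumptions for illustration, not part of Vercel's announcement). Nearly all of the invocation's wall-clock time is spent awaiting the downstream LLM; with in-function concurrency, other requests can share the same instance during that wait instead of each occupying its own idle instance.

```ts
// app/api/summarize/route.ts — illustrative only; the LLM endpoint is hypothetical.
export async function POST(request: Request) {
  const { text } = await request.json();

  // The function spends most of its lifetime here, idle while the LLM responds.
  // With per-request isolation that idle time is billed per invocation; with
  // in-function concurrency the same instance can serve other requests meanwhile.
  const llmResponse = await fetch("https://api.example-llm.com/v1/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: `Summarize: ${text}` }),
  });

  const completion = await llmResponse.json();
  return Response.json({ summary: completion });
}
```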
Claiming to optimize performance and cost, the new option triggers compute only when needed, scales in real time from zero to peak traffic, and uses existing resources before scaling up new ones. Vercel Fluid is designed for tasks such as video streaming and post-response processing, which may have high response times but only low spikes of CPU usage. Malte Ubl, Vercel CTO, explains on Hacker News the main difference between Vercel Fluid and traditional serverless approaches:
The big difference is how the microvm is utilized. Lambda reserves the entire VM to handle a request end to end. Fluid can use a VM for multiple concurrent requests. Since most workloads are often idle waiting for IO, this ends up being much more efficient.
To support the new functionality, Fluid Compute introduces the waitUntil API to handle tasks after the HTTP response is sent.
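A minimal sketch of how this can be used from a route handler follows, using the waitUntil export of the @vercel/functions package; the route and the logOrderAnalytics helper are hypothetical examples, not part of Vercel's API.

```ts
// app/api/checkout/route.ts — illustrative route; only waitUntil comes from Vercel.
import { waitUntil } from "@vercel/functions";

export async function POST(request: Request) {
  const order = await request.json();

  // Schedule post-response work (e.g. analytics) without delaying the response.
  // waitUntil keeps the function alive until the promise settles.
  waitUntil(logOrderAnalytics(order));

  return Response.json({ status: "received" });
}

// Hypothetical slow downstream call that should not block the response.
async function logOrderAnalytics(order: unknown): Promise<void> {
  await fetch("https://analytics.example.com/events", {
    method: "POST",
    body: JSON.stringify(order),
  });
}
```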
Vercel also provides an observability dashboard that includes metrics such as execution time, concurrency levels, cold start occurrences, and overall compute utilization. In "Vercel’s Fluid Compute and what it means for AWS Lambda", Andreas Casen writes:
Whether AWS Lambda has yet to figure out in-function concurrency or already has and simply chose to not "pass the savings on", Fluid Compute provides a competitive edge in terms of cost efficiency, and it’s hard for me to imagine Lambda won’t keep up. The ball is now in AWS’s court, will they respond?
The new option has been discussed in a popular Reddit thread, where a user warns:
As someone that runs a modest SaaS business with the frontend on Vercel, the pricing model changes are just another source of fatigue these days. I understand Vercel iterating their revenue model, but as an end user another thing that is promising "reduce your compute costs" via some vague concept called "fluid compute" is honestly just annoying.
Functions are billed according to GB-hours, determined by the memory allocated to the function and the duration of the execution.
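As a rough illustration of that formula (the figures below are arbitrary examples, not Vercel's published rates or defaults):

```ts
// GB-hours = memory allocated (GB) × execution duration (hours).
const memoryGb = 1.7;            // example memory allocation
const durationSeconds = 30;      // example execution duration
const gbHours = memoryGb * (durationSeconds / 3600);
console.log(gbHours.toFixed(5)); // ≈ 0.01417 GB-hours for this single invocation
```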