Serverless Challenges in Hybrid Environments

Sam Newman, independent consultant and author of the book "Building Microservices", spoke at the Velocity conference in London about some of the challenges faced when hybrid systems rely on both serverless architectures and traditional infrastructure. In particular, Newman discussed how serverless changes our notion of resiliency and how the two paradigms clash when the system is under high load.

Resiliency in traditional server systems relies on state (for example, a database connection pool that throttles the number of requests hitting the database at any point in time). Stability in this type of system is maintained by controlling incoming load and balancing it across multiple instances. With ephemeral functions (lambdas), however, there is no place to store that controlling state, so the backend databases need to scale at the same pace as the functions auto-scale with load.
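
To make the role of that state concrete, here is a minimal Python sketch of a bounded pool living in a long-running server process. The `BoundedPool` class and its limit are hypothetical names chosen for illustration and are not taken from Newman's talk. The semaphore is the "controlling state": it only works because the process outlives individual requests.

```python
import threading
from contextlib import contextmanager


class BoundedPool:
    """Caps concurrent database access from within one long-lived process."""

    def __init__(self, max_connections: int) -> None:
        # Shared state that survives across requests handled by this process.
        self._slots = threading.BoundedSemaphore(max_connections)

    @contextmanager
    def connection(self):
        # Shed load immediately instead of queueing when every slot is taken.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("pool exhausted: request rejected")
        try:
            yield object()  # stand-in for a real database connection
        finally:
            self._slots.release()
```

Each function instance would start with its own fresh semaphore, so a cap like this cannot limit the total number of connections opened by many concurrent functions.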

Auto-scaling cloud databases such as Amazon's DynamoDB or Google's Bigtable fit well in the serverless paradigm, but Newman pointed out that the majority of systems rely on traditional databases, so simply "bolting on" serverless functions to a legacy system can have drastic consequences. Newman highlighted the fact that even serverless poster child Bustle faced unexpected challenges. Although they explicitly set a hard limit of 1000 lambda connections to any one of their Redis nodes (known to be able to handle 10 times that number of connections), they still saw nodes failing because lambda functions seem to keep the connection alive for up to three minutes after they have been stopped (based on anecdotal evidence). Bustle engineering had to delve into the inner workings of Redis to fix this issue (forcing those zombie connections to time out faster), which highlights the mismatch between how serverless and non-serverless systems handle load and resiliency, Newman argued.
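
A rough back-of-the-envelope estimate shows how lingering connections can overwhelm a node despite such a limit. The sketch below applies Little's law under deliberately pessimistic assumptions that are not from the talk: every invocation opens a fresh connection, functions run at the concurrency limit, and each invocation lasts one second on average. Only the 1000 limit and the three-minute linger come from the account above.

```python
def estimated_open_connections(
    concurrency_limit: int, avg_duration_s: float, linger_s: float
) -> float:
    """Little's law: open connections = arrival rate x time each stays open."""
    # At the concurrency limit, this many invocations start every second.
    invocations_per_second = concurrency_limit / avg_duration_s
    # Each connection stays open for the run itself plus the linger period.
    return invocations_per_second * (avg_duration_s + linger_s)


# Assumed 1 s average duration (hypothetical), 180 s linger as reported.
worst_case = estimated_open_connections(1000, 1.0, 180.0)
no_linger = estimated_open_connections(1000, 1.0, 0.0)
```

Under these assumptions the worst case far exceeds the roughly 10,000 connections a node was known to handle, whereas with no linger the count stays at the configured limit. Real workloads that reuse connections across warm invocations would land somewhere in between.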

Another challenge Newman mentioned concerns circuit breakers, which are typically used in microservices to gracefully handle downstream failure, effectively shedding load and making the overall system more resilient. Circuit breakers rely on maintaining state across multiple requests, for instance to be able to close the circuit (self-heal) once the downstream service has been shown to be stable again, and ephemeral functions have no place to keep that state between invocations.
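
The following minimal Python sketch illustrates the state a circuit breaker has to carry between calls. The `CircuitBreaker` class, its thresholds, and the injectable `clock` are illustrative assumptions rather than any particular library's API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        # State that must survive across requests for the breaker to work.
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("downstream call skipped")
            # Timeout elapsed: half-open, let a single trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            half_open = self._opened_at is not None
            if half_open or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

The failure count and the time the circuit opened live in instance attributes. In a function that is torn down after each invocation, both would be lost, so the breaker could never open or self-heal unless that state were kept somewhere external.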

Newman said service meshes such as Istio or Linkerd might help with some of these issues, acting as persistent stateful proxies that can co-ordinate load between microservice functions.

Finally, from a security point of view, functions run in containers and are thus vulnerable to exploits where one container breaks out into another running on the same host. Such exploits become considerably harder in practice, because the container in which a function runs lives only for a short period of time and cannot be exploited after the function has terminated. Security experts such as Guy Podjarny nevertheless warn that serverless moves security concerns to the application level, and that a chain of function calls can be exploited if it is not secured correctly.

Newman also mentioned the concern many people have around lock-in when selecting a particular cloud vendor's implementation of Function-as-a-Service (FaaS), an issue that was covered in a recent InfoQ eMag. The key to handling this concern is to move the discussion away from lock-in and towards understanding (and accepting) the trade-off between going faster (with less cognitive load) and the cost of migration (which is decreasing as feature sets become similar across different FaaS providers).
