Figma Moves from ECS to Kubernetes to Benefit from the CNCF Ecosystem and Reduce Costs

Sep 02, 2024

Figma migrated its compute platform from AWS ECS to Kubernetes (EKS) in less than 12 months with minimal customer impact. The company adopted Kubernetes to run its containerized workloads primarily to take advantage of the large ecosystem supported by the CNCF. The move was also motivated by cost savings, an improved developer experience, and increased resiliency.

Figma had moved its application services to containers and adopted Elastic Container Service (ECS) as its container orchestration platform by early 2023. ECS allowed the company to roll out containerized workloads quickly, but engineers have since run into its limitations, mainly the lack of support for StatefulSets and Helm charts and the difficulty of running open-source software such as Temporal.

Moreover, the company recognized it was missing out on the wide range of capabilities available for Kubernetes within the CNCF community, including advanced autoscaling with KEDA or Karpenter, service mesh with Istio/Envoy, and numerous other tools and features. The organization also considered the substantial engineering effort required to customize ECS for its needs and the availability of Kubernetes-experienced engineers on the job market.

Kubernetes Migration Timeline (Source: Figma Engineering Blog)

After deciding to switch to Kubernetes (EKS), the team agreed on the scope of the migration, focusing on minimizing the changes required to services in order to avoid delays and risks. Despite the limited scope, the company wanted to include specific improvements: simplified resource definitions to improve the developer experience, and better reliability from splitting the deployment across three Kubernetes clusters to limit the impact of bugs and operator errors.
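To illustrate what a simplified resource definition could look like, the following minimal sketch expands a short service description into a standard Kubernetes `apps/v1` Deployment manifest. The `ServiceSpec` class, its fields, and the `to_deployment` helper are hypothetical names chosen for this example; the article does not describe Figma's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class ServiceSpec:
    """Hypothetical simplified service definition (illustrative fields only)."""
    name: str
    image: str
    replicas: int = 2
    port: int = 8080


def to_deployment(spec: ServiceSpec) -> dict:
    """Expand the short spec into a full apps/v1 Deployment manifest."""
    labels = {"app": spec.name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": spec.name, "labels": labels},
        "spec": {
            "replicas": spec.replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": spec.name,
                            "image": spec.image,
                            "ports": [{"containerPort": spec.port}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    import json

    # Service name and image are placeholders.
    print(json.dumps(to_deployment(ServiceSpec("api", "example/api:1.0")), indent=2))
```

Service owners would only maintain the few fields in the short spec, while the platform team keeps the manifest boilerplate consistent in one place.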

Ian VonSeggern, software engineering manager at Figma, discusses the cost optimization goals of the migration project:

We didn’t want to tackle too much complex cost-efficiency work as part of this migration, with one exception: We decided to support node auto-scaling out of the gate. For our ECS on EC2 services, we simply over-provisioned our services so we had enough machines to surge up during a deploy. Since this was an expensive setup, we decided to add this additional scope to the migration because we were able to save a significant amount of money for relatively low work. We used the open-source CNCF project Karpenter to scale up and down nodes dynamically for us based on demand.
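The cost argument in the quote can be made concrete with a little arithmetic. The sketch below compares a statically over-provisioned fleet, sized for the deploy-time surge, with the steady-state node count that a node autoscaler could shrink to between deploys. All numbers and function names are assumptions made for illustration; the article gives no actual fleet sizes.

```python
import math


def nodes_needed(pods: int, pods_per_node: int) -> int:
    """Smallest number of nodes that can host the given number of pods."""
    return math.ceil(pods / pods_per_node)


def static_fleet(steady_pods: int, surge_fraction: float, pods_per_node: int) -> int:
    """Nodes kept running permanently so a deploy can surge without waiting."""
    peak_pods = steady_pods + math.ceil(steady_pods * surge_fraction)
    return nodes_needed(peak_pods, pods_per_node)


if __name__ == "__main__":
    # Hypothetical workload: 100 pods, 25% surge during deploys, 10 pods per node.
    always_on = static_fleet(100, 0.25, 10)
    steady = nodes_needed(100, 10)
    print(f"over-provisioned: {always_on} nodes, autoscaled steady state: {steady} nodes")
```

Under these assumed numbers, the static fleet pays for the surge capacity around the clock, while dynamic node scaling only needs it for the duration of a deploy.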

To ensure a successful project outcome, Figma created a well-staffed team to drive the migration effort and engaged with the broader organization to get its buy-in. The engineers prepared for the production rollout by load testing the Kubernetes setup to avoid surprises, implementing an incremental switchover mechanism using weighted DNS entries, and deploying services into the staging Kubernetes cluster early in the process to iron out any issues. The compute platform team also worked with service owners to provide a golden path and ensure consistency and ease of maintenance.
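An incremental switchover with weighted DNS entries can be pictured as a weighted random choice between the old and new backends, with the weight of the new one raised step by step. The following is a minimal simulation under that assumption; the backend labels, the `RAMP` schedule, and the function names are hypothetical and not taken from Figma's setup.

```python
import random
from collections import Counter

# Hypothetical ramp: percentage of traffic sent to the new EKS backend per step.
RAMP = [1, 10, 50, 100]


def pick_backend(eks_weight: int, rng: random.Random) -> str:
    """Choose a backend in proportion to its weight, as weighted DNS records would."""
    if not 0 <= eks_weight <= 100:
        raise ValueError("eks_weight must be between 0 and 100")
    return rng.choices(["ecs", "eks"], weights=[100 - eks_weight, eks_weight])[0]


def simulate(eks_weight: int, requests: int = 10_000, seed: int = 0) -> Counter:
    """Count how many simulated requests land on each backend."""
    rng = random.Random(seed)
    return Counter(pick_backend(eks_weight, rng) for _ in range(requests))


if __name__ == "__main__":
    for weight in RAMP:
        counts = simulate(weight)
        print(f"eks weight {weight:3d}% -> ecs={counts['ecs']}, eks={counts['eks']}")
```

Because each step only changes a weight, a problem found at a low percentage can be rolled back by setting the weight back to zero. Real DNS-based shifting is also affected by record TTLs and resolver caching, which this simulation ignores.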

The initial migration took less than 12 months. After migrating core services, the team started looking at follow-up activities such as introducing KEDA-based autoscaling. Additionally, based on user feedback, engineers simplified the developer tooling so it works with the three Kubernetes clusters and the new fine-grained RBAC roles.
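Event-driven autoscaling of the kind KEDA enables typically derives a replica count from an external metric such as queue length. The sketch below shows the general idea with a target number of queued items per replica, clamped to configured bounds. It is a simplified illustration and not KEDA's implementation; the function name and parameters are assumptions.

```python
import math


def desired_replicas(
    queue_length: int,
    target_per_replica: int,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Replicas needed so each one handles at most target_per_replica items."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(queue_length / target_per_replica)
    # Clamp to the configured bounds so the service never scales to extremes.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Hypothetical queue depths for a worker that should handle 10 items each.
    for depth in (0, 95, 1000):
        print(depth, "->", desired_replicas(depth, 10, min_replicas=1, max_replicas=20))
```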

About the Author

Rafal Gancarz

Rafał is an experienced technology leader and expert. He's currently helping Starbucks make its Commerce Platform scalable, resilient and cost-effective. Previously, Rafał has been involved in designing and building large-scale, distributed and cloud-based systems for Cisco, Accenture, Capita, ICE, Callsign and others. His interests span architecture & design, continuous delivery, observability and operability, as well as sociotechnical and organisational aspects of software delivery.
