Cilium is open source software for transparently securing the network connectivity between application services deployed using Linux container management platforms like Docker and Kubernetes. The recently released Cilium 1.0.0-rc4 includes: the Cloud Native Computing Foundation (CNCF)-hosted Envoy configured as the default HTTP/gRPC proxy; the addition of a simple health overview for connectivity and other errors; and an improved scalable kvstore interaction layer.
Microservices applications tend to be highly dynamic, and this presents both a challenge and an opportunity in terms of securing connectivity between microservices. Modern approaches to overcoming this issue have coalesced around the CNCF-hosted Container Network Interface (CNI) and the increasingly popular "service mesh" technologies, such as Istio and Conduit. According to the Cilium documentation, traditional Linux network security approaches (such as iptables) filter on IP address and TCP/UDP ports. However, the highly volatile life cycle of containers and their IP addresses causes these approaches to struggle to scale alongside the application, as the large number of load balancing tables and access control lists must be updated continually.
Cilium attempts to address this scaling issue by utilising a (relatively) new technology called Berkeley Packet Filter (BPF). BPF is a Linux kernel bytecode interpreter that was originally introduced to filter network packets, as seen in tcpdump and socket filters. It has since been extended with additional data structures, such as hash tables and arrays, as well as additional actions to support packet mangling, forwarding, encapsulation, etc. An in-kernel verifier ensures that BPF programs are safe to run, and a JIT compiler converts the bytecode to CPU architecture-specific instructions for native execution efficiency. For readers keen to explore BPF in further detail, performance guru Brendan Gregg has written and talked extensively about "Linux BPF Superpowers".
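The "verify first, then run" model described above can be illustrated with a toy interpreter. This is only a sketch under stated assumptions: the three-instruction set (ldb, jeq, ret) and the names verify, run and TCP_ONLY are hypothetical and are not the real BPF instruction set or kernel API. The verifier here allows only forward jumps that stay inside the program, which is one simple way to guarantee termination.

```python
from typing import List, Tuple

# Toy instruction set (hypothetical, NOT the real BPF ISA):
#   ("ldb", offset)       load the packet byte at `offset` into the accumulator
#   ("jeq", value, skip)  if accumulator == value, skip the next `skip` instructions
#   ("ret", verdict)      stop and return the verdict (1 = accept, 0 = drop)
Instruction = Tuple


def verify(program: List[Instruction]) -> None:
    """Reject programs that could loop forever or run off the end."""
    if not program or program[-1][0] != "ret":
        raise ValueError("program must end with a ret instruction")
    for pc, ins in enumerate(program):
        op = ins[0]
        if op == "ldb":
            if ins[1] < 0:
                raise ValueError(f"instruction {pc}: negative load offset")
        elif op == "jeq":
            target = pc + 1 + ins[2]
            # Forward-only jumps that land inside the program guarantee termination.
            if ins[2] < 0 or target >= len(program):
                raise ValueError(f"instruction {pc}: jump must go forward and stay in range")
        elif op != "ret":
            raise ValueError(f"instruction {pc}: unknown opcode {op!r}")


def run(program: List[Instruction], packet: bytes) -> int:
    """Interpret a verified program against one packet and return its verdict."""
    acc = 0
    pc = 0
    while True:
        ins = program[pc]
        if ins[0] == "ldb":
            if ins[1] >= len(packet):
                return 0  # packet too short for this load: drop it
            acc = packet[ins[1]]
            pc += 1
        elif ins[0] == "jeq":
            pc += 1 + (ins[2] if acc == ins[1] else 0)
        else:  # "ret"
            return ins[1]


# Accept only packets whose IPv4 protocol field (header byte 9) is 6 (TCP).
TCP_ONLY = [
    ("ldb", 9),
    ("jeq", 6, 1),  # protocol == TCP: skip the drop below
    ("ret", 0),     # drop
    ("ret", 1),     # accept
]
verify(TCP_ONLY)

# Minimal 20-byte IPv4 headers with only the protocol byte varied.
tcp_header = bytes([0x45, 0, 0, 20, 0, 0, 0, 0, 64, 6]) + bytes(10)
udp_header = bytes([0x45, 0, 0, 20, 0, 0, 0, 0, 64, 17]) + bytes(10)
```

In this sketch, run(TCP_ONLY, tcp_header) accepts the packet and run(TCP_ONLY, udp_header) drops it, while a program containing a backward jump is rejected by verify before it is ever executed.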
A deployment of Cilium consists of the following components running on each Linux container node in the container cluster:
- Cilium Agent (Daemon): Userspace daemon written in Golang that interacts with the container runtime and orchestration systems such as Kubernetes via plugins to set up networking and security for containers running on the local server. Provides an API for configuring network security policies, extracting network visibility data, etc.
- Cilium CLI Client: Simple CLI client for communicating with the local Cilium Agent, for example, to configure network security or visibility policies.
- Linux Kernel BPF: A datapath component that utilizes the BPF functionality that is an integrated capability of the Linux kernel to accept compiled bytecode that is run at various hook / trace points within the kernel. Cilium compiles BPF programs and has the kernel run them at key points in the network stack to have visibility and control over all network traffic in / out of all containers.
- Container Platform Network Plugin: Each container platform (e.g., Docker, Kubernetes) has its own plugin model for how external networking platforms integrate. In the case of Docker, each Linux node runs a process (cilium-docker) that handles each Docker libnetwork call and passes data / requests on to the main Cilium Agent.
Cilium also utilises a set of userspace proxies -- one of them being Envoy -- to provide application protocol level filtering while an in-kernel version of this is being built.
Cilium provides the ability to secure modern application protocols such as REST/HTTP, gRPC and Kafka. Traditional firewalls typically operate at Layer 3 and 4, and a protocol running on a particular port is either completely trusted or blocked entirely. Cilium provides the ability to filter on individual application protocol requests such as:
- Allow all HTTP requests with method GET and path /public/.*. Deny all other requests.
- Allow service1 to produce on Kafka topic topic1 and service2 to consume on topic1. Reject all other Kafka messages.
- Require the HTTP header X-Token: [0-9]+ to be present in all REST calls.
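The first and third rules above can be sketched as a small request matcher. This is a minimal illustration, not Cilium's policy format or API: the HttpRule class, the allowed function and the rules list are hypothetical names, and the sketch assumes default-deny semantics where a request passes only if at least one rule matches in full.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HttpRule:
    """One allow rule: method, path regex, and header-name -> value-regex requirements."""
    method: str
    path_regex: str
    required_headers: Dict[str, str] = field(default_factory=dict)

    def matches(self, method: str, path: str, headers: Dict[str, str]) -> bool:
        if method != self.method:
            return False
        if re.fullmatch(self.path_regex, path) is None:
            return False
        # HTTP header names are case-insensitive, so compare them lowercased.
        lowered = {name.lower(): value for name, value in headers.items()}
        for name, value_regex in self.required_headers.items():
            value = lowered.get(name.lower())
            if value is None or re.fullmatch(value_regex, value) is None:
                return False
        return True


def allowed(rules: List[HttpRule], method: str, path: str, headers: Dict[str, str]) -> bool:
    """Default deny: a request is allowed only if some rule matches it."""
    return any(rule.matches(method, path, headers) for rule in rules)


# Allow GET on /public/.* when a numeric X-Token header is present; deny everything else.
rules = [HttpRule("GET", r"/public/.*", {"X-Token": r"[0-9]+"})]
```

With these rules, a GET for /public/index.html carrying X-Token: 12345 is allowed, whereas a POST to the same path, a GET for /private/data, or a GET without a numeric token is denied.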
The Layer 7 Policy section of the Cilium documentation contains the latest list of supported protocols and examples of how to use them. As BPF runs inside the Linux kernel, Cilium security policies can be applied and updated (in theory) without any changes to the application code or container configuration.
Cilium provides a comprehensive approach to implementing network security, and the core concepts are built around the assignment of a security identity to groups of application containers which share identical security policies. The identity is then associated with all network packets emitted by the application containers, allowing the receiving node to validate the identity. Security identity management is performed using a key-value store.
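The identity model can be sketched as follows. This is an illustration under assumptions rather than Cilium's implementation: an in-memory dict stands in for the key-value store, identities are derived from a container's label set, and the names IdentityAllocator, Packet and accept are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple


class IdentityAllocator:
    """Maps each distinct label set to one numeric identity (dict stands in for the kvstore)."""

    def __init__(self) -> None:
        self._kvstore: Dict[FrozenSet[str], int] = {}
        self._next_id = 1

    def identity_for(self, labels: Set[str]) -> int:
        key = frozenset(labels)
        if key not in self._kvstore:
            self._kvstore[key] = self._next_id
            self._next_id += 1
        return self._kvstore[key]


@dataclass(frozen=True)
class Packet:
    source_identity: int  # identity attached by the sending node
    payload: bytes


def accept(packet: Packet, dest_identity: int, allowed_pairs: Set[Tuple[int, int]]) -> bool:
    """Receiving-node check: policy is keyed on identities, not on IP addresses."""
    return (packet.source_identity, dest_identity) in allowed_pairs


allocator = IdentityAllocator()
frontend_a = allocator.identity_for({"app=frontend", "env=prod"})
frontend_b = allocator.identity_for({"env=prod", "app=frontend"})  # second replica, same labels
backend = allocator.identity_for({"app=backend", "env=prod"})

# Policy: frontend may talk to backend; nothing else is allowed.
policy = {(frontend_a, backend)}
```

Because both frontend replicas share one identity, the single policy entry covers them however many replicas exist or whichever IP addresses they are assigned.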
Cilium also provides many other networking features, including:
- A simple flat Layer 3 network, with the ability to span multiple clusters, that connects all application containers. This means that each host can allocate IPs without any coordination between hosts.
- Distributed load balancing for traffic between application containers and to external services. The load balancing is implemented via BPF using efficient hash tables, allowing for "almost unlimited scale".
- Extensive observability, such as event monitoring with metadata, policy decision tracing, and metrics export via Prometheus.
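The appeal of hash-table-based load balancing is that finding a service is a single keyed lookup, whatever the number of services, rather than a walk through a sequential rule list. The sketch below illustrates the idea only: the table layout, the CRC32 flow hash and the select_backend function are assumptions for illustration, not Cilium's BPF map format or backend selection algorithm.

```python
import zlib
from typing import Dict, List, Optional, Tuple

Service = Tuple[str, int]  # (virtual IP, port)
Backend = Tuple[str, int]  # (container IP, port)

# Service table keyed by (virtual IP, port); the lookup cost does not grow with table size.
services: Dict[Service, List[Backend]] = {
    ("10.0.0.1", 80): [("10.1.0.5", 8080), ("10.1.0.6", 8080), ("10.1.0.7", 8080)],
    ("10.0.0.2", 443): [("10.1.1.9", 8443)],
}


def select_backend(
    table: Dict[Service, List[Backend]],
    src_ip: str,
    src_port: int,
    dst_ip: str,
    dst_port: int,
) -> Optional[Backend]:
    """Pick a backend for a flow, or return None if the destination is not a known service."""
    backends = table.get((dst_ip, dst_port))
    if not backends:
        return None
    # Hash the flow tuple so every packet of one connection maps to the same backend.
    flow = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return backends[zlib.crc32(flow) % len(backends)]


choice = select_backend(services, "10.1.2.3", 51000, "10.0.0.1", 80)
```

Calling select_backend again with the same flow tuple returns the same backend, and a destination that is not in the table yields None so the packet can be handled as ordinary traffic.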
At the recent KubeCon NA conference there was much discussion on "service meshes" (particularly around the Istio control plane), and accordingly the Cilium community has written a guide to how its technology -- and the promise of BPF in general -- can complement Istio. Istio itself uses Envoy to implement its data plane, and the proxy is run in a sidecar configuration inside the application pod. Cilium runs Envoy outside of the application pod and configures separate listeners for individual pods. The Cilium community has suggested that:
There is no right or wrong in this model, both have advantages and disadvantages on a variety of aspects including operational complexity, security, resource accounting, [and] total footprint. Cilium will likely allow running either model in the future.
Additional information on Cilium can be found at the project's website, and questions can be asked via the Cilium Slack.