Kubernetes IPv4/IPv6 Dual Stack Q&A with Khaled (Kal) Henidak of Microsoft & Tim Hockin of Google

IPv4/IPv6 addressing, simply referred to as dual stack, which has been available in earlier Kubernetes releases, went GA earlier this month with the release of Kubernetes 1.23.

This work, as outlined in the Kubernetes Enhancement Proposal (KEP-563), has been years in the making, touching almost every Kubernetes component.

InfoQ caught up with Khaled (Kal) Henidak of Microsoft and Tim Hockin of Google, who designed and implemented dual stack in concert with the SIG-NETWORK community. They talk about the travails of dual stack: the motivation, the technical details, the long road and the roadmap ahead, how it affects the different Kubernetes components, and the ripple effect it has on cloud Kubernetes providers.

InfoQ: What does IPv6 and dual stack support for Kubernetes really mean?

Khaled (Kal) Henidak: In order to answer that we will have to go a little back in time. From its inception, Kubernetes introduced the concept of assigning a unique network address (IP) to each and every workload instance (Pod) running on it. This concept was not new at the time; it was standard operating practice in virtual machine environments. However, it was novel in environments such as Kubernetes. It unlocked use cases such as IP-level addressable workloads, discovery via the Service API, elaborate ingress control, network security policies, and so forth.

While the above was happening, the industry was progressing fast with IoT, edge compute, and 5G (which only runs on IPv6). This rapid progress further depleted the few remaining IPv4 addresses. Organizations that were looking into IPv6 addresses to avoid these problems, or simply wanted IPv6 for its performance, were in a challenging spot. They were unable to connect their IPv6 clusters and workloads to existing systems, which almost certainly run only on IPv4. The only solution available then was some sort of address translation, which comes with its own bag of performance penalties and operational complexities.

Kubernetes running on IPv4/IPv6 dual stack networking allows these workloads to access both IPv4 and IPv6 endpoints natively without additional complexities or performance penalties. Cluster operators can also choose to expose external endpoints using one or both of the address families in any order that fits their requirements. Kubernetes does not make any strong assumption about the network it runs on. For example, users who are running on a small IPv4 address space can make a choice to enable dual stack on a subset of their cluster nodes and have the rest running on IPv6, which traditionally has a larger available address space.

I foresee a future where the majority of workloads move to IPv6-only networks and clusters run on IPv6-only networks. But until then, dual stack will keep the two worlds, old and new, connected.

Tim Hockin: Simple answer: Kubernetes can now use IPv4 and IPv6 at the same time. Pods and Services can choose which they prefer.

More complete answer: We extended a bunch of different APIs (Pods, Services, Nodes) to be aware of multiple IP addresses rather than just one. We updated a bunch of components (apiserver, controller-manager, kube-proxy, kubelet) to use those extended APIs so that users can configure their clusters to provide both an IPv4 and an IPv6 address to Pods and Services. We tried REALLY hard to make it as transparent and low-risk as possible. If you don’t need dual-stack, you should not be impacted by it, but it’s easy to turn on if you do need it. If you have existing Pods and Services, we will not automatically convert them for you, but you can easily convert them yourself. If you have controllers which only understand single-stack, they will continue to work as before.

Now that we can support dual-stack, some things can get easier. For example, IPv4 is limited to 32 bits (4 billion IPs). The “private” IPv4 address ranges only allow for a few million IPs, and the usual CIDR based routing means that space needs to be allocated in large blocks. IPv6 has 128 bits (trillions of trillions of IPs), which makes it less perilous to pre-allocate large blocks and will hopefully make it easier to start using Kubernetes in “flat” IP modes.
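As a back-of-the-envelope comparison (a standalone Go sketch, not tied to any Kubernetes API), the scale difference looks like this:

```go
package main

import (
	"fmt"
	"math/big"
)

func main() {
	// Total address space: 2^32 for IPv4 vs 2^128 for IPv6.
	ipv4 := new(big.Int).Lsh(big.NewInt(1), 32)
	ipv6 := new(big.Int).Lsh(big.NewInt(1), 128)
	fmt.Println("IPv4 addresses:", ipv4) // 4294967296 (~4.3 billion)
	fmt.Println("IPv6 addresses:", ipv6) // ~3.4e38

	// A single IPv6 /64 (a common per-subnet allocation) already holds
	// 2^64 addresses, dwarfing the entire private 10.0.0.0/8 range.
	subnet64 := new(big.Int).Lsh(big.NewInt(1), 64)
	private10 := new(big.Int).Lsh(big.NewInt(1), 24) // 10.0.0.0/8 -> 2^24 IPs
	fmt.Println("one IPv6 /64:", subnet64)
	fmt.Println("all of 10.0.0.0/8:", private10) // 16777216
}
```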

InfoQ: Can you provide some technical details of the implementation that non-networking geeks can grok?

Henidak: The story starts with Pods. A Pod in Kubernetes is represented by an API object. That object has a status field which, among other things, reflects the network address (IP) assigned to it. That IP is reported by kubelet, which reads it via the Container Runtime Interface (CRI) implementation on the node. CRI itself sets up the networking by calling the Container Network Interface (CNI), a separate component responsible for setting up networking, including dual stack, for a Pod. We have extended this API field to accept dual-stack addresses: one IP address, or two IP addresses from different IP families, in any order. This enables these various components to say “hey, this Pod is dual stack”.
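For illustration, here is a minimal Go sketch using the k8s.io/api/core/v1 types (the addresses are made up): the plural status.podIPs field carries up to one address per IP family, while the legacy singular status.podIP is kept for older clients.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// podAddresses returns all IPs reported for a Pod. On a dual-stack node the
// plural status.podIPs field carries up to one address per IP family.
func podAddresses(pod *corev1.Pod) []string {
	ips := make([]string, 0, len(pod.Status.PodIPs))
	for _, p := range pod.Status.PodIPs {
		ips = append(ips, p.IP)
	}
	return ips
}

func main() {
	// A hand-built Pod status standing in for what kubelet/CRI would report.
	pod := &corev1.Pod{
		Status: corev1.PodStatus{
			PodIP: "10.244.1.7", // singular field, kept for compatibility
			PodIPs: []corev1.PodIP{
				{IP: "10.244.1.7"},
				{IP: "fd00:10:244:1::7"},
			},
		},
	}
	fmt.Println(podAddresses(pod)) // [10.244.1.7 fd00:10:244:1::7]
}
```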

Then, the Service API was extended in the same fashion to allow the users to express the dual stack-ness of a Service object. They can select any single IP or IP family, dual stack IP or families in any order, or just say “make it dual stack if available”. The api-server reserves or allocates IPs to Service objects based on the user needs and in the order the user expects.
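A rough sketch of what that looks like against the core/v1 types (the Service name, selector, and port are made up): ipFamilyPolicy expresses the single/dual stack request, and ipFamilies expresses the families and their order.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// "Make it dual stack if the cluster supports it"; on a single-stack
	// cluster the Service quietly falls back to one family.
	policy := corev1.IPFamilyPolicyPreferDualStack

	svc := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "demo"}, // hypothetical name
		Spec: corev1.ServiceSpec{
			Selector:       map[string]string{"app": "demo"},
			IPFamilyPolicy: &policy,
			// Order matters: the first family is the primary one and
			// determines the family of spec.clusterIP.
			IPFamilies: []corev1.IPFamily{corev1.IPv4Protocol, corev1.IPv6Protocol},
			Ports:      []corev1.ServicePort{{Port: 80}},
		},
	}
	fmt.Printf("%s requests families %v\n", svc.Name, svc.Spec.IPFamilies)
}
```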

As part of the next chapter, we looked at the controllers responsible for finding endpoints (Pods) that match a Service's specification, namely the Endpoints controller and the then newly introduced, higher-performance EndpointSlice controller. These components were taught to understand the IP family specification on the Service and accordingly find the Pod addresses of the corresponding families.
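To illustrate with the discovery/v1 types (slice names and addresses are made up): each EndpointSlice carries a single addressType, so a dual-stack Service ends up with at least one slice per IP family.

```go
package main

import (
	"fmt"

	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Each EndpointSlice is single-family; a dual-stack Service gets
	// separate slices for IPv4 and IPv6, each listing the matching
	// Pod addresses.
	slices := []discoveryv1.EndpointSlice{
		{
			ObjectMeta:  metav1.ObjectMeta{Name: "demo-ipv4"}, // hypothetical
			AddressType: discoveryv1.AddressTypeIPv4,
			Endpoints: []discoveryv1.Endpoint{
				{Addresses: []string{"10.244.1.7"}},
			},
		},
		{
			ObjectMeta:  metav1.ObjectMeta{Name: "demo-ipv6"}, // hypothetical
			AddressType: discoveryv1.AddressTypeIPv6,
			Endpoints: []discoveryv1.Endpoint{
				{Addresses: []string{"fd00:10:244:1::7"}},
			},
		},
	}
	for _, s := range slices {
		fmt.Println(s.Name, s.AddressType, s.Endpoints[0].Addresses)
	}
}
```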

Finally, we look at kube-proxy, which runs on every node and is responsible for translating Service IPs to load-balanced Pod IPs. It had to be converted to dual stack as well. For example, if the user has configured kube-proxy to use iptables on Linux machines, then kube-proxy will concurrently program iptables for IPv4 and ip6tables for IPv6 to ensure that the rules match the specification set by the user.
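This is not kube-proxy's actual code, but a minimal sketch of the dispatch-by-family idea: partition a Service's cluster IPs by IP family and hand each set to the dataplane that speaks that family.

```go
package main

import (
	"fmt"
	"net"
)

// splitByFamily partitions a Service's cluster IPs into IPv4 and IPv6 sets,
// roughly what a dual-stack proxy must do before programming per-family
// dataplanes (iptables for IPv4, ip6tables for IPv6, and so on).
func splitByFamily(clusterIPs []string) (v4, v6 []string) {
	for _, s := range clusterIPs {
		ip := net.ParseIP(s)
		if ip == nil {
			continue // ignore malformed input in this sketch
		}
		if ip.To4() != nil {
			v4 = append(v4, s)
		} else {
			v6 = append(v6, s)
		}
	}
	return v4, v6
}

func main() {
	// Hypothetical cluster IPs of a dual-stack Service.
	v4, v6 := splitByFamily([]string{"10.96.12.34", "fd00:10:96::1234"})
	fmt.Println("program iptables rules for:", v4)
	fmt.Println("program ip6tables rules for:", v6)
}
```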

Hockin: IPv6, while not exactly “new”, is still not the global default for networking. As the world inches closer to IPv6 being truly ubiquitous, the notion of “dual-stack” networking serves as a bridge. This model gives every workload access to both an older IPv4 address and a newer IPv6 address. Sometimes it will use IPv6 and sometimes IPv4 - for example, many websites still do not support IPv6.

Kubernetes was built with an unfortunately pervasive idea that things like Pods and Services could have a single IP address. Part of the hard work in this effort was finding all those places and updating them to support multiple addresses in a safe and compatible way. Kubernetes being a network-heavy system, there were a LOT of places and tons of subtle edge-cases to handle.

As an example: Kubernetes Services historically have a single IP address. To add dual-stack we had to consider whether it was better to extend that to two addresses or to tell users to create two different Services (one for each flavor of IP). There were pros and cons to each approach.

InfoQ: The Kubernetes Enhancement Proposal - KEP-563 seems to touch all Kubernetes components. Can you talk about the proposal and the challenges associated with such a massive change?

Henidak: It is tempting to answer this from the perspective of pure build, test, and release scenarios. But I want to shed light on a couple of slightly different things. A subtle challenge appears when you realize that such a change does not happen in a vacuum. As we were progressing with applying the change, other parts of the system were progressing and changing as well. For example, while we were working on changing the Service API, the EndpointSlice controller was moving along, which meant we had to align and modify in-flight code for other parts of Kubernetes. That requires coordination across multiple in-flight PRs and maintainers. Thankfully the maintainers were working closely together to ensure coordination and correctness of various features.

The other challenge that I think I will remember for a long time to come is the idea of a “net new APIs approach”. Kubernetes maintains strong rigor around APIs and API guidelines. These are what ensure consistency and forward/backward compatibility across releases, and ultimately provide an improved and consistent overall UX for our users, whether they use the CLI or program against the APIs. As we approached the API change, we needed to plural-ize a field. For example, Pod IP was a single-value field and we needed it to become a multi-value field (to express that a Pod can now have both IPv4 and IPv6 IPs) while maintaining backward compatibility with older clusters and, more importantly, with clusters that are skewed (i.e., some parts are dual stack and other parts are not). That was uncharted territory in Kubernetes API design. We had to test various approaches and update the guidelines accordingly. This challenge becomes more interesting when you consider that the Service API did not just introduce field plural-ization but also interdependent fields, where any field can drive the values of the rest of the dependent fields. This is another uncharted area we had to trailblaze.
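A sketch of the compatibility rule that plural-ization implies for the Pod IP field: the legacy singular status.podIP stays equal to the first entry of the plural status.podIPs, so single-stack clients keep working. The helper function here is illustrative, not Kubernetes code.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// syncPodIP is an illustrative helper: keep the legacy singular status.podIP
// equal to the first entry of the plural status.podIPs, so older clients that
// only know about the singular field keep working unchanged.
func syncPodIP(status *corev1.PodStatus) {
	if len(status.PodIPs) > 0 {
		status.PodIP = status.PodIPs[0].IP
	} else {
		status.PodIP = ""
	}
}

func main() {
	status := corev1.PodStatus{
		PodIPs: []corev1.PodIP{
			{IP: "fd00:10:244:1::7"}, // primary family listed first (IPv6 here)
			{IP: "10.244.1.7"},
		},
	}
	syncPodIP(&status)
	// Old single-stack consumers still see exactly one IP, the primary one.
	fmt.Println(status.PodIP)
}
```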

Hockin: This KEP was one of the largest and most widely impactful in the project’s history. We were reluctant to just jump in and start hacking, so we knew we had to do as much up-front as we could. It started relatively simple, but as we discussed all the places that deal with addresses, it got more and more complicated. It took a long time and a lot of people to build up the list of things that needed updating.

It was so big that we had to break it up into phases or we would never be able to digest it. Once we felt like we understood it as well as we could, the development work proved that we had still missed a lot. We iterated on the details until we felt we had something we could call “alpha”.

Once we put that in people’s hands, we realized we had still missed the mark on a lot of the details and iterated some more. I am glad we were willing to do that and take the extra time - the result is much better, in my opinion.

InfoQ: Does dual stack create any likely unforeseen backward-compatibility issues with traditional IPv4 apps that developers and architects should be aware of? What would be the likely impediments to adoption?

Henidak: Compatibility is tough, especially when you have practically endless variants of Kubernetes configurations, which makes it impossible to give a blanket statement about expected compatibility issues.

I like to think about that from both a proactive and a reactive standpoint, with the responsibility split between the users and the maintainer community. Proactively, I think we have done a good job designing, implementing and testing. Especially testing. I remember the words “test until fear transforms into boredom” being said frequently all throughout the journey. Prior to Stable graduation we were closely watching dual stack find its way through the ecosystem via clouds, Kubernetes Kind, kubeadm and others, to see if there were any problems there. We also deliberately remained in Beta for a whole release cycle, even when dual stack was ready for Stable graduation, to give everyone more time to test and verify. Reactively, we are closely watching reported issues and problems, and engaging frequently with those deploying dual stack. All this is done with the purpose of easing the change and dealing with any possible problems, not just compatibility problems.

I have a high degree of confidence in our API compatibility work. Upgrading existing clusters to dual stack should be a seamless process. What worries me is “going back”. When you upgrade a cluster to dual stack you are effectively adding more address space for the Service API to allocate from (i.e., adding an IPv6 CIDR to the existing Service IPv4 CIDR). This can also optionally include the Pod CIDR, depending on how the cluster networking was configured. If users decide to go back to single stack, they must deal with it as a “Pod and Service CIDR change”, which is already a complex change to apply on a cluster and typically requires manual intervention from cluster users.

Hockin: If we did our jobs right, it should not cause any backwards-compat problems. Everything that used to work should continue to work. Even if pods are assigned dual-stack addresses, the Services API defaults to single-stack unless you specifically ask for dual-stack.

If there are problems, of course, we want to hear about them.

InfoQ: Can you talk about the dual stack implementation’s cascading effect on the different public cloud Kubernetes offerings and the Container Network Interface (CNI)?

Henidak: Public clouds were already in the process of dual stack enablement for traditional IaaS workloads as we were working to enable dual stack on Kubernetes. I believe we should expect to see dual stack clusters show up first as DIY clusters and then in managed cluster offerings, especially now that Kubernetes v1.23 has been released. The other front I am closely watching is dual stack usage in non-traditional form-factor clusters, for example where some or most of the cluster nodes are deployed at edge locations and the control plane is cloud hosted. These use cases have been among the most demanding of dual stack features due to the limited public IPv4 address space.

Hockin: Just because Kubernetes supports it doesn’t mean that all providers and environments support it. If you are using a cloud environment and you want dual-stack, you’ll need to be sure that your cloud actually supports it at the network level. If you are using a hosted Kubernetes product, you’ll need to make sure your provider supports dual-stack.

This being the first non-beta release, I expect it will take some time for most providers to qualify dual-stack support before they can offer it as a supported product.

InfoQ: Will you be able to intermix IPv4 and IPv6 addresses going forward? In general, what’s the roadmap for dual stack and the associated ecosystem?

Henidak: Yes. One of the things we have been keenly adamant about during the design, implementation and testing of the Kubernetes implementation of dual stack is exactly that. Not just the mixing of addresses, but also ensuring Kubernetes makes no assumptions about IP family ordering. For example, some users will prefer a default IPv6 cluster but with the ability to support dual stack where needed. Others may have other preferences. Even the Service API supports a “use dual stack if available” construct. This enables developers who are publishing Kubernetes operators, charts and other types of specifications to express their application networking requirements any way they see fit.
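For example, a Service that insists on dual stack with IPv6 as the primary family can be expressed as below (a sketch with made-up names, again using the core/v1 types); the order of ipFamilies drives the order of the allocated cluster IPs.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// RequireDualStack fails on clusters that cannot allocate both families;
	// PreferDualStack would quietly fall back to single stack instead.
	policy := corev1.IPFamilyPolicyRequireDualStack

	svc := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "v6-primary"}, // hypothetical name
		Spec: corev1.ServiceSpec{
			IPFamilyPolicy: &policy,
			// IPv6 listed first: it becomes the primary family, and the
			// api-server allocates clusterIPs in this order.
			IPFamilies: []corev1.IPFamily{corev1.IPv6Protocol, corev1.IPv4Protocol},
			Ports:      []corev1.ServicePort{{Port: 443}},
		},
	}
	fmt.Println(svc.Name, svc.Spec.IPFamilies)
}
```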

One of the things I really admire within our community is how we refer to things as “a roadmap”. It comes from our common understanding that we are on a never-ending journey. There is always more to do. We typically have an idea about mid- and long-term goals, but we are also flexible in our planning and execution to respond to ever-changing requirements or priorities. While dual stack enablement is a major milestone, it is just one milestone in our never-ending journey. In the near term I expect a lot more CNI providers and ingress controllers to support dual stack. I also expect that support to be extended to clouds, load balancers and bare-metal deployments.

In the long term I see us focusing more and more on multi-networking and multi-homed Kubernetes Pods, where a single Pod may have multiple NICs attached to it from multiple networks and may report multiple addresses from one or more IP families. This unlocks a major use case for the likes of Network Virtual Appliance (NVA) providers, and also works well in multi-cluster or multi-network environments.

Hockin: To be clear, this work isn’t really about making it possible to use an IPv4 client with an IPv6 server (or vice-versa). What this allows you to express is that a given workload can use either IP family to access servers, depending on what is needed. It also lets Kubernetes users express their needs more completely. For example, if you have a dual-stack cluster but want your Service to offer just one IP family, you can express that and you can choose exactly which IP family to use. The rest of the system will respect that choice.

This opens the door to some classes of apps, specifically network-oriented things, that were hard to handle before, but ultimately this is pretty deep infrastructure. Hopefully most users won’t notice it or care about it, but those who really need it will find it available and ready to go.

The SIG-NETWORK community has more detailed information on networking for Kubernetes in general, including the history of the dual-stack implementation.
