Microsoft Expands Azure Kubernetes Service with Bare Metal, Fleet Management and AI Infrastructure

At Microsoft Build 2026, Microsoft unveiled a broad set of enhancements to Azure Kubernetes Service (AKS) aimed at making Kubernetes a first-class platform for AI training, inference, and large-scale cloud-native applications. The announcements span infrastructure, multi-cluster management, AI orchestration, and model serving, underscoring Microsoft's view that the future of AI will increasingly run on Kubernetes rather than bespoke AI infrastructure stacks.

Among the most notable updates are AKS on Bare Metal, which gives workloads direct access to hardware without a hypervisor; Azure Kubernetes Fleet Manager for Arc-enabled clusters, extending centralized management across cloud and on-premises environments; Anyscale on Azure, a managed Ray service for distributed AI workloads; and improvements to AI model deployment through AI Runway and the Kubernetes AI Toolchain Operator (KAITO). Together, the announcements signal Microsoft's ambition to make Kubernetes the operational backbone for enterprise AI at scale.

Microsoft's first focus area is simplifying cluster operations. Two features announced as generally available are Managed System Node Pools in AKS Automatic and Azure Container Linux, a lightweight operating system optimized for containers.

Managed System Node Pools separate core Kubernetes components from application workloads, allowing Azure to handle capacity management, patching, and scaling automatically. This is particularly valuable for GPU-heavy AI workloads, where system services competing for resources can affect performance and predictability. Meanwhile, Azure Container Linux offers a minimal, Microsoft-maintained operating system designed to reduce configuration drift and simplify maintenance across large Kubernetes fleets.

The approach reflects a broader trend among cloud providers to abstract away the operational complexity of Kubernetes itself, allowing teams to focus on applications and AI models rather than on cluster administration.

Perhaps the most technically significant announcement is AKS on Bare Metal, currently in public preview. By removing the virtualization layer, AKS can now provide direct access to technologies such as NVLink, RDMA, and high-performance networking, capabilities that are increasingly important for large language model training and latency-sensitive inference workloads.

Microsoft argues that while virtualization offers flexibility, some AI workloads incur measurable performance penalties from additional abstraction layers. Bare-metal AKS aims to provide the best of both worlds: the operational consistency of Kubernetes and the raw performance of dedicated hardware. This is particularly relevant as enterprises train larger AI models and deploy increasingly demanding inference workloads where even small efficiency gains can translate into significant cost savings.

The company also announced the general availability of Azure Kubernetes Fleet Manager for Arc-enabled clusters, extending fleet-wide management beyond Azure to include hybrid and multi-cloud environments.

Rather than treating Kubernetes clusters as isolated systems, Fleet Manager enables centralized policy enforcement, workload placement, staged rollouts, and RBAC governance across entire fleets of clusters. This capability becomes increasingly important as enterprises deploy AI applications across multiple regions, cloud providers, and on-premises environments while seeking consistent operational practices and governance controls.
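The staged-rollout idea can be illustrated with a small, self-contained Python sketch. It is not Fleet Manager's API or implementation: the `Cluster` type, the stage labels, and the `apply` and `healthy` callbacks are all hypothetical names introduced here only to show the general pattern of updating one group of clusters at a time and halting when a group fails its health check.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Cluster:
    name: str
    stage: str  # hypothetical label such as "canary", "staging", "production"


def staged_rollout(
    clusters: Sequence[Cluster],
    stage_order: Sequence[str],
    apply: Callable[[Cluster], None],
    healthy: Callable[[Cluster], bool],
) -> Tuple[List[str], Optional[str]]:
    """Apply a change stage by stage.

    Returns the names of the clusters that were updated and the stage at
    which the rollout halted (None if every stage completed).
    """
    updated: List[str] = []
    for stage in stage_order:
        members = [c for c in clusters if c.stage == stage]
        for cluster in members:
            apply(cluster)
            updated.append(cluster.name)
        # Gate: do not move to the next stage unless every member is healthy.
        if not all(healthy(c) for c in members):
            return updated, stage
    return updated, None


# Illustrative fleet mixing cloud and on-premises clusters (made-up names).
fleet = [
    Cluster("edge-1", "canary"),
    Cluster("onprem-1", "staging"),
    Cluster("aks-1", "production"),
]
applied: List[str] = []
result = staged_rollout(
    fleet,
    ["canary", "staging", "production"],
    apply=lambda c: applied.append(c.name),
    healthy=lambda c: c.name != "onprem-1",  # simulate a failed health check
)
```

In this example the rollout stops after the staging group, so the production cluster is never touched. A real fleet manager would add timeouts, approvals, and rollback, which this sketch deliberately omits.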

This emphasis on fleet management mirrors a growing realization within the industry that Kubernetes maturity is less about operating individual clusters and more about managing entire estates as unified platforms. Microsoft has increasingly positioned AKS around this philosophy in its broader open-source and Kubernetes strategy.

Beyond the Kubernetes infrastructure itself, Microsoft announced several AI-focused capabilities intended to simplify model training and inference.

Anyscale on Azure, now in public preview, brings managed Ray to AKS, allowing organizations to orchestrate distributed AI workloads using CPUs and GPUs across dynamically scaling clusters. The service integrates directly into Azure subscriptions and governance models, enabling enterprises to train and deploy large AI models without managing the complexity of Ray clusters independently.

Microsoft also highlighted AI Runway, a Kubernetes-native model deployment framework first introduced earlier in 2026. AI Runway enables users to select models, validate GPU requirements, estimate deployment costs, and launch production endpoints through Kubernetes-native abstractions. Under the hood, KAITO provisions resources, launches optimized runtimes such as vLLM, and integrates with Kubernetes autoscaling and networking technologies like KEDA and Gateway API.
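To make the "validate GPU requirements" and "estimate deployment costs" steps concrete, here is a back-of-the-envelope Python sketch. It does not reflect how AI Runway or KAITO actually perform these checks; the function names, the 2-bytes-per-parameter default (16-bit weights), the 1.2x overhead factor, the 730-hour month, and the hourly rate are all assumptions chosen for illustration.

```python
GIB = 2**30


def weights_gib(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights (assumes 16-bit weights by default)."""
    return params_billion * 1e9 * bytes_per_param / GIB


def fits(
    params_billion: float,
    gpu_mem_gib: float,
    gpu_count: int,
    overhead: float = 1.2,  # assumed headroom for KV cache and activations
) -> bool:
    """Rough check: do the weights plus headroom fit in total GPU memory?"""
    return weights_gib(params_billion) * overhead <= gpu_mem_gib * gpu_count


def monthly_cost_usd(gpu_count: int, hourly_rate_usd: float, hours: int = 730) -> float:
    """Cost of keeping the GPUs running for one month at a flat hourly rate."""
    return gpu_count * hourly_rate_usd * hours


# A 70B-parameter model on 80 GiB GPUs, at a made-up rate of $3.00 per GPU-hour.
single_gpu_ok = fits(70, gpu_mem_gib=80, gpu_count=1)
dual_gpu_ok = fits(70, gpu_mem_gib=80, gpu_count=2)
cost = monthly_cost_usd(gpu_count=2, hourly_rate_usd=3.0)
```

Under these assumptions the model needs roughly 156 GiB including headroom, so it does not fit on one 80 GiB GPU but does fit on two. Real sizing also depends on quantization, context length, and batch size, none of which this sketch models.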

The result is a model-serving platform that seeks to simplify AI deployment without obscuring the underlying Kubernetes primitives that platform engineers rely on for control and observability.

Microsoft's announcements come amid intensifying competition among cloud providers seeking to become the preferred platform for AI infrastructure. AWS continues to expand its Kubernetes and AI services through EKS and Bedrock, while Google Cloud is investing heavily in GKE and AI-native infrastructure. Meanwhile, open-source ecosystems centered around Ray, vLLM, KubeRay, and Gateway API continue to mature rapidly.

What differentiates Microsoft's approach is its attempt to unify these components into a cohesive platform. Rather than building entirely proprietary AI infrastructure, Microsoft is leaning heavily on open-source technologies such as Kubernetes, Ray, Gateway API, and cloud-native networking, while wrapping them with managed services, governance capabilities, and enterprise integrations.

This strategy aligns with a growing industry belief that AI infrastructure will evolve similarly to cloud-native computing itself: open standards and shared operational patterns will become more important than proprietary orchestration systems as AI moves from experimentation into mainstream production environments.

The broader message from Microsoft's Build announcements is that the question of whether AI belongs on Kubernetes has largely been settled. The challenge has shifted toward operating AI workloads reliably while balancing cost, performance, and scalability.

About the Author

Craig Risi
