Why Kubernetes failed

    The world needs only five computers – Someone

A colleague asked me why Kubernetes failed to achieve its goal of becoming an organization-spanning cluster scheduler. In other words, why isn’t it possible for a company to dump all of its computational resources into one massive pool for Kubernetes to manage? For that matter, why isn’t it possible for a cloud provider to run a single Kubernetes cluster and rent it out to many individual companies?

In researching this post, I actually wasn’t able to find evidence that Kubernetes ever claimed to be a warehouse-scale platform. Certainly in practice today, it’s common to see small Kubernetes clusters dedicated to single teams, rather than massive org-spanning macroclusters. But that won’t stop many people (myself included) from wishing that Kubernetes would provide an abstraction for the management of unbounded quantities/complexities of computation.

I think the “failure” of Kubernetes to realize this dream is attributable to three factors:

  1. Technical scalability bottlenecks
  2. Governance scalability bottlenecks
  3. Upgradeability

Technical scalability bottlenecks

Various components within Kubernetes do not scale; chief among them are the apiserver and the backing etcd datastore. If Kubernetes were ever going to be used for “warehouse-scale” computing, all of its internal components would need to be sharded in a way that eliminates any possibility of overload. For instance, it should never be possible to establish an etcd watch over an entire cluster’s objects; instead, you’d watch a single logical partition within the cluster.
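
To make the bottleneck concrete, here is a minimal sketch of the difference between a cluster-wide watch and a partitioned one, using the etcd v3 Go client. The endpoint and key prefixes are illustrative (Kubernetes stores objects under /registry/ by default, but the partition layout here is an assumption):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Connect to the etcd cluster backing the apiserver (endpoint is illustrative).
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// The problematic pattern: one watch over every object in the cluster.
	// At warehouse scale, this funnels the entire write load through a single stream.
	all := cli.Watch(context.Background(), "/registry/", clientv3.WithPrefix())

	// The pattern a sharded design would force: watch one logical partition only
	// (here, pods in a single hypothetical namespace).
	partition := cli.Watch(context.Background(), "/registry/pods/team-a/", clientv3.WithPrefix())

	go drain("cluster-wide", all)
	drain("partition", partition)
}

func drain(name string, ch clientv3.WatchChan) {
	for resp := range ch {
		for _, ev := range resp.Events {
			fmt.Printf("[%s] %s %q\n", name, ev.Type, ev.Kv.Key)
		}
	}
}
```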

Governance scalability bottlenecks

Kubernetes has a single layer of governance: the namespace. Many object kinds (e.g. CustomResourceDefinition) can only be defined at the cluster level. If Kubernetes were ever going to scale to meet everyone’s needs, it would need to satisfy the “always a bigger fish” principle: objects group into namespaces, namespaces group into virtual clusters, and virtual clusters group into bigger virtual clusters. If you follow the Matryoshka dolls far enough, eventually you’ll find the outermost one. Or will you?
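
The flatness of that hierarchy is visible directly in the API: a Namespace is itself a cluster-scoped object with no parent, and a CustomResourceDefinition can only be registered for the whole cluster. Here’s a minimal client-go sketch (the kubeconfig path is illustrative):

```go
package main

import (
	"context"
	"fmt"
	"log"

	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load credentials from a local kubeconfig (path is illustrative).
	config, err := clientcmd.BuildConfigFromFlags("", "/home/me/.kube/config")
	if err != nil {
		log.Fatal(err)
	}
	core, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	// Namespaces are the one and only grouping layer: a flat, cluster-scoped list
	// with no field pointing at a parent namespace.
	nsList, err := core.CoreV1().Namespaces().List(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, ns := range nsList.Items {
		// ns.Namespace is always empty: a namespace cannot live inside another one.
		fmt.Printf("namespace %q (parent: %q)\n", ns.Name, ns.Namespace)
	}

	// CustomResourceDefinitions are cluster-scoped too: there is no way to
	// register a CRD that is visible only within one namespace.
	ext, err := apiextensionsclient.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	crds, err := ext.ApiextensionsV1().CustomResourceDefinitions().List(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d CRDs, all defined for the whole cluster\n", len(crds.Items))
}
```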

Upgradeability

It’s far too scary to upgrade a single organization-wide Kubernetes cluster. Today, if a Kubernetes cluster admin said “let’s just roll out the new binaries across the org and see what happens,” any sane person would assume that the speaker was itching to bring their job to a swift and spectacular end.

If Kubernetes were ever going to span an entire organization, a single under-resourced cluster administrator would need to be able to deploy new binaries and then head out to lunch. Compatibility breakages would need to be so few and far between that most admins had never seen one. Very few software projects achieve this degree of forward compatibility.
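
Today’s reality is the opposite: before any upgrade, a careful admin goes hunting for workloads that still call APIs slated for removal. A minimal sketch of that chore, assuming a cluster recent enough to export the apiserver_requested_deprecated_apis metric (the kubeconfig path is illustrative):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"strings"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load credentials from a local kubeconfig (path is illustrative).
	config, err := clientcmd.BuildConfigFromFlags("", "/home/me/.kube/config")
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	// Scrape the apiserver's own /metrics endpoint and keep only the series
	// that record requests made to deprecated API versions.
	raw, err := clientset.Discovery().RESTClient().Get().AbsPath("/metrics").DoRaw(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	for _, line := range strings.Split(string(raw), "\n") {
		if strings.HasPrefix(line, "apiserver_requested_deprecated_apis") {
			// Each series names a group/version/resource that something in the
			// org is still calling; every hit is a potential upgrade breakage.
			fmt.Println(line)
		}
	}
}
```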