Is Kubernetes the Right Tool for You?

I run Kubernetes for a living. I think you should probably not use it.

That’s not a contradiction. Kubernetes is the right answer for a specific class of problem - multi-tenant, polyglot, heterogeneous workloads at meaningful scale. For most teams I meet, it’s a tax: a complex distributed system bolted onto a stack that didn’t need one, paid for in hiring, security, and “why is my Pod in CrashLoopBackOff” Slack threads at 11 p.m.

Here’s how to decide honestly.

The case for Kubernetes

When the fit is right, Kubernetes is genuinely brilliant. The strongest signals:

  • You run more than ~15 services, written in different languages, with different scaling characteristics. The orchestration burden becomes real, and the alternatives don’t compose well.
  • You need multi-tenancy - isolated workloads sharing a substrate, with quotas, RBAC, and network policies.
  • You’re cloud-portable on purpose - you genuinely need to move between clouds, run on-prem, or both. (Most companies say this; few actually do it.)
  • You have specialized hardware needs - GPUs, large memory machines, custom node pools - and you need scheduling intelligence beyond what a basic VM fleet gives you.
  • You have, or will have, a platform team. Kubernetes is a substrate to build a platform on top of. Without the human investment, it’s exposed plumbing.
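To make the multi-tenancy point concrete, the primitives involved look roughly like this - a per-team quota plus an RBAC binding, sketched for a hypothetical `team-a` namespace (all names illustrative):

```yaml
# Illustrative multi-tenancy primitives for a hypothetical "team-a" namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"       # cap total CPU requests for the team
    requests.memory: 64Gi
    pods: "100"
---
# Scope the team's permissions to its own namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-edit
  namespace: team-a
subjects:
  - kind: Group
    name: team-a-devs        # hypothetical IdP group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                 # built-in aggregate role
  apiGroup: rbac.authorization.k8s.io
```

None of the managed container services give you this kind of shared-substrate isolation; it’s the clearest thing Kubernetes buys.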

If three or more of those describe you, Kubernetes is probably the right call. The complexity buys something real.

The case against

The hidden costs are real and rarely show up on day one:

  • Operational complexity. Networking (CNI, ingress, egress, service mesh), storage (CSI, dynamic provisioning), security (RBAC, PodSecurity, image policy), and a control plane that has its own maintenance and upgrade story. Every layer is its own learning curve.
  • Day-2 burden. Upgrades. Certificate rotation. CVE response. Node lifecycle. etcd backups. These don’t go away with managed Kubernetes - they get smaller, but they don’t disappear.
  • Debugging difficulty. “Why is this slow?” becomes a five-layer question: pod, node, network, control plane, dependency. Junior engineers struggle. Senior engineers Google.
  • Cost. Both the cluster overhead (control plane, system pods, idle headroom) and the salary cost of the people who can run it competently.
  • Security surface. Default-allow networking, container escapes, supply-chain exposure. Doable, but it’s work - and ignoring it is malpractice.
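“Default-allow networking” deserves one concrete illustration: out of the box, every Pod can reach every other Pod. The usual baseline fix is a default-deny policy per namespace - a sketch, using a hypothetical namespace name:

```yaml
# Deny all ingress and egress in the namespace by default;
# teams must then allow specific flows explicitly.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: team-a      # hypothetical namespace
spec:
  podSelector: {}        # empty selector matches every Pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```

Note that this only does anything if your CNI plugin actually enforces NetworkPolicy - which is itself an example of the layered learning curve above.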

A practical decision tree

Walk through these in order. The first “no” usually decides it.

  1. Do you have more than one stateless service worth deploying? → If no, run it on a managed PaaS (Cloud Run, App Runner, App Engine, Fly.io, Render). Stop reading.

  2. Will you have a dedicated platform/infra person within 6 months? → If no, run it on a managed PaaS or container service (Cloud Run, ECS Fargate, Container Apps). Move on.

  3. Are you sure the workload doesn’t fit a managed container service? → Cloud Run, ECS Fargate, and Azure Container Apps can run most stateless microservices. They handle scaling, networking, and updates. No clusters. Try them first.

  4. Do you genuinely need workload-level scheduling, not just per-service autoscaling? → GPUs, mixed instance types, batch jobs alongside HTTP services, complex co-scheduling. If yes, Kubernetes earns its keep. If no, keep going.

  5. Do you have multi-tenancy, compliance, or cross-cloud portability requirements that the managed services can’t meet? → If yes, Kubernetes. If no, you probably don’t need it.

If you reach step 5 and every answer along the way was “kind of, eventually” - that’s a no. Build for the system you have, not the one you’re imagining.
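The five questions above collapse into a few lines of code. A sketch, with the answers modelled as booleans (function and parameter names are illustrative, not a real API):

```python
def recommend(num_services: int,
              platform_hire_within_6mo: bool,
              fits_managed_containers: bool,
              needs_workload_scheduling: bool,
              needs_tenancy_or_portability: bool) -> str:
    """Walk the decision tree in order; the first disqualifying answer decides."""
    if num_services <= 1:                       # step 1
        return "managed PaaS"
    if not platform_hire_within_6mo:            # step 2
        return "managed PaaS or container service"
    if fits_managed_containers:                 # step 3
        return "managed container service"
    if needs_workload_scheduling:               # step 4
        return "Kubernetes"
    if needs_tenancy_or_portability:            # step 5
        return "Kubernetes"
    return "probably not Kubernetes"
```

The point of writing it down like this is that “kind of, eventually” has nowhere to hide: every branch forces a yes or a no.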

The alternatives that get unfairly dismissed

  • Cloud Run / App Runner / Container Apps. A container, a port, deploy. Autoscales to zero. Handles TLS, networking, and rolling updates. Costs cents at low traffic. For most stateless services, this is the right answer, and engineers reject it because it feels too easy.
  • ECS Fargate / Cloud Run for jobs. Containers without nodes. You give them an image and a CPU/memory shape; they run. No cluster.
  • A handful of VMs with systemd. Genuinely. If you have three services and a database, a few VMs with systemd units, a load balancer, and a Terraform module are operationally simpler, cheaper, and faster to debug than any cluster. You can revisit in a year.
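For a sense of scale on the systemd option: one service is one unit file, roughly like this (paths, names, and flags are hypothetical):

```ini
# /etc/systemd/system/myapp.service - hypothetical service unit
[Unit]
Description=My application
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/opt/myapp/bin/server --port 8080
Restart=on-failure
RestartSec=2
User=myapp
Environment=ENV=production

[Install]
WantedBy=multi-user.target
```

`systemctl enable --now myapp` is the deploy; `journalctl -u myapp` is the entire logging stack. Compare that to the YAML you’d write for the equivalent Deployment, Service, and Ingress.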
  • Nomad. Less popular, much simpler than Kubernetes. If you want orchestration without the kitchen sink, look at it before defaulting to K8s.

When you’re already on Kubernetes and shouldn’t be

This is more common than the inverse. Signs your cluster is a poor fit:

  • Three engineers, one cluster, and YAML repos with more lines than your application code.
  • A monthly bill where the cluster overhead is larger than the workload it’s running.
  • An on-call rotation where most pages are infrastructure issues, not service issues.
  • Nobody on the team can confidently answer “what’s our upgrade path?”

The honest move is to migrate stateless services to Cloud Run or Fargate, keep one shared cluster only if you have genuinely stateful workloads, and free up the headcount to work on your actual product. Yes, this is uncomfortable. The sunk cost has been paid either way.

So when should you use it?

Use Kubernetes when:

  • The orchestration problem you have is bigger than the orchestration problem Kubernetes brings with it.
  • You have, or are willing to fund, the people to run it.
  • You’ve genuinely evaluated the cheaper alternatives and they don’t fit.

That’s a smaller set of companies than the conference circuit would have you believe. If you’re not in it, that’s good news - you have a much simpler set of choices than most people pretend.

Need a sanity check on your stack? Let’s talk.