
Kubernetes Cost Optimisation: Lessons from Running 500+ Pods in Production

Tendai Moyo · 10 min read

Running Kubernetes in production at scale is expensive. We know this because we've spent the last three years managing clusters for businesses ranging from early-stage startups to enterprise financial services companies across Europe and North America. Across these engagements, we've consistently found that organisations are overspending on Kubernetes by 30–50%.

The good news? Most of the savings come from a handful of well-understood optimisations. Here are the lessons we've learned from running 500+ pods in production.

1. Right-size your resource requests (seriously)

This is the single most impactful optimisation, and it's the one most teams get wrong. The default behaviour is predictable: an engineer sets CPU and memory requests during initial deployment and never revisits them. Requests are typically set 2–5x higher than actual utilisation because nobody wants to be the person whose service got OOMKilled in production.

The result? Massive over-provisioning. We routinely see clusters where actual CPU utilisation is 15–20% of requested CPU. That means 80–85% of the compute you're paying for is idle headroom.

What to do:

  • Install a metrics stack (Prometheus + Grafana, or Datadog) if you haven't already
  • Use tools like Kubecost or the Kubernetes Vertical Pod Autoscaler (VPA) in recommendation mode to analyse actual resource usage over 7–14 days
  • Right-size requests to P95 usage + 20% buffer
  • Set memory limits equal to requests (to avoid OOM surprises) but set CPU limits generously or remove them entirely (CPU throttling is often worse than over-provisioning)
  • Review and adjust quarterly
Typical saving: 25–40% of compute costs
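
The last two bullets translate into a couple of small manifests. A minimal sketch, assuming the VPA components are installed in the cluster; the Deployment name and the resource numbers are illustrative:

```yaml
# VPA in recommendation mode: reports right-sizing suggestions without
# evicting pods. Read them with `kubectl describe vpa api-server-vpa`.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server        # hypothetical workload name
  updatePolicy:
    updateMode: "Off"       # recommend only, never restart pods
---
# Container resources after right-sizing (fragment of a pod spec):
# memory limit equals the request, and no CPU limit is set so bursts
# aren't throttled — the scheduler still bin-packs on the request.
resources:
  requests:
    cpu: "250m"             # roughly P95 usage + 20% buffer
    memory: "512Mi"
  limits:
    memory: "512Mi"
```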

2. Use spot/preemptible instances for non-critical workloads

If you're running all your Kubernetes nodes on on-demand instances, you're leaving significant savings on the table. AWS Spot Instances, Azure Spot VMs, and GCP Spot VMs offer 60–90% discounts compared to on-demand pricing.

Not every workload is suitable for spot instances — you need to handle interruptions gracefully. But many workloads are perfect candidates:

  • CI/CD build agents
  • Batch processing jobs
  • Development and staging environments
  • Stateless microservices with multiple replicas
  • Data processing pipelines
Implementation approach:
  • Create separate node pools for spot and on-demand instances
  • Use node affinity and tolerations to schedule appropriate workloads on spot nodes
  • Ensure all spot-eligible services have multiple replicas across availability zones
  • Implement pod disruption budgets to manage graceful node termination
  • Use a tool like Karpenter (AWS) for intelligent spot instance selection
Typical saving: 40–70% on eligible workload compute costs
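
The scheduling side of this looks roughly like the fragments below. The taint key `node-role/spot` and the `app: api` label are our illustrative conventions; managed node pools usually ship their own labels and taints (e.g. `karpenter.sh/capacity-type`):

```yaml
# Pod template fragment: tolerate the spot taint and prefer spot nodes,
# while preferredDuringScheduling still allows fallback to on-demand.
tolerations:
  - key: "node-role/spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: "node-role/spot"
              operator: In
              values: ["true"]
---
# PodDisruptionBudget: keep at least 2 replicas up during node reclaims.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
```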

3. Implement cluster autoscaling properly

Cluster autoscaler (or Karpenter on AWS) should be your best friend. But we frequently see misconfigured autoscaling that either scales too slowly (causing performance issues) or doesn't scale down aggressively enough (wasting money).

Key configuration tips:

  • Lower scale-down-delay-after-add from its 10-minute default towards 5 minutes if your workloads tolerate faster consolidation
  • Configure scale-down-utilisation-threshold appropriately (the default of 0.5 is a reasonable starting point; raise it to scale down more aggressively)
  • Use pod disruption budgets to allow safe scale-down
  • Consider separate node groups for different workload profiles
  • Tune scale-down-unready-time so nodes that fail health checks are removed promptly rather than lingering
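
These settings are flags on the cluster-autoscaler container itself. A sketch of the relevant args (flag names come from the upstream kubernetes/autoscaler project; the values are starting points to tune, not prescriptions):

```yaml
# Fragment of the cluster-autoscaler Deployment's container spec.
command:
  - ./cluster-autoscaler
  - --scale-down-delay-after-add=5m          # default 10m
  - --scale-down-utilization-threshold=0.5   # default 0.5; raise to pack tighter
  - --scale-down-unneeded-time=5m            # default 10m
  - --scale-down-unready-time=10m            # default 20m
```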

4. Review your persistent volume usage

Persistent volumes are a silent cost driver. We've seen clients with hundreds of gigabytes of provisioned EBS/Azure Disk storage that's 80% empty, because volumes were sized for peak estimates that never materialised.

What to do:

  • Audit all PersistentVolumeClaims and compare provisioned vs actual usage
  • Resize volumes down where possible (EBS supports online resize, but only upward — for downsizing you'll need to migrate data)
  • Use appropriate storage classes (gp3 vs gp2 on AWS — gp3 is 20% cheaper with better performance)
  • Delete orphaned volumes from terminated pods
  • Consider using EFS/Azure Files for shared storage needs instead of over-provisioned individual volumes
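
For the gp3 point, defining a gp3 StorageClass is a one-manifest change; a sketch assuming the AWS EBS CSI driver is installed:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
allowVolumeExpansion: true       # permits online (upward) PVC resizing
reclaimPolicy: Delete            # avoids orphaned volumes after PVC deletion
volumeBindingMode: WaitForFirstConsumer
```

New PVCs then reference `storageClassName: gp3`; existing gp2 volumes need migration, since a PVC's storage class can't be changed in place.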

5. Optimise your container images

Large container images don't just slow down deployments — they cost money through increased data transfer and storage. We've seen production images exceeding 2GB that could be reduced to under 100MB.

Quick wins:

  • Use multi-stage builds with minimal base images (distroless, Alpine, or scratch)
  • Avoid installing unnecessary packages and build tools in production images
  • Use .dockerignore to exclude test files, documentation, and local configs
  • Implement image layer caching in your CI/CD pipeline
  • Set up automated image cleanup policies in your container registry
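
A multi-stage build along these lines is what takes a 1GB+ image down to tens of megabytes. A sketch using Go and a distroless base (the module layout and binary name are illustrative):

```dockerfile
# Build stage: full toolchain, discarded after the build.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

# Runtime stage: distroless static base — a few MB, no shell, no package manager.
FROM gcr.io/distroless/static-debian12
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```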

6. Schedule non-production environments

Development, staging, and QA environments don't need to run 24/7. If your engineering team works 09:00–18:00 Monday to Friday, that's only 45 hours out of 168 in a week — meaning 73% of non-production compute costs are wasted on empty environments.

Implementation:

  • Use a tool like Kube-downscaler to automatically scale non-production namespaces to zero outside business hours
  • Implement on-demand environment creation for feature branches (spin up on PR open, tear down on merge)
  • Consider serverless alternatives (Fargate, Azure Container Apps) for dev/test workloads
Typical saving: 60–70% on non-production environment costs
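
With kube-downscaler deployed, the schedule is a single annotation on the namespace (the uptime window and timezone below are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: staging
  annotations:
    # Everything in this namespace scales to zero outside this window.
    downscaler/uptime: "Mon-Fri 09:00-18:00 Europe/London"
```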

The compound effect

None of these optimisations is revolutionary on its own. But applied together, the savings compound dramatically. A typical optimisation engagement across these six areas delivers:

  • 25–40% reduction in production compute costs (right-sizing + spot instances)
  • 60–70% reduction in non-production costs (scheduling + right-sizing)
  • 20–30% reduction in storage costs (PV audit + storage class optimisation)
  • Overall: 30–50% reduction in total Kubernetes-related cloud spend
For a company spending $50,000/month on Kubernetes infrastructure, that's $15,000–$25,000 in monthly savings — $180,000–$300,000 per year.

Getting started

If you're running Kubernetes and haven't done a cost optimisation review in the last six months, you're almost certainly overspending. Start with visibility — install Kubecost or review your cloud provider's cost allocation tools — and prioritise right-sizing as your first initiative.

Or, if you'd rather have experienced engineers handle it, get in touch. Our FinOps team has optimised Kubernetes clusters for dozens of businesses and can typically identify significant savings within the first week.