
Kubernetes Infrastructure At Medium

Eduardo · Medium Engineering · Feb 14, 2023


How we use Kubernetes to manage microservices — a high-level introduction

Why Kubernetes?

The simple answer is that it meets our needs quite nicely, and it solves important, complex problems without us having to build the solutions ourselves. The most obvious wins Kubernetes provides are around scaling, bin-packing, and the fact that it makes services more or less ‘self-healing’.

Another critical consideration is deployments — ease of rollouts and rollbacks. We have built complex infrastructure around deployments, but more on that in another post to come.

How Are We Using Kubernetes?

Our production infrastructure is spread across four availability zones, in four unique Kubernetes clusters. Technically, Kubernetes now has mechanisms for managing topology like this within a single cluster entity, but it’s newer functionality that we haven’t explored yet.
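(For reference, the newer single-cluster mechanism we're alluding to is topology spread constraints. A minimal, purely illustrative sketch — the service name and replica count are hypothetical, not our configuration — might look like:)

```yaml
# Illustrative only: spread replicas evenly across zones
# within a single cluster using topologySpreadConstraints.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service
spec:
  replicas: 8
  selector:
    matchLabels:
      app: example-service
  template:
    metadata:
      labels:
        app: example-service
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                                   # at most 1 pod difference between zones
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: example-service
      containers:
        - name: app
          image: example/app:latest
```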

Over time, we’ve realized that there are some massive benefits to keeping things spread across four clusters. The list continues to grow, but some of the more important ones are:

The ability to shift traffic across AZs when needed via some in-house tooling

  • This has proven extremely useful in cases where a single zone has a problem (whether due to the cloud provider, or some other broken dependency)

Gradual rollout of infrastructure changes across production

  • Say we want to test a new Kubernetes add-on or configuration change — we can always shift the majority of our production traffic to the other three clusters while we validate changes on the underlying infrastructure (and that’s only if we can’t validate already on our staging clusters)

Our service mesh of choice is Istio. We manage our ingress & egress gateways with a variety of in-house controllers that ensure smooth configuration & reconciliation for flows from our CDN to all four clusters. We won’t go into more detail here (that could cover a whole post on its own!).
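(To give a flavor of what an ingress entry point looks like in Istio — the name, hosts, and certificate below are hypothetical, and our real gateways are generated by the in-house controllers mentioned above — a basic Gateway accepting HTTPS from a CDN is:)

```yaml
# Illustrative sketch only — not Medium's actual configuration.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: cdn-ingress
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway          # bind to the default ingress gateway pods
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE               # terminate TLS at the gateway
        credentialName: cdn-ingress-cert
      hosts:
        - "example.com"
```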

Configuration & Management

Terraform and some in-house tooling are our weapons of choice for managing the configurations of our clusters. When the team was first conceptualizing Kubernetes configurations, there weren’t many tools out there to help streamline Terraform. We wrote (and continue to maintain) an in-house app that helps us templatize, render, and apply our configurations across each cluster (whether that’s our production clusters, or any of our internal staging ones).

Having a single tool that allows us to work with templates and static configurations has proven invaluable in ensuring we always have a ‘source of truth’ in our configurations, and a proper process for testing & applying changes to our clusters.
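(Our tool is in-house and we won't reproduce it here, but the general pattern — render one shared template with per-cluster static values, then apply the result to each cluster — can be sketched as follows. All names and values are hypothetical, and this only illustrates the render step, not our actual implementation:)

```python
from string import Template

# Hypothetical per-cluster static values. In practice these would
# live alongside the templates as the 'source of truth'.
CLUSTER_VALUES = {
    "prod-a": {"region": "us-east-1a", "min_nodes": "40"},
    "prod-b": {"region": "us-east-1b", "min_nodes": "40"},
}

# One shared template, rendered once per cluster.
TEMPLATE = Template(
    'cluster_region = "$region"\n'
    "min_nodes      = $min_nodes\n"
)

def render_all(template: Template, values: dict) -> dict:
    """Render the shared template once for every cluster."""
    return {name: template.substitute(vals) for name, vals in values.items()}

if __name__ == "__main__":
    for name, rendered in render_all(TEMPLATE, CLUSTER_VALUES).items():
        print(f"# {name}\n{rendered}")
```

The rendered output for each cluster would then be fed into a plan/apply step (e.g. Terraform), giving every environment the same review and rollout process.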

We all know how rapidly the Kubernetes & containerization landscape evolves — let us know in the responses what other tooling you use to make your Kubernetes configuration easier to manage!

Tuning for Scaling — Expanding for Bursts, Contracting with Requests

A large amount of effort has gone into making sure our application resource requests are right-sized based on their true utilization. That went a long way toward making the most of our nodes (much more effective bin-packing). This also had the benefit of smoothing out some of our scaling, but it took some additional tuning and tools to get us there.
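(Concretely, "right-sizing" means setting a pod's requests close to its observed utilization rather than a generous guess. The numbers below are hypothetical, just to show the shape of the change:)

```yaml
# Illustrative only: requests pinned near observed p95 usage
# so the scheduler can bin-pack nodes tightly.
resources:
  requests:
    cpu: "500m"      # e.g. observed p95 usage ~450m
    memory: "768Mi"  # e.g. observed p95 usage ~700Mi
  limits:
    memory: "1Gi"
```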

The Cluster Over-Provisioner & Pod Preemption

This tool is great. The oversimplified explanation of what it does: you define a number of replicas, and the amount of resources they need. In our case, the service that needs to scale the most with traffic (we’ll just call it backend-A) happens to also need a good number of resources. Once we understood the shape of our scaling events, we knew how many replicas to plan for and how to size them.

Let’s say we have frequent bursts, and that this service needs around 200 additional pods (across all four clusters) to start absorbing requests. If those don’t scale quickly, we start seeing a sharp increase in 5xx errors.

We set up the cluster-overprovisioner in each cluster to request a slightly higher amount of CPU & memory than the backend-A pods, and set them to a replica count of 50 (since this is a per-cluster configuration). With Priority Preemption and the cluster-autoscaler properly configured, we gained the following benefits:

  • cluster-overprovisioner aims to have 200 additional backend-A pods-worth of resources available at any given time for a scale-up event
  • When new backend-A pods need to get scheduled, cluster-overprovisioner pods will get preempted (aka evicted) in their favor
  • As overprovisioner pods get evicted, they still need to get re-scheduled. So they trigger a node scale-up event via the cluster-autoscaler

So the cluster-overprovisioner essentially absorbs the delay in node scale-up events, and gives us the room to absorb scale events for production services smoothly and without disruption.
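(The setup described above can be sketched as a negative-priority PriorityClass plus a Deployment of placeholder pods that real workloads preempt. All values here are hypothetical stand-ins for the numbers discussed in this section, not our actual manifests:)

```yaml
# Illustrative sketch: headroom pods that any real workload can evict.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1                  # lower than any real workload's priority
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioner
spec:
  replicas: 50             # per cluster; 4 clusters ≈ 200 pods of headroom
  selector:
    matchLabels:
      app: overprovisioner
  template:
    metadata:
      labels:
        app: overprovisioner
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: reserve
          image: registry.k8s.io/pause:3.9   # does nothing; just holds resources
          resources:
            requests:      # slightly above a backend-A pod's requests
              cpu: "1100m"
              memory: "2200Mi"
```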

An added benefit was that our node count graphs look much smoother than they used to. We don’t need to scale nodes nearly as often:

[Graph: total nodes across all 4 clusters regularly bursting over 800–900 nodes prior to overprovisioning & right-sizing]
[Graph: peaks dropping closer to 400 nodes across all production clusters, barely breaking 600 nodes, after over-provisioning + application right-sizing]

Closing Notes

Kubernetes has a ton of complexity, and has an unlimited number of possible configurations based on an organization’s needs. At Medium, we have a lot of pride around how well we’ve been able to shape Kubernetes to our own needs. That doesn’t take away from how excited we are to explore new ways to enhance our infrastructure, while making use of new technologies that help us improve reliability and scalability along the way.


Passionate Backend Engineer @ Medium. Polyglot, and lover of all the Tech things. Find me at post.news or me.dm as @latinSRE